A Causal Language Modeling Detour Improves Encoder Continued Pretraining
arXiv:2605.12438v1
Abstract: When adapting an encoder to a new domain, the standard approach is to continue training with Masked Language Modeling (MLM). We show that temporarily switching to Causal Language Modeling (CLM) followed …
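For readers less familiar with the two objectives named in the abstract, the sketch below shows how training targets are commonly constructed for MLM and for CLM. It is not the paper's implementation. The function names, `MASK_ID`, the `-100` ignore label, and the 15% masking rate are illustrative assumptions, and practical MLM recipes usually add refinements (for example random-token replacement) that are omitted here.

```python
import random

IGNORE = -100  # assumed label value that the loss skips (a common convention)
MASK_ID = 0    # hypothetical id of the [MASK] token


def mlm_example(token_ids, mask_prob=0.15, seed=0):
    """Masked LM: hide a random subset of tokens and predict only those.

    The model sees the full (partly masked) sequence, so every prediction
    can use context on both sides of the masked position.
    """
    rng = random.Random(seed)
    inputs, labels = [], []
    for tok in token_ids:
        if rng.random() < mask_prob:
            inputs.append(MASK_ID)  # hide the token from the model
            labels.append(tok)      # and ask the model to recover it
        else:
            inputs.append(tok)
            labels.append(IGNORE)   # unmasked positions add no loss
    return inputs, labels


def clm_example(token_ids):
    """Causal LM: predict each token from the tokens that precede it.

    Inputs and labels are the same sequence shifted by one position, so
    every position except the first contributes to the loss.
    """
    return token_ids[:-1], token_ids[1:]


if __name__ == "__main__":
    tokens = [11, 12, 13, 14, 15, 16]
    print("MLM:", mlm_example(tokens, mask_prob=0.5))
    print("CLM:", clm_example(tokens))
```

One difference the sketch makes visible: under MLM only the masked positions carry a training signal, while under CLM nearly every position does.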