cs.AI, cs.LG

On the Trainability of Masked Diffusion Language Models via Blockwise Locality

arXiv:2604.24832v1 Announce Type: new
Abstract: Masked diffusion language models (MDMs) have recently emerged as a promising alternative to standard autoregressive large language models (AR-LLMs), yet their optimization can be substantially less stabl…