A Comparative Analysis of Layer-wise Representational Capacity in AR and Diffusion LLMs
arXiv:2603.07475v2 Announce Type: replace
Abstract: Autoregressive (AR) language models build representations incrementally via left-to-right prediction, while diffusion language models (dLLMs) are trained through full-sequence denoising. Although rec…
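To make the contrast between the two training paradigms concrete, here is a minimal, illustrative sketch (not taken from the paper) of the two objectives. It assumes a hypothetical `model` callable that maps a batch of token IDs to per-position vocabulary logits, plus a reserved `mask_id` token; the AR loss predicts the next token from a left-to-right prefix, while the diffusion-style loss reconstructs randomly masked positions from the full corrupted sequence.

```python
import torch
import torch.nn.functional as F

def ar_next_token_loss(model, tokens):
    """Autoregressive objective: predict token t+1 from tokens <= t."""
    logits = model(tokens[:, :-1])            # (batch, seq-1, vocab)
    targets = tokens[:, 1:]                   # targets are the inputs shifted by one
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))

def masked_denoising_loss(model, tokens, mask_id, mask_prob=0.5):
    """Diffusion-style objective: corrupt a random subset of positions and
    reconstruct them from the partially masked full sequence."""
    mask = torch.rand_like(tokens, dtype=torch.float) < mask_prob
    corrupted = torch.where(mask, torch.full_like(tokens, mask_id), tokens)
    logits = model(corrupted)                 # (batch, seq, vocab)
    # Loss is computed only on the masked (noised) positions.
    return F.cross_entropy(logits[mask], tokens[mask])
```

This is only a schematic of the standard objectives the abstract alludes to; the paper's actual models, noise schedule, and loss weighting may differ.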