Where Should Diffusion Enter a Language Model? Geometry-Guided Hidden-State Replacement
arXiv:2605.14368v1 Announce Type: cross
Abstract: Continuous diffusion language models lag behind autoregressive transformers, partly because diffusion is applied in spaces poorly suited to language denoising and token recovery. We propose DiHAL, a ge…