Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models
arXiv:2604.26951v1 Announce Type: new
Abstract: Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillati…