cs.AI, cs.CL, cs.LG

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models

arXiv:2604.26951v1 Announce Type: new
Abstract: Diffusion large language models (dLLMs) offer parallel decoding and bidirectional context, but state-of-the-art dLLMs require billions of parameters for competitive performance. While existing distillati…