cs.AI, cs.AR, cs.DC

NPU Design for Diffusion Language Model Inference

arXiv:2601.20706v2 Announce Type: replace-cross
Abstract: Diffusion-based LLMs (dLLMs) fundamentally depart from traditional autoregressive (AR) LLM inference: they leverage bidirectional attention, block-wise KV cache refreshing, cross-step reuse, an…
