cs.CL, cs.LG

Multi-Token Prediction via Self-Distillation

arXiv:2602.06019v2 Announce Type: replace
Abstract: Existing techniques for accelerating language model inference, such as speculative decoding, require training auxiliary speculator models and building and deploying complex inference pipelines. We co…
