cs.AI, cs.CL

When Perplexity Lies: Generation-Focused Distillation of Hybrid Sequence Models

arXiv:2603.26556v1 Announce Type: new
Abstract: Converting a pretrained Transformer into a more efficient hybrid model through distillation offers a promising approach to reducing inference costs. However, achieving high-quality generation in distille…
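To make the setup concrete, here is a minimal sketch of the generic token-level knowledge-distillation objective commonly used when transferring a pretrained Transformer teacher into a more efficient student; it is illustrative only and not taken from this paper, and the function name, temperature value, and toy tensor shapes are assumptions.

```python
# Sketch of standard soft-target distillation: KL divergence between the
# teacher's and student's next-token distributions, averaged over tokens.
# This is the generic Hinton-style objective, not necessarily the one
# proposed in the paper above.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, per token, scaled by T^2."""
    t = temperature
    vocab = student_logits.size(-1)
    # Flatten (batch, seq, vocab) -> (batch*seq, vocab) so "batchmean"
    # normalizes per token rather than per sequence.
    student_log_probs = F.log_softmax(student_logits.reshape(-1, vocab) / t, dim=-1)
    teacher_probs = F.softmax(teacher_logits.reshape(-1, vocab) / t, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)

# Toy usage with random logits (batch=2, seq_len=8, vocab=100).
student_logits = torch.randn(2, 8, 100)
teacher_logits = torch.randn(2, 8, 100)
print(distillation_loss(student_logits, teacher_logits))
```

A token-level loss like this optimizes per-token likelihood (and hence perplexity); the paper's title suggests its focus is on objectives that target generation quality instead, which this sketch does not cover.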