[Tool] Quick hack to recover Qwen3.5 MTP after fine-tuning for faster inference speed (Transformers)

Disclaimer: I work at NuMind (we train LLMs for structured + content extraction). If you've been working with Qwen3.5 (and other recently released models), you probably know it includes Multi-Token Prediction (MTP) modules. When used with vLLM (qwe…