Extending Puzzle for Mixture-of-Experts Reasoning Models with Application to GPT-OSS Acceleration
arXiv:2602.11937v2
Abstract: Reasoning-focused LLMs improve answer quality by generating longer reasoning traces, but the additional tokens dramatically increase serving cost, motivating inference optimization. We extend and app…