Muheng Li, Jian Qian, Wenlong Mou

What should post-training optimize? A test-time scaling law perspective

Muheng Li, Jian Qian, Wenlong Mou / May 12, 2026

arXiv:2605.10716v1 Announce Type: cross
Abstract: Large language models are increasingly deployed with test-time strategies: sample $N$ responses, score them with a reward model or verifier, and return the best. This deployment rule exposes a mismatch…
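The deployment rule the abstract describes (sample $N$ responses, score them, return the best) is commonly called best-of-$N$ selection. Below is a minimal sketch of it, not the authors' implementation. The names `best_of_n`, `make_toy_sampler`, and `toy_score` are hypothetical, and the toy sampler and scorer are stand-ins for a real language model and a real reward model or verifier.

```python
import random
from typing import Callable, Tuple


def best_of_n(
    prompt: str,
    sample: Callable[[str], str],
    score: Callable[[str, str], float],
    n: int,
) -> Tuple[str, float]:
    """Draw n responses, score each one, and return the top-scoring pair."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates = [sample(prompt) for _ in range(n)]
    scores = [score(prompt, c) for c in candidates]
    # Index of the highest score; ties resolve to the earliest candidate.
    best_index = max(range(n), key=scores.__getitem__)
    return candidates[best_index], scores[best_index]


def make_toy_sampler(seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for a language model: appends a random integer."""
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        return f"{prompt} -> {rng.randint(0, 99)}"

    return sample


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model: reads back the integer."""
    return float(response.rsplit(" ", 1)[-1])


if __name__ == "__main__":
    for n in (1, 4, 16):
        response, reward = best_of_n("q", make_toy_sampler(0), toy_score, n)
        print(n, response, reward)
```

Because each run reuses the same seed, the candidates for a smaller $N$ are a prefix of those for a larger $N$, so the selected reward cannot decrease as $N$ grows in this toy setup.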
