Wenshuo Zhao, Qi Zhu, Xingshan Zeng, Fei Mi, Lifeng Shang, Yiren Feng

Entropy Centroids as Intrinsic Rewards for Test-Time Scaling

April 30, 2026

arXiv:2604.26173v1 Announce Type: cross
Abstract: An effective way to scale up test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods …
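
The generic sample-then-select recipe in the abstract's first sentence can be sketched as follows. This is not the paper's entropy-centroid method, which this truncated listing does not describe. `best_of_n` and the `score` callable are hypothetical placeholders for whatever selection signal a method supplies, such as a reward model or an intrinsic score.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def best_of_n(candidates: Sequence[T], score: Callable[[T], float]) -> T:
    """Return the highest-scoring candidate (hypothetical helper).

    `score` is an assumed stand-in for any selection signal; it is not
    the entropy-centroid reward from the paper.
    """
    if not candidates:
        raise ValueError("best_of_n needs at least one candidate")
    # max() returns the first maximal element, so ties keep sampling order.
    return max(candidates, key=score)


if __name__ == "__main__":
    # Toy usage: string length stands in for a real scoring function.
    sampled = ["short", "a longer answer", "mid answer"]
    print(best_of_n(sampled, len))
```

Breaking ties in favor of the earliest sample keeps the selection deterministic for a fixed sampling order. Any real scorer only has to map a response to a float.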