Entropy Centroids as Intrinsic Rewards for Test-Time Scaling
arXiv:2604.26173v1 Announce Type: cross
Abstract: An effective way to scale up the test-time compute of large language models is to sample multiple responses and then select the best one, as in Grok Heavy and Gemini Deep Think. Existing selection methods …