Reaching Beyond the Mode: RL for Distributional Reasoning in Language Models
arXiv:2603.24844v1 Announce Type: cross
Abstract: Given a question, a language model (LM) implicitly encodes a distribution over possible answers. In practice, post-training procedures for LMs often collapse this distribution onto a single dominant mo…