LEPO: Latent Reasoning Policy Optimization for Large Language Models
arXiv:2604.17892v2 Announce Type: replace
Abstract: Latent reasoning has recently been introduced into large language models (LLMs) to exploit the rich information available in a continuous space. However, without stochastic sampling, these methods inevitably…