Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning
arXiv:2604.27998v1
Abstract: Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and substantially shortening reasoning chains. However, ex…