cs.CL, cs.LG

Latent-GRPO: Group Relative Policy Optimization for Latent Reasoning

arXiv:2604.27998v1 Announce Type: new
Abstract: Latent reasoning offers a more efficient alternative to explicit reasoning by compressing intermediate reasoning into continuous representations and substantially shortening reasoning chains. However, ex…