cs.AI, cs.LG

Segment-Aligned Policy Optimization for Multi-Modal Reasoning

arXiv:2605.01327v1 Announce Type: new
Abstract: Existing reinforcement learning approaches for Large Language Models typically perform policy optimization at the granularity of individual tokens or entire response sequences. However, such formulations…