cs.CL, cs.LG

Skip-Connected Policy Optimization for Implicit Advantage

arXiv:2604.08690v1 Announce Type: new
Abstract: Group Relative Policy Optimization (GRPO) has proven effective in reinforcement learning with verifiable rewards (RLVR) by using outcome-based rewards. While fine-grained dense rewards can theoretically improve performance, we reveal that under practic…
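
For readers unfamiliar with the outcome-based setup the abstract starts from, the following is a minimal sketch of the group-relative advantage commonly associated with GRPO: each sampled response to a prompt receives one scalar reward, which is normalized against the mean and standard deviation of its sampling group. It illustrates only that baseline, not the method proposed in this paper. The function name `group_relative_advantages`, the `eps` stabilizer, and the use of the population standard deviation are assumptions made for illustration; implementations differ on these details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each outcome reward against its own sampling group.

    rewards: one scalar reward per sampled response to the same prompt.
    eps: hypothetical stabilizer that avoids division by zero when all
         rewards in the group are identical.
    """
    mu = mean(rewards)
    # Assumption: population standard deviation; some implementations
    # use the sample (unbiased) estimate instead.
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a verifier
# (1.0 = correct final answer, 0.0 = incorrect).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

In this outcome-based form, every token of a response shares that response's single advantage value, which is the coarse credit assignment that finer-grained dense rewards aim to refine.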