cs.CL, cs.LG

AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin

arXiv:2505.14264v3 Announce Type: replace
Abstract: Reinforcement learning (RL) has emerged as an effective approach for enhancing the reasoning capabilities of large language models (LLMs), especially in scenarios where supervised fine-tuning (SFT) f…