Jian Xiong, Jingbo Zhou, Jingyong Ye, Qiang Huang, Dejing Dou

AAPO: Enhancing the Reasoning Capabilities of LLMs with Advantage Margin

Jian Xiong, Jingbo Zhou, Jingyong Ye, Qiang Huang, Dejing Dou / April 15, 2026

arXiv:2505.14264v3
Abstract: Reinforcement learning (RL) has emerged as an effective approach for enhancing the reasoning capabilities of large language models (LLMs), especially in scenarios where supervised fine-tuning (SFT) f…