Runhui Huang, Jie Wu, Rui Yang, Zhe Liu, Hengshuang Zhao

AlphaGRPO: Unlocking Self-Reflective Multimodal Generation in UMMs via Decompositional Verifiable Reward

May 13, 2026

arXiv:2605.12495v1
Abstract: In this paper, we propose AlphaGRPO, a novel framework that applies Group Relative Policy Optimization (GRPO) to AR-Diffusion Unified Multimodal Models (UMMs) to enhance multimodal generation capabilitie…
