Rui Ai, Yu Pan, David Simchi-Levi, Chonghuan Wang

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training

April 1, 2026

arXiv:2603.29871v1
Abstract: In user-agent interaction scenarios such as recommendation, brainstorming, and code suggestion, Large Language Models (LLMs) often generate sets of candidate recommendations where the objective is to max…
