cs.AI

ShapE-GRPO: Shapley-Enhanced Reward Allocation for Multi-Candidate LLM Training

arXiv:2603.29871v1 Announce Type: new
Abstract: In user-agent interaction scenarios such as recommendation, brainstorming, and code suggestion, Large Language Models (LLMs) often generate sets of candidate recommendations where the objective is to max…
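The title refers to Shapley-value reward allocation across multiple candidates, but the abstract is cut off before the method is described. The sketch below illustrates only the general game-theoretic idea, not the paper's algorithm: it computes exact Shapley values that split one set-level reward among the candidates in the set. The set reward `best_of_set_reward` (the score of the best candidate in a subset, 0 for the empty set) and the example scores are hypothetical assumptions chosen for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, FrozenSet, List, Sequence


def shapley_values(n: int, value: Callable[[FrozenSet[int]], float]) -> List[float]:
    """Exact Shapley values for candidates 0..n-1 under a set-level value function.

    Enumerates every subset of the other candidates, so the cost is O(n * 2^n).
    This is only practical for small candidate sets.
    """
    phis = [0.0] * n
    n_fact = factorial(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(len(others) + 1):
            # Weight of a coalition of size k that candidate i joins.
            weight = factorial(k) * factorial(n - k - 1) / n_fact
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of candidate i to coalition s.
                phis[i] += weight * (value(s | {i}) - value(s))
    return phis


def best_of_set_reward(scores: Sequence[float]) -> Callable[[FrozenSet[int]], float]:
    """Hypothetical set-level reward: quality of the best candidate in the subset."""
    def v(s: FrozenSet[int]) -> float:
        return max((scores[j] for j in s), default=0.0)
    return v


if __name__ == "__main__":
    scores = [0.9, 0.5, 0.1]  # hypothetical per-candidate quality scores
    phis = shapley_values(len(scores), best_of_set_reward(scores))
    print([round(p, 4) for p in phis])
    # Efficiency property: the credits sum to the reward of the full set.
    print(round(sum(phis), 4))
```

With these hypothetical scores, the strongest candidate receives most of the credit, while weaker candidates still receive a small share for the cases where they are the best member of a subset. The credits always sum to the reward of the full set. Exact enumeration grows exponentially with the number of candidates, so larger sets would need an approximation such as permutation sampling.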