From Reasoning to Agentic: Credit Assignment in Reinforcement Learning for Large Language Models
arXiv:2604.09459v2 Announce Type: replace
Abstract: Reinforcement learning (RL) for large language models (LLMs) increasingly relies on sparse, outcome-level rewards, yet determining which actions within a long trajectory caused the outcome remains …