cs.CL, cs.LG

Self-Induced Outcome Potential: Turn-Level Credit Assignment for Agents without Verifiers

arXiv:2605.04984v1 Announce Type: new
Abstract: Long-horizon LLM agents depend on intermediate information-gathering turns, yet training feedback is usually observed only at the final answer, because process-level rewards require high-quality human an…