cs.AI, cs.CL

Evidence-Augmented Policy Optimization with Reward Co-Evolution for Long-Context Reasoning

arXiv:2601.10306v2 Announce Type: replace-cross
Abstract: While Reinforcement Learning (RL) has advanced LLM reasoning, applying it to long-context scenarios is hindered by the sparsity of outcome rewards. This limitation fails to penalize ungrounded “luc…