Towards Generalizable Reasoning: Group Causal Counterfactual Policy Optimization for LLM Reasoning
arXiv:2602.06475v2 Announce Type: replace
Abstract: Large language models (LLMs) excel at complex tasks thanks to advances in their reasoning capabilities. However, existing reward mechanisms remain tightly coupled to final correctness and pay little attention t…