cs.AI, cs.CL

VCORE: Variance-Controlled Optimization-based Reweighting for Chain-of-Thought Supervision

arXiv:2510.27462v2
Abstract: Supervised fine-tuning (SFT) on long chain-of-thought (CoT) trajectories has emerged as a crucial technique for enhancing the reasoning abilities of large language models (LLMs). However, the standar…