cs.AI, cs.SE

Schedule-and-Calibrate: Utility-Guided Multi-Task Reinforcement Learning for Code LLMs

arXiv:2605.06111v1 Announce Type: cross
Abstract: Reinforcement learning (RL) with verifiable rewards has proven effective at post-training LLMs for coding, yet deploying separate task-specific specialists incurs costs that scale with the number of ta…