cs.LG

Policy Improvement Reinforcement Learning

arXiv:2604.00860v1 Announce Type: new
Abstract: Reinforcement Learning with Verifiable Rewards (RLVR) has become a central post-training paradigm for improving the reasoning capabilities of large language models. Yet existing methods share a common bl…

cs.LG

SkillRouter: Skill Routing for LLM Agents at Scale

arXiv:2603.22455v4 Announce Type: replace
Abstract: Reusable skills let LLM agents package task-specific procedures, tool affordances, and execution guidance into modular building blocks. As skill ecosystems grow to tens of thousands of entries, expos…
