DORA: A Scalable Asynchronous Reinforcement Learning System for Language Model Training
arXiv:2604.26256v1
Abstract: Reinforcement learning (RL) has become a critical paradigm for large language model (LLM) post-training, yet the rollout phase, which accounts for 50–80% of total step time, is bottlenecked by skewed generation: long-tailed t…