When RL Meets Adaptive Speculative Training: A Unified Training-Serving System
arXiv:2602.06932v2 Announce Type: replace
Abstract: Speculative decoding can significantly accelerate LLM serving, yet most deployments today decouple speculator training from serving, treating speculator training as a standalone offline modeling p…