cs.CL, cs.LG

FP8-RL: A Practical and Stable Low-Precision Stack for LLM Reinforcement Learning

arXiv:2601.18150v2 Announce Type: replace
Abstract: Reinforcement learning (RL) for large language models (LLMs) is increasingly bottlenecked by rollout (generation), where long output sequence lengths make attention and KV-cache memory dominate end-t…
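The abstract's claim that KV-cache memory dominates at long output lengths, and that FP8 halves it relative to FP16, can be illustrated with back-of-the-envelope arithmetic. The model configuration below is hypothetical (a generic 7B-class shape, not taken from the paper):

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, bytes_per_elem):
    # K and V each store one value per (layer, kv head, position, head dim),
    # hence the leading factor of 2.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Hypothetical 7B-class configuration, for illustration only.
cfg = dict(num_layers=32, num_kv_heads=32, head_dim=128, seq_len=8192, batch=16)

fp16 = kv_cache_bytes(**cfg, bytes_per_elem=2)  # FP16: 2 bytes/element
fp8 = kv_cache_bytes(**cfg, bytes_per_elem=1)   # FP8: 1 byte/element

print(f"FP16 KV cache: {fp16 / 2**30:.1f} GiB")  # 64.0 GiB
print(f"FP8  KV cache: {fp8 / 2**30:.1f} GiB")   # 32.0 GiB
```

At an 8K output length this cache alone reaches tens of GiB per batch, which is why storing K/V in FP8 directly raises the feasible batch size (and hence rollout throughput) during generation.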