cs.AI, cs.LG

Adaptive Replay Buffer for Offline-to-Online Reinforcement Learning

arXiv:2512.10510v2 Announce Type: replace-cross
Abstract: Offline-to-Online Reinforcement Learning (O2O RL) faces a critical dilemma in balancing the use of a fixed offline dataset with newly collected online experiences. Standard methods, often relyi…