Adaptive Q-Chunking for Offline-to-Online Reinforcement Learning
arXiv:2605.05544v1 Announce Type: cross
Abstract: Offline-to-online reinforcement learning with action chunking eliminates multi-step off-policy bias and enables temporally coherent exploration, but all existing methods use a fixed chunk size across e…