cs.LG

AdamO: A Collapse-Suppressed Optimizer for Offline RL

arXiv:2605.01968v1
Abstract: Offline reinforcement learning (RL) can fail spectacularly when bootstrapped temporal-difference (TD) updates amplify their own errors, driving the critic toward extreme and unusable Q-values. A key coun…