Frictional Q-Learning
arXiv:2509.19771v4 Announce Type: replace-cross
Abstract: Off-policy reinforcement learning suffers from extrapolation errors when a learned policy selects actions that are weakly supported in the replay buffer. In this study, we address this issue by…