Vanishing L2 regularization for the softmax Multi Armed Bandit
arXiv:2605.03752v1 Announce Type: cross
Abstract: Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax map…
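
To make the softmax map concrete, here is a minimal sketch of a softmax bandit with a stochastic policy-gradient update. It is an illustration under stated assumptions, not the paper's algorithm: the names `softmax` and `run_softmax_bandit`, the Bernoulli rewards, the learning rate `lr`, and the L2 coefficient schedule `l2_0 / t` (one simple way for a penalty to vanish over time) are all hypothetical choices made here, since the abstract is truncated before it specifies any of them.

```python
import math
import random


def softmax(preferences):
    """Map real-valued preferences to a probability vector (numerically stable)."""
    m = max(preferences)  # subtract the max to avoid overflow in exp
    exps = [math.exp(h - m) for h in preferences]
    total = sum(exps)
    return [e / total for e in exps]


def run_softmax_bandit(arm_means, steps=2000, lr=0.1, l2_0=0.1, seed=0):
    """Softmax policy-gradient bandit with a decaying L2 penalty (assumed schedule).

    arm_means: success probability of each Bernoulli arm (hypothetical setup).
    Returns the final arm-selection probabilities.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    prefs = [0.0] * k  # one preference per arm
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        arm = rng.choices(range(k), weights=probs)[0]
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        l2 = l2_0 / t  # ASSUMPTION: penalty coefficient decays to zero as 1/t
        for a in range(k):
            # Stochastic gradient of the expected reward w.r.t. prefs[a]
            grad = reward * ((1.0 if a == arm else 0.0) - probs[a])
            # Gradient ascent on: expected reward - (l2 / 2) * ||prefs||^2
            prefs[a] += lr * (grad - l2 * prefs[a])
    return softmax(prefs)
```

Calling `run_softmax_bandit([0.1, 0.9])` returns a two-element probability vector. The L2 term pulls the preferences toward zero, which keeps the policy from saturating early, and its influence fades as `t` grows.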