Stefana-Lucia Anita, Gabriel Turinici

Vanishing L2 regularization for the softmax Multi Armed Bandit

Stefana-Lucia Anita, Gabriel Turinici / May 6, 2026

arXiv:2605.03752v1 Announce Type: cross
Abstract: Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax map…
