cs.LG, math.ST, stat.ML, stat.TH

Vanishing L2 regularization for the softmax Multi Armed Bandit

arXiv:2605.03752v1 Announce Type: cross
Abstract: Multi Armed Bandit (MAB) algorithms are a cornerstone of reinforcement learning and have been studied both theoretically and numerically. One of the most commonly used implementations uses a softmax map…