cs.AI, cs.LG, cs.NA, math.NA

Softmax gradient policy for variance minimization and risk-averse multi-armed bandits

arXiv:2604.00241v1 Announce Type: cross
Abstract: Algorithms for the Multi-Armed Bandit (MAB) problem play a central role in sequential decision-making and have been extensively explored both theoretically and numerically. While most classical approac…