Softmax gradient policy for variance minimization and risk-averse multi armed bandits
arXiv:2604.00241v1 Announce Type: cross
Abstract: Algorithms for the Multi-Armed Bandit (MAB) problem play a central role in sequential decision-making and have been extensively explored both theoretically and numerically. While most classical approac…