cs.LG

Trading off rewards and errors in multi-armed bandits

arXiv:2605.00488v1 Announce Type: new
Abstract: In multi-armed bandits, the most-explored arms are the most informative, while reward maximization typically pulls only the best arm. We study the tradeoff between identifying arm means accurately and ac…
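The tradeoff the abstract describes can be illustrated with a minimal simulation (not the paper's method): a simple explore-then-commit strategy on a Gaussian bandit, where the fraction of rounds spent on uniform exploration controls how accurately the arm means are identified versus how much reward is accumulated. All names and parameters here (`run_bandit`, `explore_frac`, unit-variance rewards) are illustrative assumptions.

```python
import random

def run_bandit(means, horizon, explore_frac, seed=0):
    """Explore-then-commit sketch (illustrative, not the paper's algorithm).

    Spend explore_frac * horizon rounds cycling uniformly over the arms,
    then greedily pull the empirically best arm for the remaining rounds.
    Returns (cumulative reward, worst-case error of the mean estimates).
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k          # pulls per arm
    sums = [0.0] * k          # total reward per arm
    reward = 0.0
    n_explore = int(explore_frac * horizon)
    for t in range(horizon):
        if t < n_explore:
            arm = t % k       # round-robin exploration
        else:                 # commit to the empirically best arm
            arm = max(range(k), key=lambda a: sums[a] / max(counts[a], 1))
        r = rng.gauss(means[arm], 1.0)   # unit-variance Gaussian reward
        counts[arm] += 1
        sums[arm] += r
        reward += r
    # Max absolute error of the empirical means (unpulled arms estimate 0).
    est_err = max(abs(sums[a] / max(counts[a], 1) - means[a]) for a in range(k))
    return reward, est_err

means = [0.2, 0.5, 0.8]
for frac in (0.0, 0.3, 0.9):
    rew, err = run_bandit(means, horizon=3000, explore_frac=frac)
    print(f"explore_frac={frac:.1f}  reward={rew:8.1f}  max_mean_error={err:.3f}")
```

More exploration drives the estimation error of every arm's mean toward zero but diverts pulls away from the best arm, which is exactly the tension between identification and reward maximization that the abstract raises.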