Concave Statistical Utility Maximization Bandits via Influence-Function Gradients
arXiv:2604.22140v1 Announce Type: new
Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we sh…