cs.LG, math.ST, stat.AP, stat.ML, stat.TH

Concave Statistical Utility Maximization Bandits via Influence-Function Gradients

arXiv:2604.22140v1 Announce Type: new
Abstract: We study stochastic multi-armed bandits in which the objective is a statistical functional of the long-run reward distribution, rather than expected reward alone. Under mild continuity assumptions, we sh…