Unified Framework of Distributional Regret in Multi-Armed Bandits and Reinforcement Learning
arXiv:2605.05102v2 Announce Type: replace-cross
Abstract: We study the distribution of regret in stochastic multi-armed bandits and episodic reinforcement learning through a unified framework. We formalize a distributional regret bound as a probabilis…
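The abstract is cut off before the formal definition, so the following is only an illustration of what "the distribution of regret" refers to. It is not the paper's framework or bound. It is a minimal sketch that assumes Bernoulli arms, the standard UCB1 index, and pseudo-regret (the sum over rounds of the gap between the best mean and the pulled arm's mean) as the regret measure. It reruns UCB1 under independent randomness and reads off empirical quantiles and tail probabilities P(R_T > x). The function names `ucb1_pseudo_regret`, `regret_distribution` and `empirical_tail` are hypothetical and chosen for this example only.

```python
import math
import random
import statistics


def ucb1_pseudo_regret(means, horizon, rng):
    """Run UCB1 once on Bernoulli arms (an assumed setting) and return the
    pseudo-regret: sum over rounds of (best mean - mean of the pulled arm)."""
    k = len(means)
    best = max(means)
    counts = [0] * k
    sums = [0.0] * k
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialisation: pull each arm once
        else:
            # UCB1 index: empirical mean plus exploration bonus sqrt(2 ln t / n_a)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret


def regret_distribution(means, horizon, runs, seed=0):
    """Sorted pseudo-regret samples from `runs` independent UCB1 runs."""
    rng = random.Random(seed)  # one seeded generator for reproducibility
    return sorted(ucb1_pseudo_regret(means, horizon, rng) for _ in range(runs))


def empirical_tail(samples, threshold):
    """Fraction of runs whose regret exceeds `threshold`, i.e. an
    empirical estimate of the tail probability P(R_T > threshold)."""
    return sum(r > threshold for r in samples) / len(samples)


if __name__ == "__main__":
    samples = regret_distribution([0.5, 0.4], horizon=1000, runs=200)
    mean = statistics.mean(samples)
    print("mean pseudo-regret:", mean)
    print("empirical 90% quantile:", samples[int(0.9 * len(samples))])
    print("empirical P(R_T > 2 * mean):", empirical_tail(samples, 2 * mean))
```

Two runs with identical expected regret can have very different tails. Reporting quantiles or P(R_T > x) instead of only the mean is the kind of information a distributional view is meant to capture.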