Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
arXiv:2506.16658v2 Announce Type: replace-cross
Abstract: The multi-armed bandit (MAB) is a widely adopted framework for sequential decision-making under uncertainty. Traditional bandit algorithms rely solely on online data, which tends to be scarce as it…
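As background for the MAB framework the abstract invokes, a minimal sketch of the classic UCB1 algorithm on Bernoulli arms is shown below. This is a generic illustration of standard bandit learning from online data only, not the paper's surrogate-reward method; the arm probabilities and function name are hypothetical.

```python
import math
import random

def ucb1(probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `probs`.

    Returns (counts, means): per-arm pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # pick the arm maximizing empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means
```

Under this sketch, the exploration bonus shrinks as an arm accumulates pulls, so play concentrates on the empirically best arm while occasionally revisiting the others.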