Multi-Armed Bandits With Machine Learning-Generated Surrogate Rewards
arXiv:2506.16658v2 Announce Type: replace-cross
Abstract: The multi-armed bandit (MAB) is a widely adopted framework for sequential decision-making under uncertainty. Traditional bandit algorithms rely solely on online data, which tends to be scarce as it…
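As background for the MAB framework the abstract invokes, a minimal sketch of the classic UCB1 algorithm on Bernoulli arms is shown below. This is a generic illustration of standard bandit learning from online data only, not the paper's surrogate-reward method; the arm probabilities and function name are hypothetical.

```python
import math
import random

def ucb1(probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `probs`.

    Returns (counts, means): per-arm pull counts and empirical mean rewards.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull each arm once to initialize estimates
        else:
            # pick the arm maximizing empirical mean plus exploration bonus
            arm = max(
                range(k),
                key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < probs[arm] else 0.0  # Bernoulli draw
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
    return counts, means
```

Under this sketch, the exploration bonus shrinks as an arm accumulates pulls, so play concentrates on the empirically best arm while occasionally revisiting the others.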