Design Experiments to Compare Multi-armed Bandit Algorithms
arXiv:2603.05919v2
Abstract: Online platforms routinely compare multi-armed bandit algorithms, such as UCB and Thompson Sampling, to select the best-performing policy. Unlike standard A/B tests for static treatments, each run of…
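To make the setting concrete, here is a minimal sketch, not the paper's method, of the kind of comparison the abstract describes: UCB1 and Thompson Sampling run on a simulated Bernoulli bandit, with final cumulative regret averaged over repeated runs. The arm means, horizon, and number of replications are illustrative assumptions; the averaging over independent runs reflects the run-to-run randomness that, per the abstract, distinguishes bandit comparisons from standard A/B tests of static treatments.

```python
import numpy as np

rng = np.random.default_rng(0)

def run_ucb1(means, horizon):
    """UCB1 on a Bernoulli bandit; returns cumulative regret per step."""
    k = len(means)
    counts = np.zeros(k)        # pulls per arm
    sums = np.zeros(k)          # total reward per arm
    regret = np.zeros(horizon)
    best = max(means)
    for t in range(horizon):
        if t < k:
            arm = t             # play each arm once to initialize
        else:
            ucb = sums / counts + np.sqrt(2 * np.log(t + 1) / counts)
            arm = int(np.argmax(ucb))
        reward = rng.random() < means[arm]   # Bernoulli reward
        counts[arm] += 1
        sums[arm] += reward
        regret[t] = best - means[arm]        # expected per-step regret
    return np.cumsum(regret)

def run_thompson(means, horizon):
    """Thompson Sampling with Beta(1,1) priors; returns cumulative regret."""
    k = len(means)
    alpha = np.ones(k)
    beta = np.ones(k)
    regret = np.zeros(horizon)
    best = max(means)
    for t in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))  # posterior sample per arm
        reward = rng.random() < means[arm]
        alpha[arm] += reward
        beta[arm] += 1 - reward
        regret[t] = best - means[arm]
    return np.cumsum(regret)

# Hypothetical experiment: 3 arms, 5000 steps, 50 independent replications.
means = [0.5, 0.55, 0.6]
horizon, n_runs = 5000, 50
ucb = np.mean([run_ucb1(means, horizon)[-1] for _ in range(n_runs)])
ts = np.mean([run_thompson(means, horizon)[-1] for _ in range(n_runs)])
print(f"mean final regret  UCB1: {ucb:.1f}  Thompson: {ts:.1f}")
```

Because each run is a stochastic trajectory, a single realization of either algorithm can be misleading; the replication loop above is the naive remedy, and designing such comparisons more carefully is the question the paper takes up.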