Pure Exploration Beyond Reward Feedback: The Role of Post-Action Context
arXiv:2502.03061v2
Abstract: We introduce the problem of best arm identification (BAI) with post-action context, a new BAI problem for stochastic multi-armed bandits in the fixed-confidence setting. The problem addr…