From Restless to Contextual: A Thresholding Bandit Reformulation For Finite-horizon Improvement
arXiv:2502.05145v5 Announce Type: replace
Abstract: This paper addresses the poor finite-horizon performance of existing online \emph{restless bandit} (RB) algorithms, which stems from the prohibitive sample complexity of learning a full \emph{Markov …