Unichain and Aperiodicity are Sufficient for Asymptotic Optimality of Average-Reward Restless Bandits
arXiv:2402.05689v4
Abstract: We consider the infinite-horizon, average-reward restless bandit problem in discrete time. We propose a new class of policies that are designed to drive a progressively larger subset of arms toward t…