Restless Bandits with Individual Penalty Constraints: A New Near-Optimal Index Policy and How to Learn It
arXiv:2604.04101v1 Announce Type: new
Abstract: This paper investigates the Restless Multi-Armed Bandit (RMAB) framework under individual penalty constraints to address resource allocation challenges in dynamic wireless networked environments. Unlike …