cs.LG, cs.SY, eess.SY

An LP-based Sampling Policy for Multi-Armed Bandits with Side-Observations and Stochastic Availability

arXiv:2603.26647v1 Announce Type: new
Abstract: We study the stochastic multi-armed bandit (MAB) problem in which an underlying network structure enables side-observations across related actions. We use a bipartite graph to link actions to a set of unkno…