cs.AI, cs.LG, math.ST, stat.ML, stat.TH

On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization

arXiv:2605.02141v1 Announce Type: cross
Abstract: Kullback-Leibler (KL) regularization is widely used in offline decision-making and offers several benefits, motivating recent work on the sample complexity of offline learning with respect to KL-regula…