On the Optimal Sample Complexity of Offline Multi-Armed Bandits with KL Regularization
arXiv:2605.02141v1 Announce Type: cross
Abstract: Kullback-Leibler (KL) regularization is widely used in offline decision-making and offers several benefits, motivating recent work on the sample complexity of offline learning with respect to KL-regula…