Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu

cs.AI, cs.IT, cs.LG, math.IT, math.ST, stat.ML, stat.TH

Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability

Qingyue Zhao, Kaixuan Ji, Heyang Zhao, Quanquan Gu / May 12, 2026

arXiv:2605.09214v1 Announce Type: cross
Abstract: Kullback-Leibler (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of reverse or forward KL. Recent studies have demonstrated $\epsilon^{-1}$-type …
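The abstract names two forms of KL regularization, and they are not interchangeable because KL divergence is asymmetric. Below is a minimal Python sketch for discrete (finite-action) distributions. It assumes the convention common in the RLHF literature that reverse KL is KL(π‖π_ref) and forward KL is KL(π_ref‖π). The helper `kl` and the distributions `pi` and `pi_ref` are hypothetical illustrations, not taken from the paper.

```python
import math


def kl(p, q):
    """KL(p || q) for discrete distributions given as equal-length lists.

    Terms with p_i == 0 contribute 0; q_i must be > 0 wherever p_i > 0.
    """
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0:
            total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical three-action example: a learned policy and a reference policy.
pi = [0.7, 0.2, 0.1]
pi_ref = [0.4, 0.4, 0.2]

# Assumed convention: reverse KL = KL(pi || pi_ref), forward KL = KL(pi_ref || pi).
reverse_kl = kl(pi, pi_ref)
forward_kl = kl(pi_ref, pi)
print(f"reverse KL = {reverse_kl:.4f}, forward KL = {forward_kl:.4f}")
```

Swapping the arguments changes the value, which is why a regularizer built on one direction can behave differently from one built on the other.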
