Fast Rates for Offline Contextual Bandits with Forward-KL Regularization under Single-Policy Concentrability
arXiv:2605.09214v1 Announce Type: cross
Abstract: \emph{Kullback-Leibler} (KL) regularization is ubiquitous in reinforcement learning algorithms in the form of \emph{reverse} or \emph{forward} KL. Recent studies have demonstrated $\epsilon^{-1}$-type …
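The abstract contrasts \emph{reverse} and \emph{forward} KL regularization. The sketch below shows how the two directions differ for discrete action distributions at a single context. It assumes the common convention that reverse KL means $\mathrm{KL}(\pi \,\|\, \pi_{\mathrm{ref}})$ and forward KL means $\mathrm{KL}(\pi_{\mathrm{ref}} \,\|\, \pi)$. The regularized objective $\sum_a \pi(a) r(a) - \eta \cdot \mathrm{KL}$, the helper names `kl_divergence` and `regularized_objective`, and all numbers are hypothetical illustrations and are not taken from the paper.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) = sum_a p(a) * log(p(a) / q(a)) for discrete distributions.

    Terms with p(a) == 0 contribute 0; q(a) must be positive wherever p(a) > 0.
    """
    total = 0.0
    for pa, qa in zip(p, q):
        if pa > 0.0:
            total += pa * math.log(pa / qa)
    return total


def regularized_objective(pi: Sequence[float], pi_ref: Sequence[float],
                          reward: Sequence[float], eta: float,
                          direction: str) -> float:
    """Hypothetical KL-regularized value at one context: E_{a~pi}[r(a)] - eta * KL.

    direction="reverse" penalizes KL(pi || pi_ref);
    direction="forward" penalizes KL(pi_ref || pi) (assumed convention).
    """
    expected_reward = sum(p * r for p, r in zip(pi, reward))
    if direction == "reverse":
        penalty = kl_divergence(pi, pi_ref)
    elif direction == "forward":
        penalty = kl_divergence(pi_ref, pi)
    else:
        raise ValueError("direction must be 'reverse' or 'forward'")
    return expected_reward - eta * penalty


# Hypothetical action distributions and rewards at a single context x.
pi = [0.7, 0.2, 0.1]       # learned policy pi(.|x)
pi_ref = [0.4, 0.4, 0.2]   # reference (behavior) policy pi_ref(.|x)
reward = [1.0, 0.5, 0.0]   # illustrative rewards r(x, a)

reverse_kl = kl_divergence(pi, pi_ref)  # KL(pi || pi_ref)
forward_kl = kl_divergence(pi_ref, pi)  # KL(pi_ref || pi)
print(f"reverse KL = {reverse_kl:.4f}, forward KL = {forward_kl:.4f}")

j_rev = regularized_objective(pi, pi_ref, reward, eta=0.5, direction="reverse")
j_fwd = regularized_objective(pi, pi_ref, reward, eta=0.5, direction="forward")
print(f"J_reverse = {j_rev:.4f}, J_forward = {j_fwd:.4f}")
```

The two divergences differ because KL is asymmetric. Reverse KL weights the log-ratio by the learned policy $\pi$, while forward KL weights it by $\pi_{\mathrm{ref}}$. As a result, forward KL heavily penalizes a $\pi$ that puts little mass on actions the reference policy covers.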