cs.LG, stat.ML

Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

arXiv:2605.03921v1 Announce Type: cross
Abstract: We study the $(\varepsilon, \delta)$-PAC policy identification problem in finite-horizon episodic Markov Decision Processes. Existing approaches provide finite-time guarantees for approximate settings …
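
For reference, the $(\varepsilon, \delta)$-PAC criterion named above is commonly formalized as follows. The notation is an assumption for illustration and is not taken from the abstract: $\tau$ is the (random) number of episodes before the algorithm stops, $\hat{\pi}$ is the policy it returns, $s_1$ is a fixed initial state, and $V_1^{\pi}(s_1)$ is the expected cumulative reward of policy $\pi$ over the horizon $H$. Under these assumptions, an algorithm is $(\varepsilon, \delta)$-PAC if

$$\mathbb{P}\left(\tau < \infty,\ V_1^{\star}(s_1) - V_1^{\hat{\pi}}(s_1) \le \varepsilon\right) \ge 1 - \delta, \qquad V_1^{\star}(s_1) = \max_{\pi} V_1^{\pi}(s_1).$$

Some formulations replace the fixed initial state $s_1$ with an initial-state distribution; the paper's exact definition may differ.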