cs.IT, cs.LG, math.IT, math.OC, stat.ML

Optimal Variance-Dependent Regret Bounds for Infinite-Horizon MDPs

arXiv:2603.23926v1 Announce Type: new
Abstract: Online reinforcement learning in infinite-horizon Markov decision processes (MDPs) remains less theoretically and algorithmically developed than its episodic counterpart, with many algorithms suffering f…