cs.LG, math.PR, stat.ML

From Set Convergence to Pointwise Convergence: Finite-Time Guarantees for Average-Reward Q-Learning with Adaptive Stepsizes

arXiv:2504.18743v2
Abstract: This work presents the first finite-time analysis for the last-iterate convergence of average-reward $Q$-learning with an asynchronous implementation. A key feature of the algorithm we study is…