cs.LG, math.OC, stat.ML

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

arXiv:2510.16132v2 Announce Type: replace-cross
Abstract: In this work, we present the first finite-time analysis of Q-learning with time-varying learning policies (i.e., on-policy sampling) for discounted Markov decision processes under minimal assum…