Phalguni Nanda, Zaiwei Chen

A Minimal-Assumption Analysis of Q-Learning with Time-Varying Policies

Phalguni Nanda, Zaiwei Chen / April 7, 2026

arXiv:2510.16132v2 Announce Type: replace-cross
Abstract: In this work, we present the first finite-time analysis of Q-learning with time-varying learning policies (i.e., on-policy sampling) for discounted Markov decision processes under minimal assum…
