Q-Learning with Shift-Aware Upper Confidence Bound in Non-Stationary Reinforcement Learning
arXiv:2510.03181v2 Announce Type: replace
Abstract: We study Non-Stationary Reinforcement Learning (RL) under distribution shifts in both finite-horizon episodic and infinite-horizon discounted Markov Decision Processes (MDPs). In the finite-horiz…