cs.LG

Bridging the Gap Between Average and Discounted TD Learning

arXiv:2605.02103v1 Announce Type: new
Abstract: The analysis of Temporal Difference (TD) learning in the average-reward setting faces notable theoretical difficulties because the Bellman operator is not contractive with respect to any norm. This compl…
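
The non-contraction claim can be illustrated with a small numerical sketch. This is not taken from the paper: the 3-state transition matrix `P`, rewards `r`, scalar `offset`, and the helper names `bellman` and `shift_gap` are all hypothetical choices made for illustration. The idea is that a row-stochastic `P` maps the all-ones vector to itself, so shifting a value vector by a constant `c` shifts the undiscounted operator's output by exactly `c`. The two inputs and the two outputs are then equally far apart in every norm, which rules out a contraction factor below 1. With a discount factor `gamma < 1` the same shift shrinks to `gamma * c`.

```python
# Hypothetical 3-state Markov chain; all numbers are illustrative only.
P = [
    [0.5, 0.5, 0.0],
    [0.1, 0.6, 0.3],
    [0.2, 0.0, 0.8],
]  # each row sums to 1 (row-stochastic)
r = [1.0, 0.0, 2.0]
offset = 0.7  # stand-in for the average reward; it cancels in differences


def bellman(v, gamma):
    """Apply T(v) = r - offset + gamma * P v to a value vector v.

    gamma = 1.0 gives the average-reward form; gamma < 1 gives a
    discounted form (with the same constant offset, for comparability).
    """
    n = len(v)
    return [
        r[i] - offset + gamma * sum(P[i][j] * v[j] for j in range(n))
        for i in range(n)
    ]


def shift_gap(gamma, c=1.0):
    """Return the per-state differences T(v + c*1) - T(v)."""
    v = [0.3, -1.2, 2.0]          # arbitrary starting value vector
    shifted = [x + c for x in v]  # same vector moved by the constant c
    t_shifted = bellman(shifted, gamma)
    t_v = bellman(v, gamma)
    return [a - b for a, b in zip(t_shifted, t_v)]


# Undiscounted: the gap stays (up to rounding) at c in every state.
avg_gap = shift_gap(gamma=1.0)

# Discounted: the gap shrinks to roughly gamma * c in every state.
disc_gap = shift_gap(gamma=0.9)
```

Comparing `avg_gap` with `disc_gap` shows the difference the abstract points to: only the discounted operator pulls the two value vectors closer together along the constant direction.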