Thompson Sampling for Infinite-Horizon Discounted Decision Processes
arXiv:2405.08253v3 Announce Type: replace
Abstract: This paper develops a viable notion of learning for sampling-based algorithms that applies in broader settings than previously considered. More specifically, we model a discounted infinite-horizon MD…