Continuous-time reinforcement learning: ellipticity enables model-free value function approximation
arXiv:2602.06930v2
Abstract: We study off-policy reinforcement learning for controlling continuous-time Markov diffusion processes with discrete-time observations and actions. We consider model-free algorithms with functio…