A Two-Timescale Primal-Dual Framework for Reinforcement Learning via Online Dual Variable Guidance
arXiv:2505.04494v3
Abstract: We study reinforcement learning by combining recent advances in regularized linear programming formulations with the classical theory of stochastic approximation. Motivated by the challenge of …