Operator-Theoretic Foundations and Policy Gradient Methods for General MDPs with Unbounded Costs
arXiv:2603.17875v3 Announce Type: replace
Abstract: Markov decision processes (MDPs) are viewed as the optimization of an objective function over certain linear operators on general function spaces. A new existence result is established for the existe…