Augmented Lagrangian Method for Last-Iterate Convergence for Constrained MDPs
arXiv:2605.11694v1
Abstract: We study policy optimization for infinite-horizon, discounted constrained Markov decision processes (CMDPs). While existing theoretical guarantees typically hold for the mixture policy, deploying such a …