cs.LG, math.OC

Learning Weakly Communicating Average-Reward CMDPs: Strong Duality and Improved Regret

arXiv:2605.11586v1 Announce Type: new
Abstract: We study infinite-horizon average-reward constrained Markov decision processes (CMDPs) under the weakly communicating assumption. Our contributions are twofold. First, we establish strong duality for wea…