Global Optimality for Constrained Exploration via Penalty Regularization
arXiv:2604.28144v1 Announce Type: new
Abstract: Efficient exploration is a central problem in reinforcement learning and is often formalized as maximizing the entropy of the state-action occupancy measure. While unconstrained maximum-entropy explorati…
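To make the objective named in the abstract concrete, here is a minimal sketch of the entropy of a discounted state-action occupancy measure on a tiny tabular MDP. It is not the paper's method. The function names (`occupancy_measure`, `entropy`), the two-state MDP, the discount factor, and the truncated-horizon approximation are all assumptions chosen for illustration.

```python
import math


def occupancy_measure(P, pi, mu0, gamma, horizon=2000):
    """Approximate d(s, a) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s, a_t = a).

    P[s][a][s2] is the transition probability, pi[s][a] the policy,
    mu0[s] the initial state distribution. The infinite sum is truncated
    at `horizon` steps (an assumption of this sketch).
    """
    n_states = len(mu0)
    n_actions = len(pi[0])
    d = [[0.0] * n_actions for _ in range(n_states)]
    mu = list(mu0)          # state distribution at the current step
    weight = 1.0 - gamma    # (1 - gamma) * gamma^t
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                p_sa = mu[s] * pi[s][a]
                d[s][a] += weight * p_sa
                for s2 in range(n_states):
                    nxt[s2] += p_sa * P[s][a][s2]
        mu = nxt
        weight *= gamma
    return d


def entropy(d):
    """Shannon entropy (in nats) of the occupancy measure, skipping zero entries."""
    return -sum(x * math.log(x) for row in d for x in row if x > 0.0)


# Hypothetical 2-state, 2-action MDP: action 0 stays, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
uniform_policy = [[0.5, 0.5], [0.5, 0.5]]
stay_policy = [[1.0, 0.0], [1.0, 0.0]]

d_uniform = occupancy_measure(P, uniform_policy, [0.5, 0.5], gamma=0.9)
d_stay = occupancy_measure(P, stay_policy, [1.0, 0.0], gamma=0.9)

h_uniform = entropy(d_uniform)
h_stay = entropy(d_stay)
```

In this toy setting the uniform policy spreads the occupancy evenly over all four state-action pairs, so its entropy is log 4, while the policy that never leaves its start state concentrates all mass on one pair and has entropy close to zero. An entropy-maximizing explorer would prefer the former.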