cs.GT, cs.LG

Beyond Pessimism: Offline Learning in KL-regularized Games

arXiv:2604.06738v1 Announce Type: cross
Abstract: We study offline learning in KL-regularized two-player zero-sum games, where policies are optimized under a KL constraint to a fixed reference policy. Prior work relies on pessimistic value estimation …
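To make the setting concrete, here is a small sketch of a KL-regularized two-player zero-sum matrix game. It rests on several assumptions. The abstract speaks of a KL constraint, but this sketch uses the common penalty (regularized) form with a hypothetical strength `beta`. Both players get the same `beta`, and the payoff matrix and reference policies are illustrative. The function names are hypothetical and are not taken from the paper.

For a fixed opponent policy, the row player's regularized best response has a known closed form: the reference policy reweighted by the exponentiated expected payoffs, then normalized (a Gibbs/softmax policy). This is not the paper's offline algorithm.

```python
import numpy as np

def kl(p, q):
    # KL(p || q) for discrete distributions; terms with p_i = 0 contribute 0.
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def regularized_payoff(x, y, payoff, ref_x, ref_y, beta):
    # Assumed penalty form: the row player (maximizer) pays beta * KL(x || ref_x);
    # the column player (minimizer) pays beta * KL(y || ref_y), which raises the value.
    return float(x @ payoff @ y) - beta * kl(x, ref_x) + beta * kl(y, ref_y)

def best_response_row(payoff, y, ref_x, beta):
    # argmax_x  x^T payoff y - beta * KL(x || ref_x)
    #   = ref_x * exp(payoff @ y / beta), normalized to sum to 1.
    logits = np.log(ref_x) + (payoff @ y) / beta
    logits -= logits.max()  # shift for numerical stability; cancels after normalizing
    w = np.exp(logits)
    return w / w.sum()

# Illustrative example: matching pennies with uniform reference policies.
payoff = np.array([[1.0, -1.0], [-1.0, 1.0]])
ref = np.array([0.5, 0.5])
y = np.array([1.0, 0.0])  # column player always plays its first action
x_star = best_response_row(payoff, y, ref, beta=1.0)
```

Against a uniform opponent, every row action has expected payoff 0, so the best response is simply the reference policy. A smaller `beta` pushes the response toward the pure argmax. A larger `beta` keeps it closer to the reference.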