Adaptive Symmetrization of the KL Divergence

arXiv:2511.11159v3

Abstract: The forward Kullback-Leibler (KL) divergence is a ubiquitous objective for fitting a parameterized distribution to samples due to its tractability and equivalence to maximum likelihood estimation (MLE). Its inherent asymmetry, however, may lead to degenerate solutions that generalize poorly. While the symmetric Jeffreys divergence offers a more balanced alternative, its optimization is challenging due to the presence of a reverse KL term. Generative adversarial networks (GANs) bypass this intractability using a min-max formulation, at the cost of introducing new instability issues. This work proposes a non-adversarial approach to minimizing the Jeffreys divergence. To do so, it uses a proxy model to tractably approximate the reverse KL divergence of the main model. The main and proxy models are jointly fitted to the data using a constrained optimization formulation, yielding a practical algorithm that adapts the models' priorities throughout training. We evaluate our framework on various tasks, including density estimation and simulation-based inference, and demonstrate that this approach is more stable and more accurate than MLE and GANs, particularly in low-data regimes.
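For reference, the Jeffreys divergence is the sum of the forward and reverse KL divergences between the data distribution and the model. The sketch below states these standard definitions and, as an illustrative assumption only (the abstract does not give the exact formulation), shows one way a fitted proxy density could stand in for the unknown data density in the reverse term.

```latex
% Forward KL (equivalent to MLE up to a model-independent constant), reverse KL,
% and Jeffreys divergence between the data distribution p_data and the model p_theta.
\begin{align}
  \mathrm{KL}(p_{\mathrm{data}} \,\|\, p_\theta)
    &= \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log \frac{p_{\mathrm{data}}(x)}{p_\theta(x)}\right], \\
  \mathrm{KL}(p_\theta \,\|\, p_{\mathrm{data}})
    &= \mathbb{E}_{x \sim p_\theta}\!\left[\log \frac{p_\theta(x)}{p_{\mathrm{data}}(x)}\right], \\
  D_J(p_{\mathrm{data}}, p_\theta)
    &= \mathrm{KL}(p_{\mathrm{data}} \,\|\, p_\theta) + \mathrm{KL}(p_\theta \,\|\, p_{\mathrm{data}}).
\end{align}
% The reverse term is intractable because p_data(x) is unknown. Substituting a proxy
% model q_phi fitted to the data (an illustrative assumption, not necessarily the
% paper's construction) gives a tractable surrogate:
\begin{equation}
  \mathrm{KL}(p_\theta \,\|\, p_{\mathrm{data}})
    \approx \mathbb{E}_{x \sim p_\theta}\!\left[\log p_\theta(x) - \log q_\phi(x)\right].
\end{equation}
```

Under this reading, the forward term can be estimated from data samples as in MLE, while the surrogate reverse term requires only samples and densities from the two models, which is what makes a non-adversarial treatment plausible.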
