Generative Modeling of Discrete Data Using Geometric Latent Subspaces

arXiv:2601.21831v2 Announce Type: replace

Abstract: We propose a geometric latent-subspace framework for generative modeling of discrete data. Specifically, we introduce latent subspaces in the exponential parameter space of product manifolds of categorical distributions as a novel method for learning generative models of discrete data. The resulting low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry such that the latent subspace and the induced data manifold are related by isometries, enabling consistent flow matching. Exploiting this structure, we propose a geometry-aware dimensionality reduction objective, called geometric PCA (GPCA), which we formulate as a regularized cross-entropy minimization that encourages small Riemannian distances between the data and their reconstructions. In particular, under the induced geometry, geodesics become straight lines in the latent parameter space, which makes model training by flow matching effective. Empirical results show that low-dimensional latent representations suffice to accurately model high-dimensional discrete data.
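The GPCA objective described above can be sketched in code. The following is a minimal, hedged illustration, not the paper's actual method: it assumes a linear latent subspace (spanned by an orthonormal basis `W`) in the logit (natural) parameter space of independent categorical variables, embeds one-hot data into that space directly, and combines a cross-entropy reconstruction term with a squared-distance penalty in parameter space as a simple stand-in for the Riemannian distance regularizer. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(z, axis=-1):
    """Numerically stable softmax over the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gpca_objective(X_onehot, W, lam=0.1):
    """Regularized cross-entropy for a linear latent subspace in logit space.

    X_onehot: (n, d, k) one-hot data over d categorical variables, k classes each.
    W: (d*k, r) orthonormal basis of an r-dimensional latent subspace of the
       natural (logit) parameter space.
    lam: weight of the parameter-space distance penalty (a Euclidean
         stand-in for the Riemannian distance term; an assumption here).
    """
    n, d, k = X_onehot.shape
    theta = X_onehot.reshape(n, d * k)  # crude logit-space embedding (assumption)
    z = theta @ W                       # latent codes, shape (n, r)
    theta_hat = z @ W.T                 # reconstructed logits
    probs = softmax(theta_hat.reshape(n, d, k))
    # cross-entropy between the data and its reconstruction
    ce = -(X_onehot * np.log(probs + 1e-12)).sum(axis=(1, 2)).mean()
    # distance penalty between data and reconstruction in parameter space
    reg = ((theta - theta_hat) ** 2).sum(axis=1).mean()
    return ce + lam * reg
```

In this sketch the basis `W` would be the optimization variable; minimizing the objective over `W` (e.g. by gradient descent on the Stiefel manifold) would yield the low-dimensional subspace, after which latent codes `z` live in a flat space where straight-line interpolation corresponds to the geodesics used for flow matching.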
