GeoFlowVLM: Geometry-Aware Joint Uncertainty for Frozen Vision-Language Embedding
arXiv:2605.13352v1 Announce Type: new
Abstract: Standard dual-encoder vision-language models that map images and text to deterministic points on a shared unit hypersphere through $\ell_2$ normalization typically expose neither \emph{aleatoric} uncerta…
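
As a minimal sketch of the $\ell_2$ normalization step the abstract describes, the Python below projects raw encoder outputs onto the unit hypersphere and scores them by dot product. It is an illustration only, not the paper's method: the helper names `l2_normalize` and `cosine_similarity` and the example vectors are hypothetical. It shows that normalization discards magnitude, so each input becomes a single deterministic point with no accompanying uncertainty estimate.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit Euclidean length so it lies on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize the zero vector")
    return [x / norm for x in vec]


def cosine_similarity(u, v):
    """For unit-length vectors, cosine similarity reduces to a plain dot product."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical raw encoder outputs (made-up numbers, not taken from the paper).
# The second vector is the first scaled by 2, so both share a direction.
image_embedding = l2_normalize([3.0, 4.0, 0.0])
text_embedding = l2_normalize([6.0, 8.0, 0.0])

# Both inputs land on the same point of the sphere: the magnitude is gone,
# and the similarity score carries no measure of confidence.
score = cosine_similarity(image_embedding, text_embedding)
image_norm = math.sqrt(sum(x * x for x in image_embedding))
```

In this sketch the two raw vectors differ in length but map to the same unit vector, which is the sense in which such embeddings are deterministic points rather than distributions.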