cs.CV

Uncovering and Shaping the Latent Representation of 3D Scene Topology in Vision-Language Models

arXiv:2605.07148v1 Announce Type: new
Abstract: Decades of cognitive science establish that humans navigate environments by forming cognitive maps, defined as allocentric and topology-preserving representations of 3D space. While modern Vision-Languag…