cs.CV

Let Geometry GUIDE: Layer-wise Unrolling of Geometric Priors in Multimodal LLMs

arXiv:2604.05695v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have achieved remarkable progress on 2D visual tasks but still exhibit limited physical spatial awareness when processing real-world visual streams. Recently, fee…