cs.CL, cs.CV

GeoAlign: Geometric Feature Realignment for MLLM Spatial Reasoning

arXiv:2604.12630v1 Announce Type: new
Abstract: Multimodal large language models (MLLMs) have exhibited remarkable performance in various visual tasks, yet still struggle with spatial reasoning. Recent efforts mitigate this by injecting geometric feat…