Authors: Haibo Wang, Zihao Lin, Zhiyang Xu, Lifu Huang

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

Haibo Wang, Zihao Lin, Zhiyang Xu, Lifu Huang / April 3, 2026

arXiv:2604.00528v2 Announce Type: replace
Abstract: 3D Visual Grounding (3D-VG) aims to localize objects in 3D scenes via natural language descriptions. While recent advancements leveraging Vision-Language Models (VLMs) have explored zero-shot possibi…

cs.AI, cs.CV

Think, Act, Build: An Agentic Framework with Vision Language Models for Zero-Shot 3D Visual Grounding

Haibo Wang, Zihao Lin, Zhiyang Xu, Lifu Huang / April 2, 2026

arXiv:2604.00528v1 Announce Type: new
Abstract: 3D Visual Grounding (3D-VG) aims to localize objects in 3D scenes via natural language descriptions. While recent advancements leveraging Vision-Language Models (VLMs) have explored zero-shot possibiliti…