Sangwon Baik, Gunhee Kim, Mingi Choi, Hanbyul Joo

Text-Guided 6D Object Pose Rearrangement via Closed-Loop VLM Agents

Sangwon Baik, Gunhee Kim, Mingi Choi, Hanbyul Joo / April 14, 2026

arXiv:2604.09781v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) exhibit strong visual reasoning capabilities, yet they still struggle with 3D understanding. In particular, VLMs often fail to infer a text-consistent goal 6D pose of a targ…
