MAG-3D: Multi-Agent Grounded Reasoning for 3D Understanding
arXiv:2604.09167v1 Announce Type: new
Abstract: Vision-language models (VLMs) have achieved strong performance in multimodal understanding and reasoning, yet grounded reasoning in 3D scenes remains underexplored. Effective 3D reasoning hinges on accur…