cs.CV

SeGPruner: Semantic-Geometric Visual Token Pruner for 3D Question Answering

arXiv:2603.29437v1
Abstract: Vision-language models (VLMs) have been widely adopted for 3D question answering (3D QA). In typical pipelines, visual tokens extracted from multiple viewpoints are concatenated with language tokens and …