cs.AI, cs.CV

Efficient Encoder-Free Fourier-based 3D Large Multimodal Model

arXiv:2602.23153v2 Announce Type: replace
Abstract: Large Multimodal Models (LMMs) that process 3D data typically rely on heavy, pre-trained visual encoders to extract geometric features. While recent 2D LMMs have begun to eliminate such encoders for …