PLAF: Pixel-wise Language-Aligned Feature Extraction for Efficient 3D Scene Understanding
arXiv:2604.15770v2 Announce Type: replace
Abstract: Accurate open-vocabulary 3D scene understanding requires semantic representations that are both language-aligned and spatially precise at the pixel level, while remaining scalable when lifted to 3D s…