Fast SceneScript: Fast and Accurate Language-Based 3D Scene Understanding via Multi-Token Prediction
arXiv:2512.05597v3 Announce Type: replace
Abstract: Recent perception-generalist approaches based on language models have achieved state-of-the-art results across diverse tasks, including 3D scene layout estimation and 3D object detection, via unified…