Any 3D Scene is Worth 1K Tokens: 3D-Grounded Representation for Scene Generation at Scale
arXiv:2604.11331v1 Announce Type: new
Abstract: 3D scene generation has long been dominated by 2D multi-view or video diffusion models. This is due not only to the lack of a scene-level 3D latent representation, but also to the fact that most scene-leve…