cs.LG

VoxelCodeBench: Benchmarking 3D World Modeling Through Code Generation

arXiv:2604.02580v1 Announce Type: new
Abstract: Evaluating code generation models for 3D spatial reasoning requires executing generated code in realistic environments and assessing outputs beyond surface-level correctness. We introduce a platform Voxe…