Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling
arXiv:2507.07982v2 Announce Type: replace-cross
Abstract: Videos inherently represent 2D projections of a dynamic 3D world. However, our analysis suggests that video diffusion models trained solely on raw video data often fail to capture meaningful ge…