Spark3R: Asymmetric Token Reduction Makes Fast Feed-Forward 3D Reconstruction
arXiv:2605.06270v1
Abstract: Feed-forward 3D reconstruction models based on Vision Transformers can directly estimate scene geometry and camera poses from a small set of input images, but scaling them to video inputs with hundreds o…