Lost in Space? Vision-Language Models Struggle with Relative Camera Pose Estimation
arXiv:2601.22228v2 Announce Type: replace-cross
Abstract: We study whether vision-language models (VLMs) can solve relative camera pose estimation (RCPE) from image pairs, a direct test of multi-view spatial reasoning. We cast RCPE as a discrete verba…
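For readers unfamiliar with the task, the quantity at stake in relative camera pose estimation can be sketched as follows. This is an illustrative sketch only, not the paper's method or evaluation protocol. It assumes a world-to-camera convention (x_cam = R x_world + t), and all function names are hypothetical. Given the absolute poses of two cameras, it computes the rotation and translation that map camera-1 coordinates into camera-2 coordinates.

```python
import math

def rot_y(deg):
    """Rotation matrix for a rotation of `deg` degrees about the y-axis."""
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    """Transpose of a 3x3 matrix."""
    return [list(row) for row in zip(*A)]

def matvec(A, v):
    """Product of a 3x3 matrix and a 3-vector."""
    return [sum(A[i][k] * v[k] for k in range(3)) for i in range(3)]

def relative_pose(R1, t1, R2, t2):
    """Pose mapping camera-1 coordinates to camera-2 coordinates.

    Assumes x_cam = R @ x_world + t for both cameras, so that
    x_2 = R_rel @ x_1 + t_rel with R_rel = R2 R1^T and t_rel = t2 - R_rel t1.
    """
    R_rel = matmul(R2, transpose(R1))
    Rt1 = matvec(R_rel, t1)
    t_rel = [t2[i] - Rt1[i] for i in range(3)]
    return R_rel, t_rel

def rotation_angle_deg(R):
    """Angle of a rotation matrix in degrees, recovered from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for safety
    return math.degrees(math.acos(cos_angle))

# Hypothetical example: camera 1 at the world origin, camera 2 yawed by 30 degrees.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
R_rel, t_rel = relative_pose(identity, [0.0, 0.0, 0.0], rot_y(30.0), [0.5, 0.0, 0.0])
```

An estimator working from an image pair alone has to infer `R_rel` and `t_rel` (the latter typically only up to scale) without access to the absolute poses used here.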