From Edges to Depth: Probing the Spatial Hierarchy in Vision Transformers
arXiv:2604.23452v1 Announce Type: new
Abstract: Vision Transformers trained only on image classification routinely transfer to tasks that demand spatial understanding, yet they receive no spatial supervision during pretraining. We ask where and how ro…