Steerable Visual Representations
arXiv:2604.02327v1 Announce Type: new
Abstract: Pretrained Vision Transformers (ViTs) such as DINOv2 and MAE provide generic image features that can be applied to a variety of downstream tasks, including retrieval, classification, and segmentation. Howev…