Self-supervised pretraining for an iterative, image-size-agnostic vision transformer
arXiv:2604.20392v1 Announce Type: new
Abstract: Vision Transformers (ViTs) dominate self-supervised learning (SSL). While they have proven highly effective for large-scale pretraining, they are computationally inefficient and scale poorly with image s…