ARTA: Adaptive Mixed-Resolution Token Allocation for Efficient Dense Feature Extraction
arXiv:2603.26258v1
Abstract: We present ARTA, a mixed-resolution coarse-to-fine vision transformer for efficient dense feature extraction. Unlike models that begin with dense high-resolution (fine) tokens, ARTA starts with low-resol…