cs.AI, cs.CV, cs.LG

Accelerating Vision Transformers with Adaptive Patch Sizes

arXiv:2510.18091v2
Abstract: Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, which produces long input sequences for high-resolution images. We present Adaptive Pa…
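
To make the sequence-length point concrete, here is a minimal sketch of how the token count grows under a uniform, non-overlapping patch grid. The function name `num_patch_tokens` and the patch size of 16 are illustrative assumptions, not details taken from the paper, and the sketch ignores any extra class or register tokens.

```python
def num_patch_tokens(height: int, width: int, patch_size: int) -> int:
    """Count patch tokens for a uniform, non-overlapping patch grid.

    Hypothetical helper for illustration; assumes the image dimensions
    are exact multiples of the patch size.
    """
    if patch_size <= 0:
        raise ValueError("patch_size must be positive")
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


if __name__ == "__main__":
    # Assumed patch size of 16 pixels; square images of increasing resolution.
    for side in (224, 448, 896):
        print(side, num_patch_tokens(side, side, 16))
```

With a fixed patch size, doubling the image side length quadruples the number of tokens, whatever the image contains. Under standard full self-attention, whose cost grows quadratically with token count, this is why high-resolution inputs become expensive.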