Accelerating Vision Transformers with Adaptive Patch Sizes
arXiv:2510.18091v2 Announce Type: replace
Abstract: Vision Transformers (ViTs) partition input images into uniformly sized patches regardless of their content, resulting in long input sequence lengths for high-resolution images. We present Adaptive Pa…
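As a rough illustration of the sequence-length point above (not taken from the paper), the sketch below counts the tokens produced by a uniform, non-overlapping patch grid. The helper name `uniform_patch_tokens` and the patch size of 16 are assumptions chosen for illustration, and any extra class token is ignored.

```python
def uniform_patch_tokens(height: int, width: int, patch_size: int) -> int:
    """Count patch tokens for a uniform, non-overlapping patch grid.

    Hypothetical helper for illustration only; not the paper's code.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


# Assumed patch size of 16 pixels; square images of increasing resolution.
for side in (224, 448, 896):
    print(side, uniform_patch_tokens(side, side, 16))
```

Under these assumptions, doubling the image side length quadruples the token count, and self-attention cost grows quadratically in that count.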