cs.AI, cs.CV

LOOPE: Learnable Optimal Patch Order in Positional Embeddings for Vision Transformers

arXiv:2504.14386v2 Announce Type: replace-cross
Abstract: Positional embeddings (PE) play a crucial role in Vision Transformers (ViTs) by providing spatial information otherwise lost due to the permutation-invariant nature of self-attention. While abs…