Transformers with Selective Access to Early Representations
arXiv:2605.03953v1 Announce Type: cross
Abstract: Several recent Transformer architectures expose later layers to representations computed in the earliest layers, motivated by the observation that low-level features can become harder to recover as the…