When Does Content-Based Routing Work? Representation Requirements for Selective Attention in Hybrid Sequence Models
arXiv:2603.20997v2 Announce Type: replace
Abstract: We identify a routing paradox in hybrid sequence models: content-based routing – deciding which tokens deserve expensive attention – requires pairwise computation, and this requirement is inescapable…