Linearized Attention Cannot Enter the Kernel Regime at Any Practical Width

arXiv:2603.13085v2 Announce Type: replace-cross

Abstract: Understanding whether attention mechanisms converge to the kernel regime is foundational to the validity of influence functions for transformer accountability. Exact NTK characterization of softmax attention is precluded by its exponential nonlinearity; linearized attention is the canonical tractable proxy and the object of study here. This paper establishes that even this proxy does not converge to its NTK limit at any practical width, revealing a fundamental trade-off in the learning dynamics of attention. An exact correspondence is established between parameter-free linearized attention and a data-dependent Gram-induced kernel; spectral amplification analysis shows that the attention transformation cubes the Gram matrix's condition number, requiring width $m = \Omega(\kappa_d(\mathbf{G})^6 n\log n)$ for NTK convergence, where $\kappa_d(\mathbf{G})$ is the effective condition number of the rank-$\min(n,d)$ truncation of the input Gram matrix. For natural image datasets this threshold is physically infeasible ($m \gg 10^{24}$ for MNIST and $m \gg 10^{29}$ for CIFAR-10, 12--17 orders of magnitude beyond the largest known architectures). \emph{Influence malleability} is introduced to characterize this non-convergence: linearized attention exhibits 2--9$\times$ higher malleability than ReLU networks under adversarial data perturbation, with the gap depending on dataset condition number and task setting. A dual implication follows: the same data-dependent kernel is shown, theoretically, to reduce approximation error when targets align with the data geometry and, empirically, to create vulnerability to adversarial manipulation of the training data. The structural argument extends to trainable QKV attention under standard initialization, with direct consequences for influence methods applied to deployed transformer architectures.
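The cubing claim has a compact mechanical reading. If the parameter-free linearized attention map is taken to be $\mathrm{Attn}(\mathbf{X}) = \mathbf{X}\mathbf{X}^\top\mathbf{X} = \mathbf{G}\mathbf{X}$ (an assumed form; the abstract does not spell out the correspondence), then the induced kernel is $\mathbf{K} = (\mathbf{G}\mathbf{X})(\mathbf{G}\mathbf{X})^\top = \mathbf{G}^3$, whose effective condition number is exactly $\kappa_d(\mathbf{G})^3$. A minimal NumPy sketch under that assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10                      # n samples, d input dims (n > d, so G has rank d)

X = rng.standard_normal((n, d))
G = X @ X.T                        # input Gram matrix, rank min(n, d) = d

# Assumed parameter-free linearized attention: Attn(X) = G X.
# Its induced kernel is K = (G X)(G X)^T = G X X^T G = G^3.
K = (G @ X) @ (G @ X).T

def effective_cond(M, r):
    """kappa_r(M): ratio of the largest to the r-th singular value."""
    s = np.linalg.svd(M, compute_uv=False)
    return s[0] / s[r - 1]

kG = effective_cond(G, d)
kK = effective_cond(K, d)
print(f"kappa_d(G)   = {kG:.4e}")
print(f"kappa_d(K)   = {kK:.4e}")
print(f"kappa_d(G)^3 = {kG**3:.4e}")   # agrees with kappa_d(K) up to float error
```

Because $\mathbf{K} = \mathbf{G}^3$ shares eigenvectors with $\mathbf{G}$ and cubes its eigenvalues, the agreement is exact in infinite precision; the sketch only confirms it numerically.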

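To see how the width bound reaches the quoted magnitudes, one can plug illustrative numbers into $m = \Omega(\kappa_d(\mathbf{G})^6 n \log n)$. The condition numbers below are hypothetical placeholders, chosen only to be consistent with the quoted thresholds; the abstract does not report the measured $\kappa_d(\mathbf{G})$ values.

```python
import math

def width_lower_bound(kappa, n):
    """Illustrative evaluation of m ~ kappa^6 * n * log(n), constants dropped."""
    return kappa**6 * n * math.log(n)

# Hypothetical effective condition numbers, for illustration only;
# the paper's measured values are not given in the abstract.
datasets = {"MNIST": (1e3, 60_000), "CIFAR-10": (7e3, 50_000)}

for name, (kappa, n) in datasets.items():
    m = width_lower_bound(kappa, n)
    print(f"{name:9s} kappa={kappa:.0e}, n={n}: m ~ {m:.1e}")
```

With $\kappa_d(\mathbf{G}) \approx 10^3$ this yields $m \sim 10^{24}$ for MNIST, and with $\kappa_d(\mathbf{G}) \approx 7\times10^3$ roughly $10^{29}$ for CIFAR-10, matching the orders of magnitude quoted above: the sixth power of the condition number, not the $n\log n$ factor, dominates the bound.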