cs.CV, cs.LG, cs.NA, math.NA, stat.ML

Linearized Attention Cannot Enter the Kernel Regime at Any Practical Width

arXiv:2603.13085v2 Announce Type: replace-cross
Abstract: Understanding whether attention mechanisms converge to the kernel regime is foundational to the validity of influence functions for transformer accountability. Exact NTK characterization of sof…