2Mamba2Furious: Linear in Complexity, Competitive in Accuracy
arXiv:2602.17363v3
Abstract: Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy comp…