M$^2$RNN: Non-Linear RNNs with Matrix-Valued States for Scalable Language Modeling
arXiv:2603.14360v2 Announce Type: replace-cross
Abstract: Transformers are highly parallel but are limited to computations in the TC$^0$ complexity class, which excludes tasks such as entity tracking and code execution that provably require greater express…