Olmo Hybrid: From Theory to Practice and Back
arXiv:2604.03444v2 Announce Type: replace-cross
Abstract: Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there …
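The recurrence/attention mix mentioned above can be illustrated with a toy sketch: a stack that applies an elementwise linear recurrence followed by causal softmax attention, with residual connections. This is a generic illustration of the hybrid pattern, not Olmo Hybrid's actual architecture; all shapes, parameter names, and the decay constants are assumptions.

```python
import numpy as np

def linear_recurrence(x, a, b):
    # Elementwise linear RNN: h_t = a * h_{t-1} + b * x_t (no nonlinearity).
    # Toy stand-in for the linear-recurrent layers discussed in the abstract.
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[-1])
    for t in range(x.shape[0]):
        prev = a * prev + b * x[t]
        h[t] = prev
    return h

def causal_attention(x, Wq, Wk, Wv):
    # Standard single-head softmax attention with a causal mask.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    mask = np.triu(np.ones_like(scores), k=1).astype(bool)
    scores[mask] = -np.inf  # each position attends only to itself and the past
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

rng = np.random.default_rng(0)
T, d = 6, 4  # hypothetical sequence length and model width
x = rng.standard_normal((T, d))

# Hybrid stack: one recurrent layer, then one attention layer, with residuals.
h = x + linear_recurrence(x, a=0.9, b=0.1)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
y = h + causal_attention(h, Wq, Wk, Wv)
print(y.shape)  # (6, 4)
```

The design trade-off this sketch hints at is the one hybrid models exploit: the recurrent layer runs in O(T) with constant state, while the attention layer pays O(T²) for direct access to all past positions.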