cs.CL, cs.LG

Structured Recurrent Mixers for Massively Parallelized Sequence Generation

arXiv:2605.08696v1 Announce Type: new
Abstract: Over the last two decades, language modeling has shifted from predominantly recurrent architectures, which process tokens sequentially during training and inference, to non-recurrent models that…