Absorber LLM: Harnessing Causal Synchronization for Test-Time Training
arXiv:2604.20915v1 Announce Type: cross
Abstract: Transformers suffer from a self-attention cost that grows with sequence length, making inference over long streams prohibitively memory-intensive. Constant-memory alternatives su…