cs.DC, cs.LG

Stream-CQSA: Avoiding Out-of-Memory in Attention Computation via Flexible Workload Scheduling

arXiv:2604.20819v1 Announce Type: new
Abstract: The scalability of long-context large language models is fundamentally limited by the quadratic memory cost of exact self-attention, which often leads to out-of-memory (OOM) failures on modern hardware. …
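The quadratic memory cost mentioned in the abstract refers to the n×n score matrix that exact self-attention materializes per head. A minimal back-of-the-envelope sketch (the function name and defaults are illustrative, not from the paper):

```python
def attn_scores_bytes(seq_len: int, num_heads: int = 1, dtype_bytes: int = 2) -> int:
    """Bytes needed for one layer's exact attention score matrix.

    Exact attention materializes a (seq_len x seq_len) matrix of scores
    per head, so memory grows quadratically in context length.
    """
    return num_heads * seq_len * seq_len * dtype_bytes

# Doubling the context length quadruples score-matrix memory,
# which is why long contexts hit OOM on fixed-size accelerators.
assert attn_scores_bytes(2 * 4096) == 4 * attn_scores_bytes(4096)
```

For instance, a single fp16 head at a 128k-token context already needs `attn_scores_bytes(131072)` ≈ 32 GiB for the score matrix alone, before activations or KV cache, which motivates scheduling or tiling schemes like the one the title describes.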