Victor Norgren - Provide.ai

Attention Once Is All You Need: Efficient Streaming Inference with Stateful Transformers

Victor Norgren / May 14, 2026

arXiv:2605.13784v1
Abstract: Conventional transformer inference engines are request-driven, paying an O(n) prefill cost on every query. In streaming workloads, where data arrives continuously and queries probe an ever-growing contex…
