Understand and Accelerate Memory Processing Pipeline for Disaggregated LLM Inference
arXiv:2603.29002v1 Announce Type: cross
Abstract: Modern large language models (LLMs) increasingly depend on efficient long-context processing and generation mechanisms, including sparse attention, retrieval-augmented generation (RAG), and compressed…