Guoqiang Zou, Wanyu Wang, Hao Zheng, Longxiang Yin, Yinhe Han

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

Guoqiang Zou, Wanyu Wang, Hao Zheng, Longxiang Yin, Yinhe Han / March 26, 2026

arXiv:2512.09427v3 Announce Type: replace-cross
Abstract: Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth. While static pre-allocation preserves me…
