cs.AI, cs.AR

ODMA: On-Demand Memory Allocation Strategy for LLM Serving on LPDDR-Class Accelerators

arXiv:2512.09427v3 Announce Type: replace-cross
Abstract: Existing memory management techniques severely hinder efficient Large Language Model serving on accelerators constrained by poor random-access bandwidth. While static pre-allocation preserves me…