ShadowNPU: System and Algorithm Co-design for NPU-Centric On-Device LLM Inference
arXiv:2508.16703v4 Announce Type: replace-cross
Abstract: Running Large Language Models (LLMs) on-device is now a critical enabler for preserving user privacy. We observe that the attention operator falls back from the special-purpose NPU to …