Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan

LLM Inference at the Edge: Mobile, NPU, and GPU Performance Efficiency Trade-offs Under Sustained Load

Pranay Tummalapalli, Sahil Arayakandy, Ritam Pal, Kautuk Kundan / March 26, 2026

arXiv:2603.23640v1 Announce Type: cross
Abstract: Deploying large language models on-device for always-on personal agents demands sustained inference from hardware tightly constrained in power, thermal envelope, and memory. We benchmark Qwen 2.5 1.5B …
