CoVSpec: Efficient Device-Edge Co-Inference for Vision-Language Models via Speculative Decoding
arXiv:2605.02218v1 Announce Type: new
Abstract: Vision-language models (VLMs) have demonstrated strong capabilities in multimodal perception and reasoning. However, deploying large VLMs on mobile devices remains challenging due to their substantial co…