cs.AI, cs.CV

Reducing Peak Memory Usage for Modern Multimodal Large Language Model Pipelines

arXiv:2604.16734v1 Announce Type: new
Abstract: Multimodal large language models (MLLMs) have recently demonstrated strong capabilities in understanding and generating responses from diverse visual inputs, including high-resolution images and long vid…