Reducing Peak Memory Usage for Modern Multimodal Large Language Model Pipelines
arXiv:2604.16734v1 Announce Type: new
Abstract: Multimodal large language models (MLLMs) have recently demonstrated strong capabilities in understanding and generating responses from diverse visual inputs, including high-resolution images and long vid…