HybridKV: Hybrid KV Cache Compression for Efficient Multimodal Large Language Model Inference
arXiv:2604.05887v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have advanced unified reasoning over text, images, and videos, but their inference is hindered by the rapid growth of key-value (KV) caches. Each visual input exp…
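The abstract's motivation, that KV caches grow rapidly once visual inputs are tokenized, can be made concrete with a back-of-the-envelope sizing sketch. The snippet below is illustrative only: the model dimensions (a roughly 7B, LLaVA-style decoder in fp16) and the token counts per image and per video are assumptions, not figures from the HybridKV paper.

```python
# Back-of-the-envelope KV cache sizing for a decoder-only MLLM.
# All model dimensions below are illustrative assumptions (roughly a 7B,
# LLaVA-style decoder in fp16), not values taken from the HybridKV paper.

BYTES_PER_ELEM = 2   # fp16
NUM_LAYERS = 32
NUM_KV_HEADS = 32
HEAD_DIM = 128


def kv_cache_bytes(num_tokens: int) -> int:
    """Memory for keys + values across all layers for `num_tokens` cached tokens."""
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_ELEM
    return num_tokens * per_token


if __name__ == "__main__":
    text_tokens = 256                 # a typical text prompt (assumed)
    image_tokens = 576                # one image at 24x24 patches (assumed)
    video_tokens = 64 * image_tokens  # 64 frames, no temporal pooling (assumed)

    for name, n in [("text prompt", text_tokens),
                    ("single image", image_tokens),
                    ("64-frame video", video_tokens)]:
        gib = kv_cache_bytes(n) / 2**30
        print(f"{name:>15}: {n:>6} tokens -> {gib:.2f} GiB of KV cache")
```

Under these assumptions each cached token costs about 0.5 MiB, so a single image adds roughly 0.3 GiB and a 64-frame video pushes the cache toward ~18 GiB, which is the growth that cache-compression methods such as the one described here aim to contain.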