OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
arXiv:2511.14582v2 Announce Type: replace
Abstract: Omnimodal large language models (OmniLLMs) have recently attracted growing research attention for unified audio-video understanding. However, the high computational cost of processing longer jo…