cs.CV

EMCompress: Video-LLMs with Endomorphic Multimodal Compression

arXiv:2508.21094v3 Announce Type: replace
Abstract: Video-LLMs face a fundamental tension in long-video reasoning: static, sparse frame sampling either dilutes evidence across task-irrelevant segments at significant cost or misses fine-grained tempora…