cs.CV

KiToke: Kernel-based Interval-aware Token Compression for Video Large Language Models

arXiv:2604.03414v1 Announce Type: new
Abstract: Video Large Language Models (Video LLMs) achieve strong performance on video understanding tasks but suffer from high inference costs due to the large number of visual tokens. We propose KiToke, a traini…