Multimodal Contextualized Support for Enhancing Video Retrieval System
arXiv:2412.07584v2
Abstract: Current video retrieval systems, especially those used in competitions, primarily focus on querying individual keyframes or images rather than encoding an entire clip or video segment. However, queri…