ViLL-E: Video LLM Embeddings for Retrieval
arXiv:2604.12148v1
Abstract: Video Large Language Models (VideoLLMs) excel at video understanding tasks where outputs are textual, such as Video Question Answering and Video Captioning. However, they underperform specialized embeddi…