EvoGround: Self-Evolving Video Agents for Video Temporal Grounding
arXiv:2605.13803v1 Announce Type: new
Abstract: Video temporal grounding (VTG) takes an untrimmed video and a natural-language query as input and localizes the temporal moment that best matches the query. Existing methods rely on large, task-specific …