A Paradigm Shift: Fully End-to-End Training for Temporal Sentence Grounding in Videos
arXiv:2604.02860v1 Announce Type: new
Abstract: Temporal sentence grounding in videos (TSGV) aims to localize, within an untrimmed video, the temporal segment that semantically corresponds to a sentence query. Most current methods adopt pre-trained query-ag…