Interpretable Zero-shot Referring Expression Comprehension with Query-driven Scene Graphs
arXiv:2603.25004v1
Abstract: Zero-shot referring expression comprehension (REC) aims to locate target objects in images given natural language queries without relying on task-specific training data, demanding strong visual understan…