cs.CV

Object Referring-Guided Scanpath Prediction with Perception-Enhanced Vision-Language Models

arXiv:2604.20361v1 Announce Type: new
Abstract: Object Referring-guided Scanpath Prediction (ORSP) aims to predict the scanpath of human attention as a viewer searches a visual scene for a specific target object according to a linguistic description descr…