cs.AI, cs.CV

GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

arXiv:2603.25841v1 Announce Type: new
Abstract: Current multimodal large language models (MLLMs) cannot effectively utilize eye-gaze information for video understanding, even when gaze cues are supplied via visual overlays or text descriptions. We int…