Trong Thang Pham, Hien Nguyen, Ngan Le

GazeQwen: Lightweight Gaze-Conditioned LLM Modulation for Streaming Video Understanding

March 30, 2026

arXiv:2603.25841v1 Announce Type: new
Abstract: Current multimodal large language models (MLLMs) cannot effectively utilize eye-gaze information for video understanding, even when gaze cues are supplied via visual overlays or text descriptions. We int…
