Yun Xing, Hanyuan Liu, Jiahao Nie, Shijian Lu

Referring Multiple Regions with Large Multimodal Models via Contextual Latent Steering

Yun Xing, Hanyuan Liu, Jiahao Nie, Shijian Lu / May 5, 2026

arXiv:2605.01827v1 Announce Type: new
Abstract: Large Multimodal Models (LMMs) have recently demonstrated their proficiency in holistic visual comprehension. However, most of them struggle to tackle region-level perception guided by visual prompts, es…
