GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding
arXiv:2511.00810v3 Announce Type: replace
Abstract: Graphical user interface (GUI) grounding is a key capability for computer-use agents, mapping natural-language instructions to actionable regions on the screen. Existing Multimodal Large Language Mod…