cs.AI, cs.CL, cs.CV, cs.HC, cs.LG

GUI-AIMA: Aligning Intrinsic Multimodal Attention with a Context Anchor for GUI Grounding

arXiv:2511.00810v3 Announce Type: replace
Abstract: Graphical user interface (GUI) grounding is a key capability for computer-use agents, mapping natural-language instructions to actionable regions on the screen. Existing Multimodal Large Language Mod…