See, Point, Refine: Multi-Turn Approach to GUI Grounding with Visual Feedback
arXiv:2604.13019v1 Announce Type: new
Abstract: Computer Use Agents (CUAs) fundamentally rely on graphical user interface (GUI) grounding to translate language instructions into executable screen actions, but editing-level grounding in dense coding in…