cs.CV

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

arXiv:2602.14276v2 Announce Type: replace
Abstract: Modern computer-use agents (CUAs) must perceive a screen as a structured state (which elements are visible, where they are, and what text they contain) before they can reliably ground instructions and …