ReVision: Scaling Computer-Use Agents via Temporal Visual Redundancy Reduction
arXiv:2605.11212v2
Abstract: Computer-use agents (CUAs) rely on visual observations of graphical user interfaces, where each screenshot is encoded into a large number of visual tokens. As interaction trajectories grow, the token cos…
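The abstract says that each screenshot becomes many visual tokens and that cost grows as trajectories lengthen. The sketch below makes that growth concrete under stated assumptions. It is not ReVision's method, which the truncated abstract does not describe. `TOKENS_PER_SCREENSHOT` is a hypothetical value. The cumulative figure assumes the whole screenshot history is re-encoded at every step with no caching. `unchanged_patch_fraction` is a generic, hypothetical illustration of temporal redundancy: it measures how many fixed-size patches are identical between two consecutive grayscale frames.

```python
# Hypothetical illustration only; none of these names or values come from the paper.
TOKENS_PER_SCREENSHOT = 1000  # assumed visual tokens per encoded screenshot


def context_tokens_at_step(step: int, tokens_per_shot: int = TOKENS_PER_SCREENSHOT) -> int:
    """Visual tokens in context at `step` (1-indexed) if every past screenshot is kept."""
    return step * tokens_per_shot


def cumulative_prefill_tokens(num_steps: int, tokens_per_shot: int = TOKENS_PER_SCREENSHOT) -> int:
    """Total visual tokens processed over a trajectory when the full history is
    re-encoded at each step (assumes no KV caching): N * T * (T + 1) / 2."""
    return tokens_per_shot * num_steps * (num_steps + 1) // 2


def unchanged_patch_fraction(prev, curr, patch: int = 16) -> float:
    """Fraction of patch x patch blocks that are pixel-identical between two
    equally sized grayscale frames given as lists of rows."""
    h, w = len(curr), len(curr[0])
    total = unchanged = 0
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            # A patch counts as redundant only if every pixel in it is unchanged.
            same = all(
                prev[i][j] == curr[i][j]
                for i in range(r, min(r + patch, h))
                for j in range(c, min(c + patch, w))
            )
            total += 1
            unchanged += int(same)
    return unchanged / total


if __name__ == "__main__":
    print(context_tokens_at_step(20))      # 20000
    print(cumulative_prefill_tokens(20))   # 210000
    prev = [[0] * 32 for _ in range(32)]
    curr = [row[:] for row in prev]
    curr[0][0] = 255                        # change a single pixel
    print(unchanged_patch_fraction(prev, curr))  # 0.75
```

Under these assumptions, context size grows linearly with the number of steps. Re-encoding the full history without caching grows quadratically. A single-pixel change leaves three of the four 16×16 patches in a 32×32 frame untouched, which is the kind of cross-frame redundancy the title refers to.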