cs.AI, cs.CL, cs.CV, cs.LG

Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining

arXiv:2605.14747v1 Announce Type: cross
Abstract: Recent advances in multimodal large language models have driven growing interest in graphical user interface (GUI) agents, yet their generalization remains constrained by the scarcity of large-scale tr…