cs.AI

GUIDE: Interpretable GUI Agent Evaluation via Hierarchical Diagnosis

arXiv:2604.04399v1 (announce type: new)
Abstract: Evaluating GUI agents presents a distinct challenge: trajectories are long, visually grounded, and open-ended, yet evaluation must be both accurate and interpretable. Existing approaches typically apply …