From Web to Pixels: Bringing Agentic Search into Visual Perception
arXiv:2605.12497v1 Announce Type: new
Abstract: Visual perception connects high-level semantic understanding to pixel-level perception, but most existing settings assume that the decisive evidence for identifying a target is already in the image or fr…