cs.CL

Beyond Pixels: Introspective and Interactive Grounding for Visualization Agents

arXiv:2604.21134v1 Announce Type: new
Abstract: Vision-Language Models (VLMs) frequently misread values, hallucinate details, and confuse overlapping elements in charts. Current approaches rely solely on pixel interpretation, creating a Pixel-Only Bot…