Ziqi Wen, Parsa Madinei, Miguel P. Eckstein

Revealing the Gap in Human and VLM Scene Perception through Counterfactual Semantic Saliency

Ziqi Wen, Parsa Madinei, Miguel P. Eckstein / May 14, 2026

arXiv:2605.13047v1 Announce Type: cross
Abstract: Evaluating whether large vision-language models (VLMs) align with human perception for high-level semantic scene comprehension remains a challenge. Traditional white-box interpretability methods are in…
