Explaining CLIP Zero-shot Predictions Through Concepts
arXiv:2603.28211v1 Announce Type: new
Abstract: Large-scale vision-language models such as CLIP have achieved remarkable success in zero-shot image recognition, yet their predictions remain largely opaque to human understanding. In contrast, Concept B…