Youssef Zaazou, Mark Thomas

Birds of a Feather Flock Together: Background-Invariant Representations via Linear Structure in VLMs

Youssef Zaazou, Mark Thomas / May 13, 2026

arXiv:2605.11107v1 Announce Type: new
Abstract: Vision-language models (VLMs), such as CLIP and SigLIP 2, are widely used for image classification, yet their vision encoders remain vulnerable to systematic biases that undermine robustness. In particul…
