cs.CV

When Negation Is a Geometry Problem in Vision-Language Models

arXiv:2603.20554v2 Announce Type: replace
Abstract: Joint vision-language embedding models such as CLIP typically fail to understand negation in text queries; for example, they fail to account for the “no” in the query “a plain blue shirt with no logos”…