cs.AI, cs.CV

CLAY: Conditional Visual Similarity Modulation in Vision-Language Embedding Space

arXiv:2604.11539v1 Announce Type: cross
Abstract: Human perception of visual similarity is inherently adaptive and subjective, depending on each user’s interests and focus. However, most image retrieval systems fail to reflect this flexibility, relying…