CLIP-SVD: Efficient and Interpretable Vision-Language Adaptation via Singular Values
arXiv:2509.03740v3 Announce Type: replace
Abstract: Vision-language models (VLMs) like CLIP have shown impressive zero-shot and few-shot learning capabilities across diverse applications. However, adapting these models to new fine-grained domains rema…
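The title describes adaptation "via singular values". The abstract is truncated, so the details are not given here; the sketch below only illustrates the general idea such a name suggests: decompose a pretrained weight matrix with SVD once, freeze the singular bases, and fine-tune a small per-singular-value scale. The layer shapes and the scaling scheme are assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical pretrained weight matrix standing in for a CLIP layer.
W = rng.standard_normal((8, 8))

# Decompose once: W = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(W, full_matrices=False)

# Adapt only the singular values via a small scale vector:
# one trainable parameter per singular value instead of a full
# weight update (assumed scheme, for illustration only).
scale = np.ones_like(S)
scale[0] *= 1.1  # e.g., a gradient step nudged the leading value

W_adapted = U @ np.diag(S * scale) @ Vt

# U and Vt stay frozen, so the update lies in the span of the
# pretrained weights; parameter count drops from W.size to len(S).
print(len(S), W.size)  # 8 64
```

With `scale` at all ones, `W_adapted` reconstructs `W` exactly, which makes the adapted model start from the pretrained behavior.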