cs.AI, cs.CV

Domain-Invariant Prompt Learning for Vision-Language Models

arXiv:2603.28555v1 Announce Type: new
Abstract: Large pre-trained vision-language models such as CLIP have transformed computer vision by aligning images and text in a shared feature space, enabling robust zero-shot transfer via prompting. Soft-prompting…