CLIP-RD: Relative Distillation for Efficient CLIP Knowledge Distillation
arXiv:2603.25383v3 Announce Type: replace
Abstract: CLIP aligns image and text embeddings via contrastive learning and demonstrates strong zero-shot generalization. Its large-scale architecture requires substantial computational and memory resources, …