PowerCLIP: Powerset Alignment for Contrastive Pre-Training
arXiv:2511.23170v4 Announce Type: replace
Abstract: Contrastive vision-language pre-training frameworks such as CLIP have demonstrated impressive zero-shot performance across a range of vision-language tasks. Recent studies have shown that aligning in…