cs.CV

Exploring Cross-Modal Flows for Few-Shot Learning

arXiv:2510.14543v4 Announce Type: replace
Abstract: Aligning features from different modalities is one of the most fundamental challenges in cross-modal tasks. Although pre-trained vision-language models can achieve a general alignment between image…