cs.CV

Few Shots Text to Image Retrieval: New Benchmarking Dataset and Optimization Methods

arXiv:2603.25891v1 Announce Type: new
Abstract: Pre-trained vision-language models (VLMs) excel in multimodal tasks, commonly encoding images as embedding vectors for storage in databases and retrieval via approximate nearest neighbor search (ANNS). H…
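
The retrieval setup the abstract describes, with images stored as embedding vectors and looked up by nearest neighbor search, can be illustrated with a minimal sketch. This is not the paper's method: it uses exact brute-force cosine similarity as a stand-in for ANNS, and the names `cosine_similarity`, `retrieve_top_k`, and `image_index`, along with the toy 3-dimensional embeddings, are hypothetical and chosen only for illustration. A real system would obtain embeddings from a VLM encoder and would likely use an approximate index rather than a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_embedding, image_index, k=2):
    """Return the ids of the k images most similar to the query.

    Exact search: every stored vector is scored. ANNS would replace this
    full scan with an approximate index to trade accuracy for speed.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), image_id)
        for image_id, embedding in image_index.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image_id for _, image_id in scored[:k]]


# Hypothetical toy database: image id -> embedding vector.
image_index = {
    "img_cat": [0.9, 0.1, 0.0],
    "img_dog": [0.1, 0.9, 0.0],
    "img_car": [0.0, 0.1, 0.9],
}

# Hypothetical embedding of a text query, assumed to share the image embedding space.
query = [0.8, 0.2, 0.0]
top = retrieve_top_k(query, image_index, k=2)
print(top)
```

The full scan costs time linear in the number of stored images, which is the cost that approximate indexes are designed to reduce.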