DOSE: Data Selection for Multi-Modal LLMs via Off-the-Shelf Models
arXiv:2604.16979v1 Announce Type: new
Abstract: High-quality and diverse multimodal data are essential for improving vision-language models (VLMs), yet existing datasets often contain noisy, redundant, and poorly aligned samples. To address these prob…