cs.CV

CAST: Collapse-Aware multi-Scale Topology Fusion for Multimodal Coreset Selection

arXiv:2605.11705v1 Announce Type: new
Abstract: Training large multimodal models relies on massive image-text datasets and therefore incurs prohibitive computational overhead. Dataset selection offers a promising paradigm by id…