S2H-DPO: Hardness-Aware Preference Optimization for Vision-Language Models
arXiv:2604.18512v1
Abstract: Vision-Language Models (VLMs) have demonstrated remarkable progress in single-image understanding, yet effective reasoning across multiple images remains challenging. We identify a critical capability ga…