cs.CL, cs.CV, cs.LG

Uncertainty-Aware Exploratory Direct Preference Optimization for Multimodal Large Language Models

arXiv:2605.04874v1 Announce Type: cross
Abstract: Direct Preference Optimization (DPO) has proven to be an effective solution for mitigating hallucination in Multimodal Large Language Models (MLLMs) by learning from preference pairs. One of its key ch…