TARS: MinMax Token-Adaptive Preference Strategy for Hallucination Reduction in MLLMs
arXiv:2507.21584v4 Announce Type: replace
Abstract: Multimodal large language models (MLLMs) are prone to hallucinations, generating plausible but visually ungrounded outputs, partly because direct preference optimization (DPO) overfits to superficial…
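For context on the DPO objective the abstract refers to, the standard formulation (from the general DPO literature, not stated in this snippet) optimizes a policy $\pi_\theta$ against a frozen reference $\pi_{\mathrm{ref}}$ over preference pairs $(x, y_w, y_l)$, where $y_w$ is the preferred and $y_l$ the dispreferred response:

$$
\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}\right)\right]
$$

Here $\sigma$ is the logistic sigmoid and $\beta$ controls deviation from the reference policy. Because the loss compares whole-sequence log-ratios, it can be dominated by superficial token-level differences between $y_w$ and $y_l$, which is the overfitting failure mode the abstract attributes to DPO in the multimodal setting.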