Perceptual Flow Network for Visually Grounded Reasoning
arXiv:2605.02730v1 Announce Type: cross
Abstract: Despite the success of Large Vision-Language Models (LVLMs), general optimization objectives (e.g., standard MLE) fail to constrain visual trajectories, leading to language bias and hallucination. To m…