VG-CoT: Towards Trustworthy Visual Reasoning via Grounded Chain-of-Thought
arXiv:2604.21396v1 Announce Type: new
Abstract: The advancement of Large Vision-Language Models (LVLMs) requires precise local region-based reasoning that faithfully grounds the model’s logic in actual visual evidence. However, existing datasets face …