cs.CV

All Roads Lead to Rome: Incentivizing Divergent Thinking in Vision-Language Models

arXiv:2604.00479v1 Announce Type: new
Abstract: Recent studies have demonstrated that Reinforcement Learning (RL), notably Group Relative Policy Optimization (GRPO), can intrinsically elicit and enhance the reasoning capabilities of Vision-Language Mo…