OpenVLThinkerV2: A Generalist Multimodal Reasoning Model for Multi-domain Visual Tasks
arXiv:2604.08539v2 Announce Type: replace-cross
Abstract: Group Relative Policy Optimization (GRPO) has emerged as the de facto Reinforcement Learning (RL) objective driving recent advancements in Multimodal Large Language Models. However, extending t…