ProMMSearchAgent: A Generalizable Multimodal Search Agent Trained with Process-Oriented Rewards
arXiv:2604.20486v1 Announce Type: new
Abstract: Training multimodal agents via reinforcement learning for knowledge-intensive visual reasoning is fundamentally hindered by the extreme sparsity of outcome-based supervision and the unpredictability of l…