Learn to Think: Improving Multimodal Reasoning through Vision-Aware Self-Improvement Training
arXiv:2605.11931v1 Announce Type: new
Abstract: Post-training on explicit reasoning traces is a common way to improve the reasoning capabilities of Multimodal Large Language Models (MLLMs). However, acquiring high-quality reasoning traces is often costly …