R-C2: Cycle-Consistent Reinforcement Learning Improves Multimodal Reasoning
arXiv:2603.25720v1 Announce Type: cross
Abstract: Robust perception and reasoning require consistency across sensory modalities. Yet current multimodal models often violate this principle, yielding contradictory predictions for visual and textual repr…