cs.AI, cs.CV, cs.LG, cs.MA, cs.MM

LVRPO: Language-Visual Alignment with GRPO for Multimodal Understanding and Generation

arXiv:2603.27693v1 Announce Type: cross
Abstract: Unified multimodal pretraining has emerged as a promising paradigm for jointly modeling language and vision within a single foundation model. However, existing approaches largely rely on implicit or in…