cs.CV, cs.LG

LanteRn: Latent Visual Structured Reasoning

arXiv:2603.25629v1 Announce Type: new
Abstract: While language reasoning models excel in many tasks, visual reasoning remains challenging for current large multimodal models (LMMs). As a result, most LMMs default to verbalizing perceptual content into…