cs.CV

Mario: Multimodal Graph Reasoning with Large Language Models

arXiv:2603.05181v2 Announce Type: replace
Abstract: Recent advances in large language models (LLMs) have opened new avenues for multimodal reasoning. Yet most existing methods still rely on pretrained vision-language models (VLMs) to encode image-tex…