cs.CL, cs.CV

SmoGVLM: A Small, Graph-enhanced Vision-Language Model

arXiv:2604.16517v1
Abstract: Large vision-language models (VLMs) achieve strong performance on multimodal tasks but often suffer from hallucination and poor grounding in knowledge-intensive reasoning. We propose SmoGVLM, a small, gr…