SmoGVLM: A Small, Graph-enhanced Vision-Language Model
arXiv:2604.16517v1 Announce Type: new
Abstract: Large vision-language models (VLMs) achieve strong performance on multimodal tasks but often suffer from hallucination and poor grounding in knowledge-intensive reasoning. We propose SmoGVLM, a small, gr…