Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris

Boosting Visual Instruction Tuning with Self-Supervised Guidance

Sophia Sirko-Galouchenko, Monika Wysoczanska, Andrei Bursuc, Nicolas Thome, Spyros Gidaris / April 15, 2026

arXiv:2604.12966v1
Abstract: Multimodal large language models (MLLMs) perform well on many vision-language tasks but often struggle with vision-centric problems that require fine-grained visual reasoning. Recent evidence suggests th…
