Data Organization Matters in Multimodal Instruction Tuning: A Controlled Study of Capability Trade-offs
arXiv:2603.27744v1 Announce Type: new
Abstract: Recent multimodal large language models (MLLMs) perform strongly on general visual understanding, diagram and chart reasoning, and document-centric perception. However, these abilities are learned from h…