cs.CL

OmniThoughtVis: A Scalable Distillation Pipeline for Deployable Multimodal Reasoning Models

arXiv:2605.11629v1 Announce Type: new
Abstract: Recent multimodal large language models (MLLMs) have shown strong chain-of-thought (CoT) reasoning ability on vision-language tasks, but their direct deployment in real-world systems is often limited by …