AdaTooler-V: Adaptive Tool-Use for Images and Videos
arXiv:2512.16918v3 Announce Type: replace
Abstract: Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interactions. However, existing open-source models…