cs.CV

A Systematic Post-Train Framework for Video Generation

arXiv:2604.25427v1 Announce Type: new
Abstract: While large-scale video diffusion models have demonstrated impressive capabilities in generating high-resolution and semantically rich content, a significant gap remains between their pretraining perform…

cs.CV

OneThinker: All-in-one Reasoning Model for Image and Video

arXiv:2512.03043v3 Announce Type: replace
Abstract: Reinforcement learning (RL) has recently achieved remarkable success in eliciting visual reasoning within Multimodal Large Language Models (MLLMs). However, existing approaches typically train separa…

cs.CV

AdaTooler-V: Adaptive Tool-Use for Images and Videos

arXiv:2512.16918v3 Announce Type: replace
Abstract: Recent advances have shown that multimodal large language models (MLLMs) benefit from multimodal interleaved chain-of-thought (CoT) with vision tool interactions. However, existing open-source models…
