DeepSketcher: Internalizing Visual Manipulation for Multimodal Reasoning
arXiv:2509.25866v2 Announce Type: replace
Abstract: The “thinking with images” paradigm represents a pivotal shift in the reasoning of Vision Language Models (VLMs), moving from text-dominant chain-of-thought to image-interactive reasoning. By invokin…