Multimodal Latent Reasoning via Hierarchical Visual Cues Injection
arXiv:2602.05359v2
Abstract: The advancement of multimodal large language models (MLLMs) has enabled impressive perception capabilities. However, their reasoning often follows a “fast thinking” paradigm, reliant on end-t…