Hybrid Latent Reasoning with Decoupled Policy Optimization
arXiv:2604.20328v1 Announce Type: new
Abstract: Chain-of-Thought (CoT) reasoning significantly elevates the complex problem-solving capabilities of multimodal large language models (MLLMs). However, adapting CoT to vision typically discretizes signals…