C2F-Thinker: Coarse-to-Fine Reasoning with Hint-Guided Reinforcement Learning for Multimodal Sentiment Analysis

arXiv:2604.00013v2 Abstract: Multimodal sentiment analysis aims to integrate textual, acoustic, and visual information for deep emotional understanding. Although supervised fine-tuning has advanced multimodal large language models (MLLMs), their "black-box" nature hinders interpretability. Chain-of-Thought (CoT) reasoning offers a potential remedy, but it is constrained by high manual annotation costs and by the inherent challenges of reinforcement learning (RL), such as reward sparsity and low exploration efficiency on hard samples. This paper presents C2F-Thinker, a framework that combines coarse-to-fine structured reasoning with hint-guided RL through a two-stage progressive training pipeline. In the first stage, we conduct cold-start supervised fine-tuning on high-quality CoT data distilled from a larger teacher model, in which each reasoning chain consists of three distinct phases: polarity judgment, intermediate analysis, and fine-grained scoring. This equips the base model with a structured emotional reasoning paradigm. In the second stage, we introduce a hint-guided Group Relative Policy Optimization (GRPO) algorithm. By injecting correct initial polarity predictions as hints during the sampling process, the model is guided toward accurate reasoning paths, which mitigates cascading errors and improves the utilization of hard samples. Furthermore, a multi-faceted reward function incorporating classification, regression, and formatting constraints is designed to refine prediction accuracy while preserving interpretability. Experimental results demonstrate that C2F-Thinker achieves competitive performance on fine-grained sentiment regression tasks while significantly outperforming baselines in cross-domain generalization. This highlights its potential for building trustworthy and robust sentiment analysis systems for real-world applications.
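
The second-stage ideas described in the abstract (a reward combining classification, regression, and formatting terms, group-relative advantages, and a polarity hint) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tag names, the term weights, the assumed score range of width 6.0, and every function name below are hypothetical choices made only for illustration, since the abstract does not specify them.

```python
import re
import statistics

# Hypothetical output format. The abstract names the three reasoning phases
# (polarity, analysis, score) but not how they are marked up.
PATTERN = re.compile(
    r"<polarity>\s*(positive|negative|neutral)\s*</polarity>\s*"
    r"<analysis>.*?</analysis>\s*"
    r"<score>\s*(-?\d+(?:\.\d+)?)\s*</score>",
    re.DOTALL,
)


def polarity_of(score, eps=1e-6):
    """Map a continuous sentiment score to a coarse polarity label."""
    if score > eps:
        return "positive"
    if score < -eps:
        return "negative"
    return "neutral"


def reward(completion, gold_score, score_range=6.0, weights=(1.0, 1.0, 0.5)):
    """Combine classification, regression, and format terms (assumed weights)."""
    match = PATTERN.search(completion)
    if match is None:
        # Output that breaks the assumed format earns nothing.
        return 0.0
    polarity, predicted = match.group(1), float(match.group(2))
    r_format = 1.0
    r_cls = 1.0 if polarity == polarity_of(gold_score) else 0.0
    # Regression term: 1 at an exact match, falling linearly to 0.
    r_reg = max(0.0, 1.0 - abs(predicted - gold_score) / score_range)
    w_cls, w_reg, w_fmt = weights
    return w_cls * r_cls + w_reg * r_reg + w_fmt * r_format


def group_relative_advantages(rewards):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # Every sample scored the same, so the group carries no signal.
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


def hinted_prefix(gold_score):
    """Hint: start the completion from the correct polarity (assumed form)."""
    return f"<polarity>{polarity_of(gold_score)}</polarity>"
```

The zero-advantage branch shows why hard samples are poorly utilized without hints: if every sampled completion in a group gets the polarity wrong and earns a similar reward, the standardized advantages collapse toward zero. Seeding the completion with `hinted_prefix(gold_score)` is one plausible way to restore variation within the group. The abstract does not state when the hint is applied, so that policy is left out of the sketch.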
