Progressive Multimodal Interaction Network for Reliable Quantification of Fish Feeding Intensity in Aquaculture
arXiv:2506.14170v3
Abstract: Accurate quantification of fish feeding intensity is crucial for precision feeding in aquaculture because it directly affects feed utilization and farming efficiency. Although multimodal fusion has proven effective, existing methods often overlook inconsistent responses and decision conflicts between modalities, which limits the reliability of the quantification results. To address this issue, this paper proposes a Progressive Multimodal Interaction Network (PMIN) that integrates image, audio, and water-wave data for fish feeding intensity quantification. First, a unified feature extraction framework maps inputs from the different modalities into a structurally consistent feature space, reducing representational discrepancies across modalities. Second, a mechanism in which auxiliary modalities reinforce the primary modality facilitates cross-modal fusion through channel-aware recalibration and dual-stage attention interaction. Third, a decision fusion strategy based on adaptive evidence reasoning jointly models the confidence, reliability, and conflicts of modality-specific outputs to improve the stability and robustness of the final judgment. Experiments are conducted on a multimodal fish feeding intensity dataset containing 7089 samples. PMIN achieves an accuracy of 96.76% while maintaining a relatively low parameter count and computational cost, and it outperforms both homogeneous and heterogeneous comparison models. Ablation studies, comparative experiments, and real-world application results further validate the effectiveness and superiority of the proposed method, which can provide reliable support for automated feeding monitoring and precise feeding decisions in smart aquaculture.
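The abstract does not spell out the adaptive evidence reasoning step, so the following is only a minimal sketch of one classical way to fuse modality-specific decisions while accounting for reliability and conflict: Shafer discounting followed by Dempster's rule of combination. It is not the PMIN implementation. The function names, the four intensity levels, the reliability values, and the example probabilities are all hypothetical and chosen for illustration.

```python
from functools import reduce


def discount(probs, reliability):
    """Shafer discounting: scale class masses by the source reliability
    and assign the remaining mass to 'unknown' (the full frame)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    total = sum(probs)
    singles = [reliability * p / total for p in probs]
    return singles, 1.0 - reliability


def conflict(m1, m2):
    """Mass that two sources assign to different classes."""
    s1, _ = m1
    s2, _ = m2
    return sum(a * b
               for i, a in enumerate(s1)
               for j, b in enumerate(s2) if i != j)


def combine(m1, m2):
    """Dempster's rule for masses on singleton classes plus the full frame."""
    s1, u1 = m1
    s2, u2 = m2
    agree = [a * b + a * u2 + u1 * b for a, b in zip(s1, s2)]
    unknown = u1 * u2
    norm = sum(agree) + unknown  # equals 1 - conflict(m1, m2)
    if norm == 0.0:
        raise ValueError("total conflict: sources cannot be combined")
    return [a / norm for a in agree], unknown / norm


def fuse(outputs):
    """outputs: list of (class_probs, reliability), one pair per modality.
    Returns (predicted class index, fused class masses, residual unknown mass)."""
    masses = [discount(p, r) for p, r in outputs]
    singles, unknown = reduce(combine, masses)
    label = max(range(len(singles)), key=singles.__getitem__)
    return label, singles, unknown


# Hypothetical outputs for four intensity levels (illustrative values only).
LEVELS = ["none", "weak", "medium", "strong"]
modalities = [
    ([0.70, 0.20, 0.05, 0.05], 0.9),  # image branch, assumed most reliable
    ([0.10, 0.60, 0.20, 0.10], 0.5),  # audio branch, assumed noisy
    ([0.60, 0.30, 0.05, 0.05], 0.7),  # water-wave branch
]
label, fused, unknown = fuse(modalities)
print(LEVELS[label], [round(m, 3) for m in fused], round(unknown, 3))
```

In this sketch a low-reliability modality shifts most of its mass to "unknown", so it cannot overturn a confident, reliable modality on its own. An adaptive scheme such as the one the abstract describes would presumably estimate the reliabilities from the data rather than fix them by hand.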