Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning
arXiv:2604.09644v1 Announce Type: cross
Abstract: Corporate AI-washing-the strategic misrepresentation of AI capabilities via exaggerated or fabricated cross-channel disclosures-has emerged as a systemic threat to capital market information integrity with the widespread adoption of generative AI. Existing detection methods rely on single-modal text frequency analysis, suffering from vulnerability to adversarial reformulation and cross-channel obfuscation. This paper presents AWASH, a multimodal framework that redefines AI-washing detection as cross-modal claim-evidence reasoning (instead of surface-level similarity measurement), built on AW-Bench-the first large-scale trimodal benchmark for this task, including 88412 aligned annual report text, disclosure image, and earnings call video triplets from 4892 A-share listed firms during 2019Q1-2025Q2. We propose the Cross-Modal Inconsistency Detection (CMID) network, integrating a tri-modal encoder, a structured natural language inference module for claim-evidence entailment reasoning, and an operational grounding layer that cross-validates AI claims against verifiable physical evidence (patent filing trajectories, AI-specific talent recruitment, compute infrastructure proxies). Evaluated against six competitive baselines, CMID achieves an F1 score of 0.882 and an AUC-ROC of 0.921, outperforming the strongest text-only baseline by 17.4 percentage points and the latest multimodal competitor by 11.3 percentage points. A pre-registered user study with 14 regulatory analysts verifies that CMID-generated evidence reports cut case review time by 43% while increasing true positive detection rates by 28%. These findings confirm the technical superiority and practical applicability of structured multimodal reasoning for large-scale corporate disclosure surveillance.