Bridging Pixels and Words: Mask-Aware Local Semantic Fusion for Multimodal Media Verification
arXiv:2603.26052v1 Announce Type: new
Abstract: As multimodal misinformation becomes more sophisticated, detecting and grounding it becomes crucial. However, current multimodal verification methods, which rely on passive holistic fusion, struggle with sophi…