Breaking the Illusion: Consensus-Based Generative Mitigation of Adversarial Illusions in Multi-Modal Embeddings

arXiv:2511.21893v2

Abstract: Multi-modal foundation models align images, text, and other modalities in a shared embedding space but remain vulnerable to adversarial illusions [35], where imperceptible perturbations disrupt cross-modal alignment and mislead downstream tasks. To counteract these attacks, we propose a task-agnostic mitigation mechanism that purifies the attacker's perturbed input using generative models, e.g., Variational Autoencoders (VAEs), to restore natural alignment. To strengthen the defense further, we combine a generative sampling strategy with a consensus-based aggregation scheme over the outcomes of the generated samples. Our experiments on ImageBind, a state-of-the-art multi-modal encoder, show that our approach reduces illusion attack success rates to near zero and improves cross-modal alignment in both unperturbed and perturbed input settings, providing an effective, task-agnostic defense against adversarial illusions. The code is available at https://github.com/fatemehakb/adversarial-illusions-mitigation.
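To make the purify-then-vote pipeline concrete, here is a minimal sketch in PyTorch: a perturbed input is stochastically reconstructed several times through a VAE, each reconstruction is re-embedded, and the downstream prediction is decided by majority vote across the samples. The interfaces here (`vae.encode`/`vae.decode`, `embed_fn`, the candidate embedding matrix, the sample count `k`) are illustrative assumptions, not the repository's actual API.

```python
# A minimal sketch of consensus-based generative purification, assuming a
# PyTorch-style VAE exposing encode()/decode() and an embedding function
# (e.g., an ImageBind image encoder). All names are illustrative.
import torch

def purify_and_vote(x, vae, embed_fn, candidates, k=8):
    """Purify a (possibly perturbed) input x with k stochastic VAE
    reconstructions, then aggregate downstream predictions by majority vote.

    x          : input tensor, shape (1, C, H, W)
    vae        : model with encode(x) -> (mu, logvar) and decode(z) -> x_hat
    embed_fn   : maps an image batch to unit-norm embeddings, shape (1, D)
    candidates : (N, D) tensor of unit-norm candidate text/label embeddings
    """
    votes = []
    with torch.no_grad():
        mu, logvar = vae.encode(x)
        for _ in range(k):
            # Sample a latent via the reparameterization trick; each draw
            # yields a slightly different "purified" reconstruction.
            z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()
            x_hat = vae.decode(z)
            emb = embed_fn(x_hat)              # (1, D) purified embedding
            sims = emb @ candidates.T          # cosine similarity to candidates
            votes.append(sims.argmax(dim=-1).item())
    # Consensus: the label chosen by the most purified samples wins.
    return max(set(votes), key=votes.count)
```

Because each VAE draw perturbs the latent independently, an adversarial perturbation that fools one reconstruction is unlikely to survive all of them, which is what the consensus vote exploits.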
