Animesh Jain, Alexandros Stergiou

MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization

Animesh Jain, Alexandros Stergiou / April 8, 2026

arXiv:2508.07833v3
Abstract: Vision Language Models (VLMs) encode multimodal inputs over large, complex, and difficult-to-interpret architectures, which limit transparency and trust. We propose a Multimodal Inversion for Model I…
