Karan Goyal, Dikshant Kukreja

The Expense of Seeing: Attaining Trustworthy Multimodal Reasoning Within the Monolithic Paradigm

Karan Goyal, Dikshant Kukreja / April 23, 2026

arXiv:2604.20665v1 Announce Type: new
Abstract: The rapid proliferation of Vision-Language Models (VLMs) is widely celebrated as the dawn of unified multimodal knowledge discovery, but its foundation operates on a dangerous, unquestioned axiom: that cu…
