cs.CL

Multiplication in Multimodal LLMs: Computation with Text, Image, and Audio Inputs

arXiv:2604.18203v1 Announce Type: new
Abstract: Multimodal LLMs can accurately perceive numerical content across modalities yet fail to perform exact multi-digit multiplication when the identical underlying arithmetic problem is presented as numerals,…