Instinct vs. Reflection: Unifying Token and Verbalized Confidence in Multimodal Large Models
arXiv:2604.17274v1 Announce Type: new
Abstract: Multimodal Large Language Models (MLLMs) have demonstrated exceptional capabilities in various perception and reasoning tasks. Despite this success, ensuring their reliability in practical deployment nec…