ITIScore: An Image-to-Text-to-Image Rating Framework for the Image Captioning Ability of MLLMs
arXiv:2604.03765v2
Abstract: Recent advances in multimodal large language models (MLLMs) have greatly improved image understanding and captioning capabilities. However, existing image captioning benchmarks typically suffer from …