SCoOP: Semantic Consistent Opinion Pooling for Uncertainty Quantification in Multiple Vision-Language Model Systems
arXiv:2603.23853v2
Abstract: Combining multiple Vision-Language Models (VLMs) can enhance multimodal reasoning and robustness, but aggregating heterogeneous models’ outputs amplifies uncertainty and increases the risk of halluci…