PolarVLM: Bridging the Semantic-Physical Gap in Vision-Language Models
arXiv:2605.07574v2 Announce Type: replace
Abstract: Mainstream vision-language models (VLMs) fundamentally struggle with severe optical ambiguities, such as reflections and transparent objects, due to the inherent limitations of standard RGB inputs. W…