cs.AI, cs.CL, cs.CV

Allegory of the Cave: Measurement-Grounded Vision-Language Learning

arXiv:2605.11727v1 Announce Type: cross
Abstract: Vision-language models typically reason over post-ISP RGB images, although RGB rendering can clip, suppress, or quantize sensor evidence before inference. We study whether grounding improves when the v…
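To illustrate the abstract's first claim, that RGB rendering can clip or quantize sensor evidence, here is a minimal sketch of a toy rendering step. It is not the paper's pipeline. The function name `toy_isp` and the parameters `white_level`, `gain`, and `gamma` are hypothetical, chosen only for illustration, and a real ISP includes many more stages (demosaicing, white balance, tone mapping).

```python
def toy_isp(raw, white_level=4095, gain=2.0, gamma=2.2):
    """Map a linear 12-bit raw value to an 8-bit display code (toy model).

    All parameters are illustrative assumptions, not values from the paper.
    """
    linear = raw / white_level * gain       # normalize, then apply exposure gain
    clipped = min(max(linear, 0.0), 1.0)    # values above 1.0 are clipped
    encoded = clipped ** (1.0 / gamma)      # simple power-law gamma encoding
    return round(encoded * 255)             # quantize to 8 bits


# Clipping: distinct bright sensor readings collapse to the same code (255).
highlight_codes = [toy_isp(r) for r in (2048, 3000, 4095)]

# Quantization: 4096 possible raw levels map to at most 256 output codes,
# so many neighbouring raw values become indistinguishable after rendering.
distinct_codes = len({toy_isp(r) for r in range(4096)})

print(highlight_codes)
print(distinct_codes)
```

In this toy model every raw value above roughly half the sensor's range maps to the same output code, so the differences between those readings are no longer visible to any model that consumes only the rendered image.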