Allegory of the Cave: Measurement-Grounded Vision-Language Learning
arXiv:2605.11727v1 Announce Type: cross
Abstract: Vision-language models typically reason over post-ISP RGB images, although RGB rendering can clip, suppress, or quantize sensor evidence before inference. We study whether grounding improves when the v…
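The abstract's first sentence says RGB rendering can clip, suppress, or quantize sensor evidence before inference. Below is a minimal sketch of the clipping and quantization effects. It assumes a toy pipeline of digital gain, clipping, a gamma curve, and 8-bit quantization, which is not the paper's ISP. It shows how many distinct linear sensor readings end up sharing one RGB code. The function names `toy_isp_render` and `collapsed_fraction` and all parameter values are hypothetical and chosen only for illustration.

```python
# Hypothetical toy ISP for illustration only; not the pipeline studied in the paper.

def toy_isp_render(raw, gain=4.0, gamma=1 / 2.2, bits=8):
    """Map a linear sensor value in [0, 1] to an integer display code."""
    exposed = raw * gain                      # digital gain / exposure scaling
    clipped = min(max(exposed, 0.0), 1.0)     # values above 1.0 are clipped
    toned = clipped ** gamma                  # gamma-style tone curve
    levels = 2 ** bits - 1
    return round(toned * levels)              # quantize to 2**bits levels


def collapsed_fraction(raw_values, **kwargs):
    """Fraction of distinct raw values that no longer map to distinct codes."""
    distinct_raw = set(raw_values)
    codes = {toy_isp_render(v, **kwargs) for v in distinct_raw}
    return 1 - len(codes) / len(distinct_raw)


if __name__ == "__main__":
    # 12-bit-style linear sensor readings, normalized to [0, 1]
    raw = [i / 4095 for i in range(4096)]
    print("distinct raw values:", len(set(raw)))
    print("distinct 8-bit codes:", len({toy_isp_render(v) for v in raw}))
    print("collapsed fraction:", round(collapsed_fraction(raw), 3))
    # Two highlight readings that differ on the sensor but render identically
    print(toy_isp_render(0.5), toy_isp_render(0.9))  # both clip to 255
```

In this toy setting, every reading above 0.25 clips to the same code. The remaining range is squeezed into at most 256 levels. A model that sees only the rendered codes therefore cannot tell these readings apart.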