cs.CL

Where Vision Becomes Text: Locating the OCR Routing Bottleneck in Vision-Language Models

arXiv:2602.22918v2 Announce Type: replace
Abstract: Vision-language models (VLMs) can read text from images, but where does this optical character recognition (OCR) information enter the language processing stream? We investigate the OCR routing mecha…