cs.CL, cs.CV, cs.LG

Optical Context Compression Is Just (Bad) Autoencoding

arXiv:2512.03643v2 Announce Type: replace
Abstract: DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pip…