Optical Context Compression Is Just (Bad) Autoencoding
arXiv:2512.03643v2 Announce Type: replace
Abstract: DeepSeek-OCR shows that rendered text can be reconstructed from a small number of vision tokens, sparking excitement about using vision as a compression medium for long textual contexts. But this pip…