cs.CL

TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction

arXiv:2604.22880v1 Announce Type: new
Abstract: Existing document OCR largely targets plain text or Markdown, discarding the structural and executable properties that make LaTeX essential for scientific publishing. We study page-level reconstruction o…