Benchmarking Document Parsers on Mathematical Formula Extraction from PDFs
arXiv:2512.09874v2 Announce Type: replace
Abstract: Correctly parsing mathematical formulas from PDFs is critical for training large language models and building scientific knowledge bases from academic literature, yet existing benchmarks either exclu…