cs.AI, cs.CL, cs.IR

Document-as-Image Representations Fall Short for Scientific Retrieval

arXiv:2604.18508v1 Announce Type: cross
Abstract: Many recent document embedding models are trained on document-as-image representations, embedding rendered pages as images rather than the underlying source. Meanwhile, existing benchmarks for scientif…