cs.CL

ForMaT: Dataset for Visually-Grounded Multilingual PDF Translation

arXiv:2605.15794v1 Announce Type: new
Abstract: We present ForMaT (Format-Preserving Multilingual Translation), a parallel corpus of 3,956 PDFs across 15 language pairs that preserves original layout metadata, proposed for multimodal machine translatio…