Pearmut: Human Evaluation of Translation Made Trivial
arXiv:2601.02933v3 Announce Type: replace
Abstract: Human evaluation is the gold standard for multilingual NLP, but it is often skipped in practice and replaced with automatic metrics because it is notoriously complex and slow to set up with existing …