Who Watches the Watchmen? Humans Disagree With Translation Metrics on Unseen Domains
arXiv:2604.17393v2
Abstract: Automatic evaluation metrics are central to the development of machine translation systems, yet their robustness under domain shift remains unclear. Most metrics are developed on the Workshop on Mach…