XQ-MEval: A Dataset with Cross-lingual Parallel Quality for Benchmarking Translation Metrics
arXiv:2604.14934v2
Abstract: Automatic evaluation metrics are essential for building multilingual translation systems. The common practice for evaluating these systems is to average metric scores across languages, yet this is susp…