Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss
arXiv:2604.12911v1 Announce Type: cross
Abstract: Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by frontier models are structured similarly to popular reasoning and knowledge benchmarks, but acro…