cs.AI, cs.CL

Round-Trip Translation Reveals What Frontier Multilingual Benchmarks Miss

arXiv:2604.12911v1 Announce Type: cross
Abstract: Multilingual benchmarks guide the development of frontier models. Yet multilingual evaluations reported by frontier models are structured similarly to popular reasoning and knowledge benchmarks, but acro…