cs.AI, cs.CL

GAIA-v2-LILT: Multilingual Adaptation of Agent Benchmark beyond Translation

arXiv:2604.24929v1 Announce Type: new
Abstract: Agent benchmarks remain largely English-centric, while their multilingual versions are often built with machine translation (MT) and limited post-editing. We argue that, for agentic tasks, this minimal w…