cs.AI, cs.CL, cs.HC, cs.MA

Litmus (Re)Agent: A Benchmark and Agentic System for Predictive Evaluation of Multilingual Models

arXiv:2604.08970v1 Announce Type: new
Abstract: We study predictive multilingual evaluation: estimating how well a model will perform on a task in a target language when direct benchmark results are missing. This problem is common in multilingual depl…