Jikai Jin, Vasilis Syrgkanis

The Partial Testimony of Logs: Evaluation of Language Model Generation under Confounded Model Choice

Jikai Jin, Vasilis Syrgkanis / May 5, 2026

arXiv:2605.01311v1
Abstract: Offline evaluation of language models from usage logs is biased when model choice is confounded: the same user-side factors that influence which model is used can also influence how its output is judged,…
