Yang Xu, Jiefu Zhang, Haixiang Sun, Zihan Zhou, Tianyu Cao, Vaneet Aggarwal

Towards Reliable LLM Evaluation: Correcting the Winner’s Curse in Adaptive Benchmarking

May 8, 2026

arXiv:2605.05973v1
Abstract: Adaptive prompt and program search makes LLM evaluation selection-sensitive. Once benchmark items are reused inside tuning, the observed winner’s score need not estimate the fresh-data performance of the…
