Haoxiang Wang, Da Yu, Huishuai Zhang

Beyond Fixed Benchmarks and Worst-Case Attacks: Dynamic Boundary Evaluation for Language Models

May 8, 2026

arXiv:2605.06213v1
Abstract: Evaluating large language models (LLMs) today rests on fixed benchmarks that apply the same set of items to any model, producing ceiling and floor effects that mask capability gaps. We argue that the mos…
