Theodore J Kalaitzidis

The Evaluation Trap: Benchmark Design as Theoretical Commitment

Theodore J Kalaitzidis / May 15, 2026

arXiv:2605.14167v1 Announce Type: new
Abstract: Every AI benchmark operationalizes theoretical assumptions about the capability it claims to assess. When assumptions function as unexamined commitments, benchmarks stabilize the dominant paradigm by nar…
