Brevity Constraints Reverse Performance Hierarchies in Language Models
arXiv:2604.00025v1 Announce Type: new
Abstract: Standard evaluation protocols reveal a counterintuitive phenomenon: on 7.7% of benchmark problems spanning five datasets, larger language models underperform smaller ones by 28.4 percentage points despit…