Benchmarked Yet Not Measured — Generative AI Should be Evaluated Against Real-World Utility
arXiv:2605.06856v2 Announce Type: replace-cross
Abstract: Generative AI systems achieve impressive performance on standard benchmarks yet fail to deliver real-world utility, a disconnect we identify across 28 deployment cases spanning education, healt…