cs.AI, cs.CY, cs.LG, stat.AP

Knowledge without Wisdom: Measuring Misalignment between LLMs and Intended Impact

arXiv:2603.00883v2 Announce Type: replace
Abstract: LLMs increasingly excel on AI benchmarks, but strong benchmark performance does not guarantee validity for downstream tasks. This study contrasts LLM alignment on benchmarks, downstream tasks, and, importantly, the inten…