cs.AI

MEDLEY-BENCH: Scale Buys Evaluation but Not Control in AI Metacognition

arXiv:2604.16009v1
Abstract: Metacognition, the ability to monitor and regulate one’s own reasoning, remains under-evaluated in AI benchmarking. We introduce MEDLEY-BENCH, a benchmark of behavioural metacognition that separates inde…