cs.AI

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

arXiv:2603.10384v2 Announce Type: replace
Abstract: Evaluating LLM reliability via scalar probabilities often fails to capture the structural dynamics of reasoning. We introduce TRACED, a framework that assesses reasoning quality through theoretically…