cs.AI

Failure-Centered Runtime Evaluation for Deployed Trilingual Public-Space Agents

arXiv:2604.23990v1 Announce Type: new
Abstract: This paper presents PSA-Eval, a failure-centered runtime evaluation framework for deployed trilingual public-space agents. The central claim is that, when the evaluation object shifts from a static input…