Beyond Task Success: Measuring Workflow Fidelity in LLM-Based Agentic Payment Systems
arXiv:2605.06457v1
Abstract: LLM-based multi-agent systems are increasingly deployed for payment workflows, yet prevailing metrics, Task Success Rate (TSR) and Agent Handoff F1-Score (HF1), capture only final outcomes or unordered r…