MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents
arXiv:2605.06334v1 Announce Type: new
Abstract: Tool-using large language model (LLM) agents are increasingly deployed in settings where their behavior is governed by strict procedural manuals and must be reliable. Ensuring that such agents comply with the rules …