cs.CL, cs.LG, cs.LO

MANTRA: Synthesizing SMT-Validated Compliance Benchmarks for Tool-Using LLM Agents

arXiv:2605.06334v1 Announce Type: new
Abstract: Tool-using large language model (LLM) agents are increasingly deployed in settings where their reliable behavior is governed by strict procedural manuals. Ensuring that such agents comply with the rules …