cs.AI, cs.CL, cs.LG

BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate

arXiv:2604.25203v1 Announce Type: new
Abstract: Deploying guardrails for custom policies remains challenging: generic safety models fail to capture task-specific requirements, while prompted LLMs perform inconsistently on boundary cases…