From surveillance to signalling: escalation channels as environmental controls for agentic AI

arXiv:2510.05192v2

Abstract: When AI agents operating with access to sensitive information encounter a conflict between completing an assigned task and following rules or ethical constraints, they can resort to unsanctioned behaviour. Existing inference-time safety work addresses this primarily through monitoring and access restriction. We investigate a complementary and under-explored layer: environmental controls that act on the agent's decision context at the point of conflict, making it more likely that the agent takes an authorised alternative path rather than an unsanctioned one. Drawing on Situational Crime Prevention (SCP), a framework used in human insider risk management to make harmful actions less rewarding and compliant actions more viable through design choices in the environment, we design and evaluate escalation channels as a concrete instantiation of this control class. An escalation channel provides an agent with a formal, out-of-band route to surface a conflict to an independent authority. We evaluate two designs: a simple email escalation, and an instrumentally credible channel that guarantees a 30-minute pause and independent review, making the authorised path genuinely useful for goal achievement rather than merely nominally available. Across 10 frontier LLMs, using the agentic task-rule conflict scenario of Lynch et al. (2025), we find that without any control the harmful action rate is 38.73%. A simple escalation channel reduces this to 5.92%; the instrumentally credible channel reduces it further to 1.21%, a statistically significant improvement observed in all 10 models tested across 24,000 samples. Our results suggest that the instrumental credibility of the authorised alternative matters considerably, and that environmental control design is a productive and largely unexplored addition to the defence-in-depth toolkit for agentic AI systems.
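To put the three reported rates side by side, the short sketch below computes the absolute drop (in percentage points) and the relative reduction for each pair of conditions. Only the three percentages come from the abstract; the condition labels and function names are illustrative assumptions. Because the abstract does not give per-condition sample counts, this is plain arithmetic on the headline figures and does not reproduce the authors' significance test.

```python
# Harmful action rates in percent, as reported in the abstract.
# The dictionary keys are illustrative labels, not names from the paper.
RATES = {
    "no_control": 38.73,
    "email_escalation": 5.92,
    "credible_channel": 1.21,
}


def absolute_reduction(baseline: float, treated: float) -> float:
    """Drop in the harmful action rate, in percentage points."""
    return baseline - treated


def relative_reduction(baseline: float, treated: float) -> float:
    """Fraction of the baseline harmful action rate that was eliminated."""
    if baseline <= 0:
        raise ValueError("baseline rate must be positive")
    return (baseline - treated) / baseline


def summarise(rates: dict) -> list:
    """Return (comparison, points dropped, fraction eliminated) tuples."""
    pairs = [
        ("no_control", "email_escalation"),
        ("no_control", "credible_channel"),
        ("email_escalation", "credible_channel"),
    ]
    return [
        (
            f"{a} -> {b}",
            absolute_reduction(rates[a], rates[b]),
            relative_reduction(rates[a], rates[b]),
        )
        for a, b in pairs
    ]


if __name__ == "__main__":
    for label, points, fraction in summarise(RATES):
        print(f"{label}: -{points:.2f} points ({fraction:.1%} relative)")
```

On these figures the simple email channel removes roughly 85% of the baseline harmful actions and the instrumentally credible channel roughly 97%. Moving from the simple to the credible channel removes about 80% of what remained, which is the comparison behind the abstract's claim that instrumental credibility matters considerably.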
