Aaron Rose, Carissa Cullen, Brandon Gary Kaplowitz, Christian Schroeder de Witt

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

Aaron Rose, Carissa Cullen, Brandon Gary Kaplowitz, Christian Schroeder de Witt / April 2, 2026

arXiv:2604.01151v1 Announce Type: new
Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have…

Author name: Aaron Rose, Carissa Cullen, Brandon Gary Kaplowitz, Christian Schroeder de Witt

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability