cs.AI, cs.LG, cs.MA

Detecting Multi-Agent Collusion Through Multi-Agent Interpretability

arXiv:2604.01151v1 Announce Type: new
Abstract: As LLM agents are increasingly deployed in multi-agent systems, they introduce risks of covert coordination that may evade standard forms of human oversight. While linear probes on model activations have…