cs.CL, cs.CR

AgentShield: Deception-based Compromise Detection for Tool-using LLM Agents

arXiv:2605.11026v1 Announce Type: cross
Abstract: Defenses against indirect prompt injection (IPI) in tool-using LLM agents share two structural weaknesses. First, they all attempt to prevent attacks rather than detect the compromises that slip throug…