cs.AI

Evaluating and Understanding Scheming Propensity in LLM Agents

arXiv:2603.01608v2
Abstract: As frontier language models are increasingly deployed as autonomous agents pursuing complex, long-term objectives, there is increased risk of scheming: agents covertly pursuing misaligned goals. Prio…