cs.AI, cs.CL

Propensity Inference: Environmental Contributors to LLM Behaviour

arXiv:2604.21098v1 Announce Type: new
Abstract: Motivated by loss of control risks from misaligned AI systems, we develop and apply methods for measuring language models’ propensity for unsanctioned behaviour. We contribute three methodological improv…