From personas to intentions: towards a science of motivations for AI models

TLDR: Behavior-only descriptions are useful, but insufficient for aligning advanced models with high assurance. Two models can look equally aligned on ordinary prompts while being driven by very different underlying motivations; this difference may only …