Constitutional AI vs. RLHF vs. Deliberative Alignment
Outline:
Quick review of RLHF, Constitutional AI, and Deliberative Alignment for a somewhat technical audience, plus a literature review of historical failure modes.
Introduce “Persona-Emotion-Behavior space”, combining two recent interpretability papers to get…