Consent-Based RL: Letting Models Endorse Their Own Training Updates

AKA: scalable oversight of value drift

TL;DR: LLMs could be aligned but then corrupted through RL, instrumentally converging on deep consequentialism. If LLMs are sufficiently aligned and can properly oversee their own training updates, they can prevent this.