Reinforcement Learning, Agency and Taste

This started off as an entry for Dwarkesh’s blog post contest, specifically an answer to his first question: why intuitions about slowdowns in reinforcement learning (RL) progress have either not come true or have had only mixed success. His 1000-word lim…