Faulty reward functions in the wild
Reinforcement learning algorithms can break in surprising, counterintuitive ways. In this post we’ll explore one failure mode, which is where you misspecify your reward function.
Reinforcement learning algorithms can break in surprising, counterintuitive ways. In this post we’ll explore one failure mode, which is where you misspecify your reward function.
Several interactive visualizations of a generative model of handwriting. Some are fun, some are serious.
We’re working with Microsoft to start running most of our large-scale experiments on Azure.