Uncategorised

Policy Gradient Algorithms

[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.]

[Updated on 2018-09-30: add a new policy gradient method, TD3.]

[Updated on 2019-02-09: add SAC with automatically adjusted temperature.]

[Updated on 2019-06-26: Thanks to …

Research

Retro Contest

We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience.

Research

Reptile: A scalable meta-learning algorithm

We’ve developed a simple meta-learning algorithm called Reptile which works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task. Reptile is t…
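The loop described in this excerpt (sample a task, run stochastic gradient descent on it, then move the initial parameters towards the task-adapted ones) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy task family (fitting y = slope * x with a random slope), the single-parameter model, the helper names `sample_task`, `sgd_on_task`, and `reptile`, and all step sizes are hypothetical choices made for this example.

```python
import random


def sample_task(rng):
    """Hypothetical task family: regress y = slope * x for a random slope."""
    slope = rng.uniform(-2.0, 2.0)

    def sample_batch(n=10):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        return [(x, slope * x) for x in xs]

    return sample_batch


def sgd_on_task(w, sample_batch, inner_steps=5, inner_lr=0.1):
    """Run a few SGD steps on mean squared error for the model y_hat = w * x."""
    for _ in range(inner_steps):
        batch = sample_batch()
        grad = sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= inner_lr * grad
    return w


def reptile(meta_steps=1000, meta_lr=0.1, seed=0):
    """Return the meta-learned initial parameter after meta_steps updates."""
    rng = random.Random(seed)
    w_init = 5.0  # assumed, deliberately poor starting initialization
    for _ in range(meta_steps):
        task = sample_task(rng)             # 1. sample a task
        w_task = sgd_on_task(w_init, task)  # 2. SGD on that task
        # 3. move the initialization towards the task-adapted parameter
        w_init += meta_lr * (w_task - w_init)
    return w_init


if __name__ == "__main__":
    print(reptile())
```

In this toy setting the slopes are symmetric around zero, so the initialization drifts from 5.0 towards a value near zero, from which a few SGD steps can reach any sampled slope about equally fast.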

Uncategorised

The Building Blocks of Interpretability

Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them — and the rich structure of this combinatorial space.
