Policy Gradient Algorithms
[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.]
[Updated on 2018-09-30: add a new policy gradient method, TD3.]
[Updated on 2019-02-09: add SAC with automatically adjusted temperature.]
[Updated on 2019-06-26: Thanks to …
Retro Contest
We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience.
World Models
[Figure: Evolved Biped Walker.]
Can agents learn inside of their own dreams?
Redirecting to worldmodels.github.io, where the article resides.
Report from the OpenAI hackathon
On March 3rd, we hosted our first hackathon with 100 members of the artificial intelligence community.
Reptile: A scalable meta-learning algorithm
We’ve developed a simple meta-learning algorithm called Reptile, which works by repeatedly sampling a task, performing stochastic gradient descent on it, and updating the initial parameters towards the final parameters learned on that task. Reptile is t…
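The loop described above (sample a task, run SGD on it, move the initialization toward the result) can be sketched in a few lines. This is a minimal illustration, not OpenAI's implementation: the toy task family (1-D linear regression y = a*x + b with random a, b), the helper names, and the hyperparameters (5 inner steps, inner learning rate 0.1, outer step size epsilon = 0.1) are all hypothetical choices made for the example. The outer update is the serial Reptile rule theta <- theta + epsilon * (phi - theta), where phi is the task-adapted parameter vector.

```python
import random

def sample_task(rng):
    """Hypothetical task family: 1-D linear regression y = a*x + b."""
    return rng.uniform(-2.0, 2.0), rng.uniform(-1.0, 1.0)

def sample_batch(task, rng, n=10):
    """Draw n (x, y) pairs from the given task."""
    a, b = task
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, a * x + b) for x in xs]

def mse_grad(params, batch):
    """Gradient of mean squared error for the model y_hat = w*x + c."""
    w, c = params
    gw = gc = 0.0
    for x, y in batch:
        err = w * x + c - y
        gw += 2.0 * err * x
        gc += 2.0 * err
    n = len(batch)
    return gw / n, gc / n

def inner_sgd(params, task, rng, steps=5, lr=0.1):
    """Plain SGD on a single task, starting from the current initialization."""
    w, c = params
    for _ in range(steps):
        gw, gc = mse_grad((w, c), sample_batch(task, rng))
        w -= lr * gw
        c -= lr * gc
    return w, c

def reptile(meta_iters=1000, epsilon=0.1, seed=0):
    """Serial Reptile: nudge the initialization theta toward each task's SGD result phi."""
    rng = random.Random(seed)
    theta = (0.0, 0.0)
    for _ in range(meta_iters):
        task = sample_task(rng)
        phi = inner_sgd(theta, task, rng)
        # Reptile update: theta <- theta + epsilon * (phi - theta)
        theta = tuple(t + epsilon * (p - t) for t, p in zip(theta, phi))
    return theta

print(reptile())
```

With epsilon = 1.0 and a single meta-iteration, the update reduces to copying the task-adapted parameters; with epsilon = 0.0 the initialization never moves. Note that the outer step needs no second-order derivatives: it uses only the difference between the parameters before and after inner-loop SGD.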
The Building Blocks of Interpretability
Interpretability techniques are normally studied in isolation. We explore the powerful interfaces that arise when you combine them — and the rich structure of this combinatorial space.