AI safety via debate
We’re proposing an AI safety technique which trains agents to debate topics with one another, using a human to judge who wins.
We’re releasing an experimental metalearning approach called Evolved Policy Gradients, a method that evolves the loss function of learning agents, which can enable fast training on novel tasks. Agents trained with EPG can succeed at basic tasks at test…
UPDATE: Unfortunately, my pull request to Keras that changed the behavior of the Batch Normalization layer was not accepted. You can read the details here. For those of you who are brave enough to mess with custom implementations, you can find the code…
[Updated on 2018-06-30: add two new policy gradient methods, SAC and D4PG.]
[Updated on 2018-09-30: add a new policy gradient method, TD3.]
[Updated on 2019-02-09: add SAC with automatically adjusted temperature.]
[Updated on 2019-06-26: Thanks to …
We’re launching a transfer learning contest that measures a reinforcement learning algorithm’s ability to generalize from previous experience.
Can agents learn inside of their own dreams?
Redirecting to worldmodels.github.io, where the article resides.
On March 3rd, we hosted our first hackathon with 100 members of the artificial intelligence community.