Safety & Alignment

Preparing for malicious uses of AI

We’ve co-authored a paper that forecasts how malicious actors could misuse AI technology, and potential ways we can prevent and mitigate these threats. This paper is the outcome of almost a year of sustained work with our colleagues at the Future of Humanity Institute and other institutions.

Learning from human preferences

One step towards building safe AI systems is to remove the need for humans to write goal functions, since using a simple proxy for a complex goal, or getting the complex goal slightly wrong, can lead to undesirable and even dangerous behavior. In collaboration with DeepMind’s safety team, we’ve developed an algorithm which can infer what humans want by being told which of two proposed behaviors is better.
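The core idea is to fit a reward model to pairwise comparisons: a human is shown two short trajectory segments, picks the one they prefer, and the model is trained so that the preferred segment receives higher total predicted reward (a Bradley–Terry model over segment returns). Below is a minimal sketch of that preference loss; the network architecture, dimensions, and toy data are illustrative assumptions, not the paper’s actual code.

```python
# Sketch: learning a reward model from pairwise human preferences.
# All names, shapes, and the toy data here are assumptions for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Maps an (observation, action) pair to a scalar reward estimate."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs, act):
        # obs: (batch, T, obs_dim), act: (batch, T, act_dim) -> (batch, T)
        return self.net(torch.cat([obs, act], dim=-1)).squeeze(-1)

def preference_loss(reward_model, seg_a, seg_b, prefer_a):
    """Bradley-Terry loss: P(human prefers A) = sigmoid(R(A) - R(B)),
    where R(.) is the summed predicted reward over the segment."""
    r_a = reward_model(*seg_a).sum(dim=-1)  # total reward of segment A
    r_b = reward_model(*seg_b).sum(dim=-1)  # total reward of segment B
    logits = r_a - r_b
    # Maximize log-likelihood of the human's choice for each pair.
    return -torch.where(prefer_a,
                        F.logsigmoid(logits),
                        F.logsigmoid(-logits)).mean()

# Toy usage with random trajectory segments (batch=8, length T=10).
obs_dim, act_dim, T = 4, 2, 10
model = RewardModel(obs_dim, act_dim)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

seg_a = (torch.randn(8, T, obs_dim), torch.randn(8, T, act_dim))
seg_b = (torch.randn(8, T, obs_dim), torch.randn(8, T, act_dim))
prefer_a = torch.rand(8) < 0.5  # stand-in for human comparison labels

loss = preference_loss(model, seg_a, seg_b, prefer_a)
opt.zero_grad(); loss.backward(); opt.step()
```

The learned reward model can then stand in for a hand-written goal function when training a policy, so the human only has to make cheap binary comparisons rather than specify the objective directly.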

Concrete AI safety problems

We (along with researchers from Berkeley and Stanford) are co-authors on today’s paper led by Google Brain researchers, Concrete Problems in AI Safety. The paper explores many research problems around ensuring that modern machine learning systems operate as intended.
