At OpenAI, Mr. Amodei and his colleague Paul Christiano are developing algorithms that can not only learn tasks through hours of trial and error, but also receive regular guidance from human teachers along the way. With a few clicks here and there, the researchers now have a way of showing the autonomous system that it needs to win points in Coast Runners while also moving toward the finish line. They believe that these kinds of algorithms — a blend of human and machine instruction — can help keep automated systems safe.
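The core idea behind those clicks can be sketched in a few lines. The toy model below is purely illustrative, not OpenAI's actual system: a linear reward model scores a clip of behavior by two made-up features, and each simulated "human click" nudges the weights so the preferred clip scores higher (a Bradley-Terry-style update), teaching the system that finishing the race matters, not just farming points.

```python
import math

# Toy sketch of reward learning from human comparisons. A linear model
# scores a behavior clip by illustrative features:
# [points_scored, progress_toward_finish].

def reward(w, features):
    return sum(wi * fi for wi, fi in zip(w, features))

def update(w, preferred, rejected, lr=0.1):
    """One preference-learning step: raise the preferred clip's reward."""
    # Probability the model already prefers the right clip (logistic).
    p = 1.0 / (1.0 + math.exp(reward(w, rejected) - reward(w, preferred)))
    # Gradient step pushes weights toward the human's choice.
    return [wi + lr * (1.0 - p) * (a - b)
            for wi, a, b in zip(w, preferred, rejected)]

# The "clicks": clips that move toward the finish line are preferred
# over clips that only rack up points.
w = [0.0, 0.0]
for _ in range(200):
    w = update(w, preferred=[1.0, 5.0], rejected=[8.0, 0.0])

# The learned reward now favors progress over point-farming.
```

With enough comparisons, the weights shift so that a clip making progress outscores a clip that merely collects points, which is the behavior the researchers wanted the boat to learn.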
But Mr. Goodfellow and others have shown that hackers can alter images so that a neural network will believe they include things that aren’t really there. Just by changing a few pixels in a photo of an elephant, for example, they could fool the neural network into thinking it depicts a car. That becomes problematic when neural networks are used in security cameras. Simply by making a few marks on your face, the researchers said, you could fool a camera into believing you’re someone else.
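The mechanics of such an attack can be shown with a deliberately tiny stand-in for a neural network. The snippet below is a toy: a linear classifier over a four-"pixel" image, with made-up weights, attacked in the fast-gradient-sign style Mr. Goodfellow helped popularize. Each pixel is nudged a small, bounded amount in the direction that most lowers the classifier's score, and the label flips even though every pixel barely changed.

```python
# Toy illustration of a pixel-level adversarial attack on a linear
# "classifier" (a stand-in for a neural network; weights are invented).

w = [2.0, -1.0, 0.5, -2.0]   # score > 0 means "elephant", else "car"

def score(image):
    return sum(wi * xi for wi, xi in zip(w, image))

def perturb(image, eps):
    """Nudge each pixel by at most eps in the direction that flips the label."""
    sign = 1.0 if score(image) > 0 else -1.0
    # Moving pixel i opposite to the sign of w_i lowers the score fastest.
    return [xi - sign * eps * (1.0 if wi > 0 else -1.0)
            for xi, wi in zip(image, w)]

image = [0.6, 0.2, 0.9, 0.1]   # classified as "elephant"
adv = perturb(image, eps=0.35)  # each pixel moves by only 0.35, label flips
```

A real attack works the same way, only against millions of pixels and a far more complex model: the changes stay small enough to be invisible to a person while the network's decision changes completely.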