Google’s DeepMind: Advance in AI

Acquired by Google in 2014, DeepMind is a British artificial intelligence company founded in September 2010. The company has created a neural network that learns how to play video games in a fashion similar to that of humans, as well as a Neural Turing Machine, a network that can access an external memory like a conventional Turing machine, resulting in a computer that imitates the short-term memory of the human brain.

The company became famous in 2016 after its AlphaGo program beat a human professional Go player for the first time, and made headlines again after beating Lee Sedol, the world champion, in a five-game match.

Google's DeepMind has made another big advance in artificial intelligence by getting a machine to master the Chinese game of Go without help from human players. Whereas the original AlphaGo started by learning from thousands of games played by humans, the new AlphaGo Zero began with a blank Go board and no data apart from the rules. Given only those rules, AlphaGo Zero played games against itself. Within 72 hours it was good enough to beat the original program by 100 games to zero.

DeepMind's chief executive, Demis Hassabis, said the system could now have more general applications in scientific research. "We're quite excited because we think this is now good enough to make some real progress on some real problems, even though we're obviously a long way from full AI," he said.

Last year the software defeated the leading South Korean Go player Lee Sedol by four games to one at Go, a game with more possible legal board positions than there are atoms in the universe. AlphaGo also defeated the world's number one Go player, China's Ke Jie.

Go is an abstract strategy board game for two players, in which the goal is to surround more territory than the opponent. The game was invented in China more than 2,500 years ago and is believed to be the oldest board game still played today. Its rules are simpler than those of chess, but at each turn a player typically has a choice of about 200 moves, compared with about 20 in chess. Top human players usually rely on instinct to win.
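To see what that difference in branching factor means in practice, here is a minimal back-of-the-envelope calculation in Python; the 50-move depth is an arbitrary assumption chosen for illustration, not a figure from the article.

```python
# Illustrative arithmetic only: rough game-tree sizes implied by the
# branching factors quoted above (~200 moves per turn in Go vs ~20 in
# chess), over an assumed 50-move stretch of play.
GO_BRANCHING, CHESS_BRANCHING, DEPTH = 200, 20, 50

go_tree = GO_BRANCHING ** DEPTH
chess_tree = CHESS_BRANCHING ** DEPTH

print(f"Go:    about 10^{len(str(go_tree)) - 1} lines of play")
print(f"Chess: about 10^{len(str(chess_tree)) - 1} lines of play")
```

Even at this modest depth the Go tree comes out around 50 orders of magnitude larger than the chess tree, which is why exhaustive search is hopeless and learned intuition matters so much.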

The achievements of AlphaGo required the combination of vast amounts of data - records of thousands of games - and vast computer-processing power.

David Silver, lead researcher on AlphaGo, said the team took a different approach with AlphaGo Zero. "The new version starts from a neural network that knows nothing at all about the game of Go," he explained. "The only knowledge it has is the rules of the game. Apart from that, it figures everything out just by playing games against itself."

While AlphaGo took months to get to the point where it could take on a professional, AlphaGo Zero got there in just three days, using only a fraction of the processing power.

"It shows it's the novel algorithms that count, not the computer power or the data," says Mr Silver.

He highlighted an idea that some may find scary: in just a few days a machine has surpassed the knowledge of this game acquired by humanity over thousands of years.

"We've actually removed the constraints of human knowledge and it's able, therefore, to create knowledge itself from first principles, from a blank slate," he said.

While AlphaGo learned from and improved upon human strategies, AlphaGo Zero devised techniques that the professional player advising DeepMind said he had never seen before. It does this using a novel form of reinforcement learning in which AlphaGo Zero becomes its own teacher. The system starts off with a neural network that knows nothing about the game of Go. It then plays games against itself, combining this neural network with a powerful search algorithm. As it plays, the neural network is tuned and updated to predict moves, as well as the eventual winner of the games.

This updated neural network is then recombined with the search algorithm to create a new, stronger version of AlphaGo Zero, and the process begins again. In each iteration, the performance of the system improves by a small amount, and the quality of the self-play games increases, leading to more and more accurate neural networks and ever stronger versions of AlphaGo Zero.
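As a rough illustration of the loop described in the two paragraphs above, here is a small, self-contained sketch in Python. It is emphatically not DeepMind's implementation: the real system pairs a deep neural network with Monte Carlo tree search on a 19x19 Go board. Here a toy game (Nim) stands in for Go, a simple table of move preferences stands in for the neural network, and weighted random selection stands in for the search; the names (PILE_SIZE, self_play_game, train_on_game) are invented for this sketch.

```python
import random

PILE_SIZE = 10  # toy game (Nim): players alternately take 1-3 stones;
                # whoever takes the last stone wins

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def self_play_game(policy):
    """Play one game against itself, guided by `policy`, a table mapping
    (state, move) -> preference weight (the stand-in for the neural
    network). Returns the move history and the winning player."""
    stones, player, history = PILE_SIZE, 0, []
    while stones > 0:
        moves = legal_moves(stones)
        # Blank slate: unseen positions default to equal preference.
        weights = [policy.get((stones, m), 1.0) for m in moves]
        move = random.choices(moves, weights=weights)[0]
        history.append((stones, move, player))
        stones -= move
        player = 1 - player
    return history, 1 - player  # the player who just moved took the last stone

def train_on_game(policy, history, winner, lr=0.5):
    """Crude stand-in for the gradient update: nudge the policy toward the
    winner's moves and away from the loser's, so the next, stronger version
    of the player is built from games the previous version played."""
    for stones, move, player in history:
        old = policy.get((stones, move), 1.0)
        factor = 1 + lr if player == winner else 1 - lr
        policy[(stones, move)] = max(0.01, old * factor)

policy = {}  # starts knowing nothing at all about the game, only its rules
for _ in range(5000):  # each iteration: self-play, then update, then repeat
    history, winner = self_play_game(policy)
    train_on_game(policy, history, winner)

# After training, the policy should strongly prefer winning moves, e.g.
# with 3 stones left, take all 3 and win.
print(max(legal_moves(3), key=lambda m: policy.get((3, m), 1.0)))  # likely 3
```

The essential shape matches the description above: the same policy both generates the training games and is improved by them, so each new generation of self-play data comes from a slightly stronger player.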

This technique is more powerful than previous versions of AlphaGo because it is no longer constrained by the limits of human knowledge. Instead, it is able to learn tabula rasa from the strongest player in the world: AlphaGo itself.

Many of the research team have now moved on to new projects, aiming to apply the same techniques to other areas. Demis Hassabis said that areas of interest include drug design and the discovery of new materials.

Some might see AI as a threat, but Hassabis looks to the future with optimism. "I hope these kinds of algorithms will be routinely working with us as scientific experts, medical experts, on advancing the frontiers of science and medicine - that's what I hope," he says.

Nonetheless, he and his colleagues are aware of the dangers of applying AI techniques to the real world too quickly. A game with clear rules and no element of luck is one thing; the messy, unpredictable real world is another.