On Nets, Trees, and Self-Improvement

by Thomas Zheng, 2016

Earlier this month, the world was abuzz with astonishment when a computer program called Alphago won the first game against a 7-time world champion, a master Go prodigy, Lee Sedol, in a best of five series match.


The first end game (Black resigned).

A week later, AlphaGo won the match 4 to 1. It must have been a humiliating defeat for Lee Sedol as he had also predicted a 4 to 1 outcome, but with a different victor in mind. At the conclusion of the match, he uttered: “I don’t feel this was a loss for human beings. It showed my weaknesses, not the weaknesses of humanity.” It seems that hubris and humility have both remained unchallenged human capabilities.

While a lot of people are ambivalent about this outcome and its implication for humanity, many are more intrigued by the technology and engineering behind AlphaGo. I, for one, think we can learn a lot from taking a closer look at the three building blocks of AlphaGo.


Neural nets were invented decades ago. It could do rather interesting things like classifying numerical digits. It had created more excitement than actual solutions, so it eventually receded to its own little pocket of influence, aka academic conferences.

Fast forward a few decades, neural nets are experiencing a kind of renaissance of late. As it turns out, neural nets can do rather amazing things if there are enough data to feed it. The most recent manifestation called the deep neural nets have been making all the headlines. Since 2012, some manifestation of a deep neural network has consistently won the top spot in the ImageNet competition. So what is the magic? Well, it is really a confluence of two factors, the first of which is a direct consequence of the Moore’s law. Processor design and architecture are both immensely interesting topics, but will be distraction for our discussion here.


The second crucial factor is what I call gratuitous big data that is actually pretty costly to obtain. ImageNet is a non-profit organization which has served as a data compendium for images. Every year, they organize what they call a Large Scale Visual Recognition Challenge (ILSVR), in which they provide image data to teams around the world to compete. However, not all images are created equal from a computer point of view; only those which can be accurately labeled by a human is of value for the computers as far as classification is concerned. This is where things get interesting: in order for the deep neural nets or any machine learning methods to classify images as well as a human can, the neural nets need many millions of hand-labeled images.

For AlphaGo, it is no different: it needs quality labeled data. Fortunately, there are about 50 million recorded Go games which are available for AlphaGo to churn through. On the other hand, human players can perhaps play about thirty thousand games at the maximum, assuming one is incredibly gifted and plays one Go game everyday for his entire life. In other words, AlphaGo is not very efficient with the data at its disposal. That is one of the critiques levied at neural nets. That is not to mention the number of man-hours it took to prepare the data from 50 million games. I would go so far as to say:

Neural nets are a high-maintenance slow learner.


If you were a computer scientist, you would probably be fascinated with trees because it is one of the most powerful data structures ever invented by men. DeepBlue was the first computer program that used the tree to analyze the state space of a chess board and search for an optimal move. This relatively simple data structure managed to beat the then Chess master, Gary Kasparov, some two decades ago. A lot has improved since then. These days, the tree used to play the game of Go has a closer resemblance to a real tree in the sense that the branches are much more numerous and each branch can have different number of sub-branches (just like the picture below).

A million branches and one root

This latest tree-based search method is called Monte-Carlo Tree Search (MCTS). As the name suggests, it is a simulation based method. It is interesting to talk about how this tree grows its branches via simulation. In computational game theory, there is this concept called regret. As the word implies, it takes into account the cost of not trying other options. Intuitively, one wants to minimize regret when faced with multiple options, that is, pick the option that will yield the minimum regret. You might ask: wouldn’t that be nice if you could do it in real life? In computer games, that is an almost trivial task because the computer can simulate a billion games in seconds. Yes, it would be nice to have a Monte-Carlo simulator for making important real life decisions. From this, one may safely conclude:

Life is unfair outside of computer games.


Last but not least, in order to win any games, AlphaGo needs to be able to improve its capacity as a Go player. How does it do that? AlphaGo can get online and play real life Go games on the internet, but that would literally take a life time to improve its game because it will at best be learning at the human speed. Fortunately, AlphaGo is a program and can be made to play a Go game against itself, or a slight different version of itself.


This method of self-play is an old idea. More than three decades ago, Gerald Tesauro from IBM built a computer Backgammon player, called TD-Gammon. Nobody had thought it could achieve anything more than an intermediate-level player. As it turned out, after playing against itself for over a million Backgammon games and improved on it after each game, it became just about as good as the best human Backgammon players.

AlphaGo took this idea and scaled it a million times. After all, Moore’s law for the past four or five decades have ensured that everything will get faster and cheaper every 18 months, though the economics of Moore’s law is coming to an end. Faster self-playing aside, this strategy of improving game playing programs is another peculiarity of building strong computer game players. Any computer game is a simulation which can be re-programmed to play against oneself. One more observation for the day:

Don’t waste time playing computer games.

Where is it heading?

In a way, today’s AlphaGo is like the Aristotle from millenniums ago. He might have been the smartest man on the planet, but he was really useless on his own and he needed an army of servants to do his bidding. In the end, the best thing we got out of his wisdom was propositional logic. While we don’t yet know what the best thing we could get out of the AlphaGo at this point, we already know there would be no shortage of people to do his bidding. Beating a master Go player by combining the three powerful AI paradigms: nets, trees, and self-improvement, is a herald of something big and unpredictable in AI. But it seems the economics of AI is still in its infancy.

This article originally published by Thomas Zheng on Medium, available under a Creative Commons Attribution-Noncommercial license. © 2016