Google's DeepMind just published both its recent results and
the code it uses to get those results.
DeepMind uses the new "deep learning" neural net approach to machine learning. The current paper reports that the system is able to learn to play a range of traditional Atari games from scratch, all using the same untuned neural net. The system gets nothing more than the pixels on the screen as input. Its outputs control the buttons/keys it presses to play the game. It is also given a continually updated score that indicates how well it is doing.
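To make that setup concrete, here is a toy sketch of the interaction loop being described. This is my own illustration, not DeepMind's code: the game and agent below are trivial stand-ins, and the only point is the shape of the loop, with pixels coming in, an action going out, and the change in score serving as the reward.

```python
import random

# Toy stand-ins, not DeepMind's implementation: the "game" hands back a fake
# screen and the change in score; the "agent" just picks actions at random.
class ToyGame:
    def __init__(self, num_actions=4):
        self.num_actions = num_actions

    def reset(self):
        return [0.0] * 16                        # pretend these are screen pixels

    def step(self, action):
        next_frame = [random.random() for _ in range(16)]
        score_delta = 1 if action == 0 else 0    # reward only for one particular action
        return next_frame, score_delta

class ToyAgent:
    def __init__(self, num_actions):
        self.num_actions = num_actions

    def choose_action(self, frame):
        return random.randrange(self.num_actions)  # a real agent would consult its net here

    def learn(self, frame, action, reward, next_frame):
        pass                                       # a real agent would update its net here

env = ToyGame()
agent = ToyAgent(env.num_actions)
frame = env.reset()
total_score = 0
for _ in range(100):
    action = agent.choose_action(frame)              # pixels in, action out
    next_frame, reward = env.step(action)            # the game advances one frame
    agent.learn(frame, action, reward, next_frame)   # reward = change in score
    total_score += reward
    frame = next_frame
print("total score:", total_score)
```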
That's all pretty impressive, although the result itself, that it can play Atari games, was announced a while ago. This video (http://youtu.be/xN1d3qHMIEQ) includes interviews with some of the DeepMind developers.
One of the interview questions was particularly revealing. What does the system have a hard time learning? The answer: it has a hard time developing strategies when there is no immediate feedback.
With most of the Atari games, the system is told whenever it does something positive. It can build on those successes. But if winning a game involves strategic planning, e.g., traversing a maze, the system has no way to tell when it is making progress. That's one reason, for example, the system does not do well on Ms. Pac-Man.
Quite a few years ago I worked on genetic programming. Genetic programming made significant gains initially, but it never achieved its goal of learning how to write code with any level of skill. My analysis was that genetic programming, like genetic algorithms, succeeds when there is a path of incremental, rewardable improvements leading to a solution. When no such path exists, it's very difficult for an evolutionary system to find its way. That seems to be the same problem DeepMind is having: if there are no markers along the trail saying "you are doing well," the system has a hard time finding its way.
That doesn't mean that every little step must be marked as right or wrong. With enough processing speed and memory, a system can explore many potential paths. All it needs is a way to evaluate how good each partial path is. It doesn't need to know whether each step is a good step.
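As a toy illustration of that idea (mine, not DeepMind's), consider searching for a target word by exploring many candidate prefixes in parallel. No individual character choice is ever judged on its own; only whole partial paths are scored, and the most promising ones are kept.

```python
import string

# Toy beam search: score whole partial candidates, never individual steps.
TARGET = "reward"      # hypothetical goal, chosen just for this illustration
BEAM_WIDTH = 20

def score(partial):
    # How good is this partial path? Count positions that already match the target.
    return sum(1 for a, b in zip(partial, TARGET) if a == b)

beam = [""]                                    # start with a single empty path
for _ in range(len(TARGET)):
    candidates = [p + c for p in beam for c in string.ascii_lowercase]
    candidates.sort(key=score, reverse=True)   # evaluate each partial path...
    beam = candidates[:BEAM_WIDTH]             # ...and keep only the most promising

print(beam[0])   # reaches "reward" without ever grading a single step in isolation
```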
In many cases such systems can be even more sophisticated: they can find multiple useful partial paths that don't lead to a solution on their own but that do when combined. That's one of the strengths of genetic algorithms. Different members of the population may have discovered different useful partial strategies, which when combined yield a significant improvement.
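Here is an equally small sketch of that point (again my own toy example, not from the paper): two individuals each carry a different useful partial strategy, one matching the left half of a target and one the right half, and a single crossover combines them into something better than either parent.

```python
import random

TARGET = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up target "strategy"

def fitness(individual):
    return sum(1 for a, b in zip(individual, TARGET) if a == b)

# Each parent has discovered a different partial strategy.
parent_a = TARGET[:4] + [random.randint(0, 1) for _ in range(4)]   # good left half
parent_b = [random.randint(0, 1) for _ in range(4)] + TARGET[4:]   # good right half

# Single-point crossover at the midpoint combines the two partial strategies.
child = parent_a[:4] + parent_b[4:]

print("parent A fitness:", fitness(parent_a))
print("parent B fitness:", fitness(parent_b))
print("child fitness:   ", fitness(child))    # 8 of 8: the partial strategies combine
```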
Nonetheless, it's significant that DeepMind seems to be encountering the same sort of problem that blocked earlier learning algorithms.
This is not to say that genetic programming has failed. There are significant genetic programming successes. (See, for example, the "Humies" awards: http://goo.gl/O0Ilxh.) It's just that none of them involve writing software of any level of sophistication.