Debasish Ghosh
Programmer, blogger, author, nerd, and Seinfeld fanboy


Post has shared content
Vladimir Voevodsky, 1966 - 2017

This mathematician died last week. He won the Fields Medal in 2002 for proving the Milnor conjecture in a branch of algebra known as algebraic K-theory. He continued to work on this subject until he helped prove the more general Bloch-Kato conjecture in 2010.

Proving these results — which are too technical to easily describe to nonmathematicians! — required him to develop a dream of Grothendieck: the theory of motives. Very roughly, this is a way of taking the space of solutions of a collection of polynomial equations and chopping it apart into building blocks. But the process of 'chopping up', and also these building blocks, called 'motives', are very abstract — nothing simple or obvious.

It's a bit like how a proton is made of quarks. You never actually see a quark in isolation, so you have to think very hard to realize they are there at all. But once you know this, a lot of things become clear.

This is wonderful, profound mathematics. But in the process of proving the Bloch-Kato conjecture, Voevodsky became tired of this stuff. He wanted to do something more useful... and more ambitious:

It was very difficult. In fact, it was 10 years of technical work on a topic that did not interest me during the last 5 of these 10 years. Everything was done only through willpower.

Since the autumn of 1997, I already understood that my main contribution to the theory of motives and motivic cohomology had been made. Since that time I was consciously and actively looking for a topic that I could take up after fulfilling my obligations related to the Bloch-Kato conjecture.

I quickly realized that if I wanted to do something really serious, then I should make the most of my accumulated knowledge and skills in mathematics. On the other hand, seeing the trends in the development of mathematics as a science, I realized that the time is coming when the proof of yet another conjecture won't have much of an effect. I realized that mathematics is on the verge of a crisis, or rather, two crises.

The first is connected with the separation of “pure” and applied mathematics. It is clear that sooner or later the question will arise of why society should pay money to people who are engaged in things that have no practical applications.

The second, less obvious, is connected with the increasing complexity of pure mathematics, which leads to the fact that, sooner or later, articles will become too complicated for detailed verification and the process of accumulating undetected errors will begin. And since mathematics is a very deep science, in the sense that the results of one article usually depend on the results of many, many previous articles, this accumulation of errors is very dangerous for mathematics.

So I decided I needed to try to do something that would help prevent these crises. For the first crisis, this meant finding an applied problem whose solution required the methods of pure mathematics developed in recent years or even decades.

He looked for such a problem. He studied biology and found an interesting candidate. He worked on it very hard, but then decided he'd gone down a wrong path:

Since childhood I have been interested in the natural sciences (physics, chemistry, biology), as well as in the theory of computer languages, and since 1997 I have read a lot on these topics, and even taken several undergraduate and graduate courses. In fact, I “updated” and deepened, to a very large extent, the knowledge I already had. All this time I was looking for recognized open problems that would be of interest to me and to which I could apply modern mathematics.

As a result I chose, as I now understand incorrectly, the problem of recovering the history of populations from their modern genetic composition. I worked on this task for a total of about two years, and in the end, by 2009, I realized that what I was inventing was useless. It was perhaps the greatest scientific failure of my life so far. A lot of work was invested in a project that completely failed. There was some benefit, of course: I learned a lot of probability theory, which I had known badly, and also learned a lot about demography and demographic history.

But he bounced back! He came up with a new approach to the foundations of mathematics, and helped organize a team at the Institute for Advanced Study in Princeton to develop it further. This approach is now called homotopy type theory or univalent foundations. It's fundamentally different from set theory. It treats the fundamental concept of equality in a brand new way! And it's designed to be done with the help of computers.
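For a tiny glimpse of equality-as-a-type, here's a sketch in Lean, a modern proof assistant built on dependent type theory (Lean is not itself univalent, so this only hints at the setting, not at Voevodsky's univalence axiom):

```lean
-- In type theory, an equation `a = b` is itself a type,
-- and a proof of the equation is a term of that type.
example : 2 + 2 = 4 := rfl   -- `rfl` works because both sides compute to the same value

-- Equality proofs are first-class data: they can be transported
-- along any function.
example (f : Nat → Nat) (a b : Nat) (h : a = b) : f a = f b :=
  congrArg f h
```

In univalent foundations the type `a = b` can have richer structure than just "true or false", which is exactly the new treatment of equality mentioned above.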

It seems he started down this new road when Carlos Simpson pointed out a serious mistake in a paper he'd written.

I think it was at this moment that I largely stopped doing what is called “curiosity-driven research” and started to think seriously about the future. I didn’t have the tools to explore the areas where curiosity was leading me and the areas that I considered to be of value and of interest and of beauty.

So I started to look into what I could do to create such tools. And it soon became clear that the only long-term solution was somehow to make it possible for me to use computers to verify my abstract, logical, and mathematical constructions. The software for doing this has been in development since the sixties. At the time, when I started to look for a practical proof assistant around 2000, I could not find any. There were several groups developing such systems, but none of them was in any way appropriate for the kind of mathematics for which I needed a system.

When I first started to explore the possibility, computer proof verification was almost a forbidden subject among mathematicians. A conversation about the need for computer proof assistants would invariably drift to Gödel’s incompleteness theorem (which has nothing to do with the actual problem) or to one or two cases of verification of already existing proofs, which were used only to demonstrate how impractical the whole idea was. Among the very few mathematicians who persisted in trying to advance the field of computer verification in mathematics during this time were Tom Hales and Carlos Simpson. Today, only a few years later, computer verification of proofs and of mathematical reasoning in general looks completely practical to many people who work on univalent foundations and homotopy type theory.

The primary challenge that needed to be addressed was that the foundations of mathematics were unprepared for the requirements of the task. Formulating mathematical reasoning in a language precise enough for a computer to follow meant using a foundational system of mathematics not as a standard of consistency to establish a few fundamental theorems, but as a tool that can be employed in everyday mathematical work. There were two main problems with the existing foundational systems, which made them inadequate. Firstly, existing foundations of mathematics were based on the languages of predicate logic and languages of this class are too limited. Secondly, existing foundations could not be used to directly express statements about such objects as, for example, the ones in my work on 2-theories.

Still, it is extremely difficult to accept that mathematics is in need of a completely new foundation. Even many of the people who are directly connected with the advances in homotopy type theory are struggling with this idea. There is a good reason: the existing foundations of mathematics – ZFC and category theory – have been very successful. Overcoming the appeal of category theory as a candidate for new foundations of mathematics was for me personally the most challenging.

Homotopy type theory is now a vital and exciting area of mathematics. It's far from done, and to make it live up to Voevodsky's dreams will require brand new ideas – not just incremental improvements, but actual sparks of genius.

I only met him a few times, but as far as I can tell Voevodsky was a completely unpretentious person. You can see that in the picture here.

He was also a very complex person. For example, you might not guess that he took great photos of wildlife:

You also might not guess at this side of him:

In 2006-2007 a lot of external and internal events happened to me, after which my point of view on the questions of the “supernatural” changed significantly. What happened to me during these years can perhaps be compared most closely to what happened to Carl Jung in 1913-14. Jung called it “confrontation with the unconscious”. I do not know what to call it, but I can describe it in a few words. Remaining more or less normal, apart from the fact that I was trying to discuss what was happening to me with people whom I should not have discussed it with, I had in a few months acquired a very considerable experience of visions, voices, periods when parts of my body did not obey me, and a lot of incredible accidents. The most intense period was in mid-April 2007, when I spent 9 days (7 of them in the Mormon capital of Salt Lake City) without ever falling asleep.

Almost from the very beginning, I found that I could control many of these phenomena (voices, visions, various sensory hallucinations). So I was not scared and did not feel sick, but perceived everything as something very interesting, actively trying to interact with those “beings” in the auditory, visual and then tactile spaces that appeared around me (by themselves or by invoking them). I must say, probably to avoid possible speculations on this subject, that I did not use any drugs during this period, tried to eat and sleep a lot, and drank diluted white wine.

Another comment: when I say “beings”, naturally I mean what in modern terminology are called complex hallucinations. The word “beings” emphasizes that these hallucinations themselves “behaved”, possessed a memory independent of my memory, and reacted to attempts at communication. In addition, they were often perceived in concert in various sensory modalities. For example, I played several times with a (hallucinated) ball with a (hallucinated) girl — and I saw this ball, and felt it with my palm when I threw it.

Despite the fact that all this was very interesting, it was very difficult. It came in several periods, the longest of which lasted from September 2007 to February 2008 without a break. There were days when I could not read, and days when coordination of movements was broken to such an extent that it was difficult to walk.

I managed to get out of this state due to the fact that I forced myself to start math again. By the middle of spring 2008 I could already function more or less normally and even went to Salt Lake City to look at the places where I wandered, not knowing where I was, in the spring of 2007.

In short, he was a genius akin to Cantor or Grothendieck, at times teetering on the brink of sanity, gripped by an immense desire for beauty and clarity, engaged in struggles that seized his whole soul. From the fires of this volcano, truly original ideas emerge.

This last quote, and the first few quotes, are from some interviews in Russian done by Roman Mikhailov, which +Mike Stay pointed out to me. I used Google Translate and polished the results a bit:

The quote about the origins of 'univalent foundations' comes from Voevodsky's nice essay here:

The photograph of Voevodsky is from +Andrej Bauer's website:

To learn about the Bloch-Kato conjecture, start here:

Or ask me! I don't understand this stuff very well, but I enjoy trying to learn things.

To learn homotopy type theory, try this great free book:

Post has shared content
Easy as ABC? Not quite!

A brilliant mathematician named Shinichi Mochizuki claims to have proved the famous "abc conjecture" in number theory. That's great! There's just one problem: his proof is about 500 pages long, and almost nobody understands it, so mathematicians can't tell if it's correct.

Luckily another mathematician named Go Yamashita has just written a summary of the proof. That's great! There's just one problem: it's 294 pages long, and it looks very hard to understand.

I'm no expert on number theory, so my opinion doesn't really matter. What's hard for me to understand may be easy for an expert!

But the most disturbing feature to me is that this new paper contains many theorems whose statements are over a page long... with the proof being just "Follows from the definitions."

Of course, every true theorem follows from the definitions. But the proof usually says how.

It's common to omit detailed proofs when one is summarizing someone else's work. But even a sketchy argument would help us understand what's going on.

This is part of a strange pattern surrounding Mochizuki's work. There was a conference in Oxford in 2015 aimed at helping expert number theorists understand it. Many of them found it frustrating. Brian Conrad wrote:

I don’t understand what caused the communication barrier that made it so difficult to answer questions in the final 2 days in a more illuminating manner. Certainly many of us had not read much in the papers before the meeting, but this does not explain the communication difficulties. Every time I would finally understand (as happened several times during the week) the intent of certain analogies or vague phrases that had previously mystified me (e.g., “dismantling scheme theory”), I still couldn’t see why those analogies and vague phrases were considered to be illuminating as written without being supplemented by more elaboration on the relevance to the context of the mathematical work.

At multiple times during the workshop we were shown lists of how many hours were invested by those who have already learned the theory and for how long person A has lectured on it to persons B and C. Such information shows admirable devotion and effort by those involved, but it is irrelevant to the evaluation and learning of mathematics. All of the arithmetic geometry experts in the audience have devoted countless hours to the study of difficult mathematical subjects, and I do not believe that any of us were ever guided or inspired by knowledge of hour-counts such as that. Nobody is convinced of the correctness of a proof by knowing how many hours have been devoted to explaining it to others; they are convinced by the force of ideas, not by the passage of time.

It's all very strange. Maybe Mochizuki is just a lot smarter than us, and we're like dogs trying to learn calculus. Experts say he did a lot of brilliant work before his proof of the abc conjecture, so this is possible.

But, speaking as one dog to another, let me tell you what the abc conjecture says. It's about this equation:

a + b = c

Looks simple, right? Here a, b and c are positive integers that are relatively prime: they have no common factors except 1. If we let d be the product of the distinct prime factors of abc, the conjecture says that d is usually not much smaller than c.

More precisely, it says that if p > 1, there are only finitely many choices of relatively prime a,b,c with a + b = c and

d^p < c

It looks obscure when you first see it. It's famous because it has tons of consequences! It implies the Fermat–Catalan conjecture, the Thue–Siegel–Roth theorem, the Mordell conjecture, Vojta's conjecture (in dimension 1), the Erdős–Woods conjecture (except perhaps for finitely many counterexamples)... blah blah blah... etcetera etcetera.
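To make the conjecture concrete, here is a short Python sketch (my own, not from any of the linked sources) that computes the radical d and searches for the rare "high quality" triples with d < c, the borderline p = 1 case:

```python
from math import gcd

def radical(n):
    """Product of the distinct prime factors of n."""
    rad, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            rad *= p
            while n % p == 0:
                n //= p
        p += 1
    return rad * (n if n > 1 else 1)

# Search for triples a + b = c (a <= b, relatively prime) whose
# radical d = rad(abc) is smaller than c.  Since a, b, c are pairwise
# coprime, rad(abc) = rad(a) * rad(b) * rad(c).
hits = []
for c in range(3, 1000):
    for a in range(1, c // 2 + 1):
        b = c - a
        if gcd(a, b) == 1 and radical(a) * radical(b) * radical(c) < c:
            hits.append((a, b, c))

print(hits[:3])  # first few: (1, 8, 9), (5, 27, 32), (1, 48, 49)
```

The scarcity of such triples up to 1000 is the conjecture in miniature: d is usually not much smaller than c.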

Let me just tell you the Fermat–Catalan conjecture, to give you a taste of this stuff. In fact I'll just tell you one special case of that conjecture: there are at most finitely many solutions of

x^3 + y^4 = z^7

where x,y,z are relatively prime positive integers. The numbers 3,4,7 aren't very special - they could be lots of other things. But the Fermat–Catalan conjecture has some fine print in it that rules out certain choices of these exponents. In fact, if we rule out those exponents and also certain silly choices of x,y,z, it says there are only finitely many solutions even if we let the exponents vary! Here's a complete list of known solutions:

1^m + 2^3 = 3^2
2^5 + 7^2 = 3^4
13^2 + 7^3 = 2^9
2^7 + 17^3 = 71^2
3^5 + 11^4 = 122^2
33^8 + 1549034^2 = 15613^3
1414^3 + 2213459^2 = 65^7
9262^3 + 15312283^2 = 113^7
17^7 + 76271^3 = 21063928^2
43^8 + 96222^3 = 30042907^2

The first one is weird because m can be anything: we need some fine print to say this doesn't count as infinitely many solutions.
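The ten identities above are easy to machine-check with exact integer arithmetic. A quick Python sanity check (mine; m = 2 stands in for the arbitrary exponent in the first solution):

```python
# Each tuple (x, p, y, q, z, r) encodes the claim x^p + y^q = z^r.
solutions = [
    (1, 2, 2, 3, 3, 2),
    (2, 5, 7, 2, 3, 4),
    (13, 2, 7, 3, 2, 9),
    (2, 7, 17, 3, 71, 2),
    (3, 5, 11, 4, 122, 2),
    (33, 8, 1549034, 2, 15613, 3),
    (1414, 3, 2213459, 2, 65, 7),
    (9262, 3, 15312283, 2, 113, 7),
    (17, 7, 76271, 3, 21063928, 2),
    (43, 8, 96222, 3, 30042907, 2),
]
for x, p, y, q, z, r in solutions:
    assert x**p + y**q == z**r, (x, p, y, q, z, r)
print("all ten identities hold")
```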

It's a long way from here to the very first paragraph in the summary at the start of Yamashita's paper:

By combining a relative anabelian result (relative Grothendieck Conjecture over sub-p-adic fields (Theorem B.1)) and "hidden endomorphism" diagram (EllCusp) (resp. "hidden endomorphism" diagram (BelyiCusp)), we show absolute anabelian results: the elliptic cuspidalisation (Theorem 3.7) (resp. Belyi cuspidalisation (Theorem 3.8)). By using Belyi cuspidalisations, we obtain an absolute mono-anabelian reconstruction of the NF-portion of the base field and the function field (resp. the base field) of hyperbolic curves of strictly Belyi type over sub-p-adic fields (Theorem 3.17) (resp. over mixed characteristic local fields (Corollary 3.19)). This gives us the philosophy of arithmetical holomorphicity and mono-analyticity (Section 3.5), and the theory of Kummer isomorphism from Frobenius-like objects to etale-like objects (cf. Remark 3.19.2).

And it's a long way from this – which still sounds sorta like stuff I hear mathematicians say – to the scary theorems that crawl out of their caves around page 200!

Check out Yamashita's paper and see what I mean:

You can read Brian Conrad's story of the Oxford conference here:

You can learn more about the abc conjecture here:

And you can learn more about Mochizuki here:

He is the leader of and the main contributor to one of the major parts of modern number theory: anabelian geometry. His contributions include his famous solution of the Grothendieck conjecture in anabelian geometry about hyperbolic curves over number fields. He initiated and developed several other fundamental developments: absolute anabelian geometry, mono-anabelian geometry, and combinatorial anabelian geometry. Among other theories, Mochizuki introduced and developed Hodge–Arakelov theory, p-adic Teichmüller theory, the theory of Frobenioids, and the etale theta-function theory.


Post has attachment
Domain Models - Late Evaluation buys you better Composition
In the last post we talked about early abstractions that allow you to design generic interfaces which can be polymorphic in the type parameter. Unless you abuse the type system of a permissive language like Scala, if you adhere to the principles of parametr...

Post has attachment
Domain Models - Early Abstractions and Polymorphic Domain Behaviors
Let's talk genericity or generic abstractions. In the last post we talked about an abstraction Money , which, BTW was not generic. But we expressed some of the operations on Money in terms of a Money[Monoid] , where Monoid is a generic algebraic structure. ...

Post has attachment
Domain models, Algebraic laws and Unit tests
In a domain model, when you have a domain element that forms an algebraic abstraction honoring certain laws, you can get rid of many of your explicitly written unit tests just by checking the laws. Of course you have to squint hard and discover the lawful a...

Post has shared content
Originally shared by ****
The Machine Learning Master Algorithm

This is a great, accessible talk on machine learning, the five major learning paradigms, and efforts to combine them all into one Master Algorithm that uses the strengths of all five approaches to create the best, most flexible, and most effective learning machines.

The five approaches are:
- Identify and Fill Knowledge Gaps
- Neural Network Learning
- Evolutionary Learning
- Bayesian Learning
- Learning by Analogy

There are good examples of where each is used, what their strengths are, and discussion of how the core practitioners or tribes of each tend to think that their way is best. Thanks to whoever first shared this one here; I've had it in Watch Later for a while and can't remember who it was.

Post has shared content
Congratulations to Research Scientist David Silver (Google DeepMind) and Software Engineer Sylvain Gelly (Google Research, Europe), who received the Artificial Intelligence Journal’s 2016 Prominent Paper Award for their 2011 paper “Monte-Carlo tree search and rapid action value estimation in computer Go”.

Presented at the IJCAI Conference in New York this week (#IJCAI16), the award recognizes the impact of the research into augmented Monte-Carlo Tree Search algorithms (part of Sylvain's PhD thesis while at Université Paris Sud) that eventually led to the recent defeat of Go player Lee Se-dol by Google DeepMind’s AlphaGo.

Post has shared content
My pleasantly alliterative talk at the Salt Lake Data Science Meetup: Schwartz-Zippel-DeMillo-Lipton, algorithms with a bad gambling habit, near neighbor search, and more!

Post has shared content
Critique of Paper by "Deep Learning Conspiracy" (Nature 521 p 436)

Machine learning is the science of credit assignment. The machine learning community itself profits from proper credit assignment to its members. The inventor of an important method should get credit for inventing it. She may not always be the one who popularizes it. Then the popularizer should get credit for popularizing it (but not for inventing it). Relatively young research areas such as machine learning should adopt the honor code of mature fields such as mathematics: if you have a new theorem, but use a proof technique similar to somebody else's, you must make this very clear. If you "re-invent" something that was already known, and only later become aware of this, you must at least make it clear later.

As a case in point, let me now comment on a recent article in Nature (2015) about "deep learning" in artificial neural networks (NNs), by LeCun & Bengio & Hinton (LBH for short), three CIFAR-funded collaborators who call themselves the "deep learning conspiracy" (e.g., LeCun, 2015). They heavily cite each other. Unfortunately, however, they fail to credit the pioneers of the field, which originated half a century ago. All references below are taken from the recent deep learning overview (Schmidhuber, 2015), except for a few papers listed beneath this critique focusing on nine items.

1. LBH's survey does not even mention the father of deep learning, Alexey Grigorevich Ivakhnenko, who published the first general, working learning algorithms for deep networks (e.g., Ivakhnenko and Lapa, 1965). A paper from 1971 already described a deep learning net with 8 layers (Ivakhnenko, 1971), trained by a highly cited method still popular in the new millennium. Given a training set of input vectors with corresponding target output vectors, layers of additive and multiplicative neuron-like nodes are incrementally grown and trained by regression analysis, then pruned with the help of a separate validation set, where regularisation is used to weed out superfluous nodes. The numbers of layers and nodes per layer can be learned in problem-dependent fashion.
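To make that description concrete, here is a toy sketch in Python (my own reconstruction in the spirit of Ivakhnenko's GMDH, not his actual algorithm): candidate nodes with additive and multiplicative terms are fit by regression on a training split, then ranked (pruned) by error on a separate validation split.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: the target depends nonlinearly on two of four inputs.
X = rng.normal(size=(200, 4))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2]
X_tr, X_va = X[:150], X[150:]
y_tr, y_va = y[:150], y[150:]

def fit_node(u, v, t):
    """Fit t ~ a + b*u + c*v + d*u*v by least-squares regression."""
    A = np.column_stack([np.ones_like(u), u, v, u * v])
    coef, *_ = np.linalg.lstsq(A, t, rcond=None)
    return coef

def eval_node(coef, u, v):
    return coef[0] + coef[1] * u + coef[2] * v + coef[3] * (u * v)

# Grow one layer: every pair of inputs becomes a candidate node,
# trained on the training split and pruned by validation error.
candidates = []
for i in range(X.shape[1]):
    for j in range(i + 1, X.shape[1]):
        coef = fit_node(X_tr[:, i], X_tr[:, j], y_tr)
        err = np.mean((eval_node(coef, X_va[:, i], X_va[:, j]) - y_va) ** 2)
        candidates.append((err, i, j, coef))

candidates.sort(key=lambda c: c[0])
best_err, i, j, _ = candidates[0]
print(f"surviving node uses inputs {i} and {j}; validation MSE {best_err:.3f}")
```

A full GMDH run would feed the surviving nodes' outputs into further layers, learning the depth in a problem-dependent way, which is exactly the property the paragraph describes.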

2. LBH discuss the importance and problems of gradient descent-based learning through backpropagation (BP), and cite their own papers on BP, plus a few others, but fail to mention BP's inventors. BP's continuous form was derived in the early 1960s (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969). Dreyfus (1962) published the elegant derivation of BP based on the chain rule only. BP's modern efficient version for discrete sparse networks (including FORTRAN code) was published by Linnainmaa (1970). Dreyfus (1973) used BP to change weights of controllers in proportion to such gradients. By 1980, automatic differentiation could derive BP for any differentiable graph (Speelpenning, 1980). Werbos (1982) published the first application of BP to NNs, extending thoughts in his 1974 thesis (cited by LBH), which did not have Linnainmaa's (1970) modern, efficient form of BP. BP for NNs on computers 10,000 times faster per Dollar than those of the 1960s can yield useful internal representations, as shown by Rumelhart et al. (1986), who also did not cite BP's inventors.
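The chain-rule content of BP is easy to see in miniature. A Python sketch (mine, purely illustrative) backpropagates through a two-weight network y = w2 * tanh(w1 * x) and checks the analytic gradient against a finite difference:

```python
import math

def forward(w1, w2, x):
    """Two-layer network y = w2 * tanh(w1 * x); returns (hidden, output)."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def grads(w1, w2, x, t):
    """Gradients of the loss L = (y - t)^2, derived purely by the chain rule."""
    h, y = forward(w1, w2, x)
    dL_dy = 2 * (y - t)
    dL_dw2 = dL_dy * h                 # chain through y = w2 * h
    dL_dh = dL_dy * w2
    dL_dw1 = dL_dh * (1 - h * h) * x   # tanh'(u) = 1 - tanh(u)^2
    return dL_dw1, dL_dw2

w1, w2, x, t, eps = 0.7, -1.3, 0.5, 0.2, 1e-6
g1, g2 = grads(w1, w2, x, t)

# Central finite difference in w1 as an independent check.
num1 = ((forward(w1 + eps, w2, x)[1] - t) ** 2 -
        (forward(w1 - eps, w2, x)[1] - t) ** 2) / (2 * eps)
print(abs(g1 - num1) < 1e-6)  # True
```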

3. LBH claim: "Interest in deep feedforward networks [FNNs] was revived around 2006 (refs 31-34) by a group of researchers brought together by the Canadian Institute for Advanced Research (CIFAR)." Here they refer exclusively to their own labs, which is misleading. For example, by 2006, many researchers had used deep nets of the Ivakhnenko type for decades. LBH also ignore earlier, closely related work funded by other sources, such as the deep hierarchical convolutional neural abstraction pyramid (e.g., Behnke, 2003b), which was trained to reconstruct images corrupted by structured noise, enforcing increasingly abstract image representations in deeper and deeper layers. (BTW, the term "Deep Learning" (the very title of LBH's paper) was introduced to Machine Learning by Dechter (1986), and to NNs by Aizenberg et al (2000), none of them cited by LBH.)

4. LBH point to their own work (since 2006) on unsupervised pre-training of deep FNNs prior to BP-based fine-tuning, but fail to clarify that this was very similar in spirit and justification to the much earlier successful work on unsupervised pre-training of deep recurrent NNs (RNNs) called neural history compressors (Schmidhuber, 1992b, 1993b). Such RNNs are even more general than FNNs. A first RNN uses unsupervised learning to predict its next input. Each higher level RNN tries to learn a compressed representation of the information in the RNN below, to minimise the description length (or negative log probability) of the data. The top RNN may then find it easy to classify the data by supervised learning. One can even "distill" a higher, slow RNN (the teacher) into a lower, fast RNN (the student), by forcing the latter to predict the hidden units of the former. Such systems could solve previously unsolvable very deep learning tasks, and started our long series of successful deep learning methods since the early 1990s (funded by Swiss SNF, German DFG, EU and others), long before 2006, although everybody had to wait for faster computers to make very deep learning commercially viable. LBH also ignore earlier FNNs that profit from unsupervised pre-training prior to BP-based fine-tuning (e.g., Maclin and Shavlik, 1995). They cite Bengio et al.'s post-2006 papers on unsupervised stacks of autoencoders, but omit the original work on this (Ballard, 1987).

5. LBH write that "unsupervised learning (refs 91-98) had a catalytic effect in reviving interest in deep learning, but has since been overshadowed by the successes of purely supervised learning." Again they almost exclusively cite post-2005 papers co-authored by themselves. By 2005, however, this transition from unsupervised to supervised learning was old hat, because back in the 1990s, our unsupervised RNN-based history compressors (see above) were largely phased out by our purely supervised Long Short-Term Memory (LSTM) RNNs, now widely used in industry and academia for processing sequences such as speech and video. Around 2010, history repeated itself, as unsupervised FNNs were largely replaced by purely supervised FNNs, after our plain GPU-based deep FNN (Ciresan et al., 2010) trained by BP with pattern distortions (Baird, 1990) set a new record on the famous MNIST handwritten digit dataset, suggesting that advances in exploiting modern computing hardware were more important than advances in algorithms. While LBH mention the significance of fast GPU-based NN implementations, they fail to cite the originators of this approach (Oh and Jung, 2004).

6. In the context of convolutional neural networks (ConvNets), LBH mention pooling, but not its pioneer (Weng, 1992), who replaced Fukushima's (1979) spatial averaging by max-pooling, today widely used by many, including LBH, who write: "ConvNets were largely forsaken by the mainstream computer-vision and machine-learning communities until the ImageNet competition in 2012," citing Hinton's 2012 paper (Krizhevsky et al., 2012). This is misleading. Earlier, committees of max-pooling ConvNets were accelerated on GPU (Ciresan et al., 2011a), and used to achieve the first superhuman visual pattern recognition in a controlled machine learning competition, namely, the highly visible IJCNN 2011 traffic sign recognition contest in Silicon Valley (relevant for self-driving cars). The system was twice better than humans, and three times better than the nearest non-human competitor (co-authored by LeCun of LBH). It also broke several other machine learning records, and surely was not "forsaken" by the machine-learning community. In fact, the later system (Krizhevsky et al. 2012) was very similar to the earlier 2011 system. Here one must also mention that the first official international contests won with the help of ConvNets actually date back to 2009 (three TRECVID competitions) - compare Ji et al. (2013). A GPU-based max-pooling ConvNet committee also was the first deep learner to win a contest on visual object discovery in large images, namely, the ICPR 2012 Contest on Mitosis Detection in Breast Cancer Histological Images (Ciresan et al., 2013). A similar system was the first deep learning FNN to win a pure image segmentation contest (Ciresan et al., 2012a), namely, the ISBI 2012 Segmentation of Neuronal Structures in EM Stacks Challenge.
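For readers outside the field: pooling downsamples a feature map tile by tile; max-pooling (the contribution credited to Weng above) keeps the strongest response in each tile, where Fukushima's spatial averaging would dilute it. A minimal NumPy illustration (my own, not code from any cited system):

```python
import numpy as np

def pool(img, size, op):
    """Downsample a 2-D array by applying `op` over non-overlapping size x size tiles."""
    h, w = img.shape
    tiles = img.reshape(h // size, size, w // size, size)
    return op(tiles, axis=(1, 3))

# A feature map with a few isolated strong activations.
img = np.array([[0, 0, 9, 0],
                [0, 1, 0, 0],
                [2, 0, 0, 0],
                [0, 0, 0, 3]])

print(pool(img, 2, np.max))    # [[1 9] [2 3]] -- strong responses survive
print(pool(img, 2, np.mean))   # [[0.25 2.25] [0.5 0.75]] -- they get diluted
```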

7. LBH discuss their FNN-based speech recognition successes in 2009 and 2012, but fail to mention that deep LSTM RNNs had outperformed traditional speech recognizers on certain tasks already in 2007 (Fernández et al., 2007) (and traditional connected handwriting recognisers by 2009), and that today's speech recognition conferences are dominated by (LSTM) RNNs, not by FNNs of 2009 etc. While LBH cite work co-authored by Hinton on LSTM RNNs with several LSTM layers, this approach was pioneered much earlier (e.g., Fernandez et al., 2007).

8. LBH mention recent proposals such as "memory networks" and the somewhat misnamed "Neural Turing Machines" (which do not have an unlimited number of memory cells like real Turing machines), but ignore very similar proposals of the early 1990s, on neural stack machines, fast weight networks, self-referential RNNs that can address and rapidly modify their own weights during runtime, etc (e.g., AMAmemory 2015). They write that "Neural Turing machines can be taught algorithms," as if this was something new, although LSTM RNNs were taught algorithms many years earlier, even entire learning algorithms (e.g., Hochreiter et al., 2001b).

9. In their outlook, LBH mention "RNNs that use reinforcement learning to decide where to look" but not that they were introduced a quarter-century ago (Schmidhuber & Huber, 1991). Compare the more recent Compressed NN Search for large attention-directing RNNs (Koutnik et al., 2013).

One more little quibble: While LBH suggest that "the earliest days of pattern recognition" date back to the 1950s, the cited methods are actually very similar to linear regressors of the early 1800s, by Gauss and Legendre. Gauss famously used such techniques to recognize predictive patterns in observations of the asteroid Ceres.

LBH may be backed by the best PR machines of the Western world (Google hired Hinton; Facebook hired LeCun). In the long run, however, historic scientific facts (as evident from the published record) will be stronger than any PR. There is a long tradition of insights into deep learning, and the community as a whole will benefit from appreciating the historical foundations.

The contents of this critique may be used (also verbatim) for educational and non-commercial purposes, including articles for Wikipedia and similar sites.

References not yet in the survey (Schmidhuber, 2015):

Y. LeCun, Y. Bengio, G. Hinton (2015). Deep Learning. Nature 521, 436-444.

Y. LeCun (2015). IEEE Spectrum Interview by L. Gomes, Feb 2015:

R. Dechter (1986). Learning while searching in constraint-satisfaction problems. University of California, Computer Science Department, Cognitive Systems Laboratory. First paper to introduce the term "Deep Learning" to Machine Learning.

I. Aizenberg, N.N. Aizenberg, and J. P.L. Vandewalle (2000). Multi-Valued and Universal Binary Neurons: Theory, Learning and Applications. Springer Science & Business Media. First paper to introduce the term "Deep Learning" to Neural Networks. Compare a popular G+ post on this:

J. Schmidhuber (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85-117. Preprint:

AMAmemory (2015): Answer at reddit AMA (Ask Me Anything) on "memory networks" etc (with references):

