+Ran Manor FYI

### Yann LeCun

Deep Learning has taken over all the big search companies, including Google, Microsoft and Baidu, as well as a few companies that produce technology for them, such as IBM.

They all have deployed DL systems for speech recognition and image content analysis (for search and other things). And there are several efforts to use DL for language modeling and ranking for search and ad placement.

Google has a large number of people in several groups working to develop and deploy DL applications. Microsoft has several groups working on it for speech, image and search, and Baidu is setting up their Institute for Deep Learning in Cupertino.

It's interesting how Deep Learning has spread like wildfire in industry (probably because it works so well), while its spread in academic research circles has been somewhat slower, relatively speaking.

It's a rather unusual phenomenon.


Ingo Lütkebohle


Well, the phrase "taken over" seems to imply that DL has **replaced** other ML technology **completely**. Is that really the case? If yes, it would indeed be very striking. Of course I don't know what these companies are doing internally, but seeing that their research departments regularly put out papers using other methods, it doesn't seem likely that this is what is happening.

If, on the other hand, DL is merely one tool that is being applied for particular problems, then this would more closely match what is being done in academia, wouldn't it?

Olivier Teytaud


People in universities are more and more involved in filling out forms, requesting grants and racing to publish. So they (we? :-)) prefer things that you can study with 20 lines of code before starting to write the corresponding article. Deep learning is not something you can do easily in 20 lines of code. It's a bit similar in discrete-time control, in my humble opinion: things that are good for your publication record are not always good for science.

Mark Cummins


I think the enthusiasm from industry is because deep learning really shines when you have a lot of training data available.

Zaikun Xu


For a company, I think it is the sense of crisis that drives them to study and apply new technology much more quickly.

Does anyone know of a good overview article (or articles) on DL?

Maybe this is a case where universities can't compete with industry? I cite the fact that Geoff Hinton is mostly moving to Google as evidence. Put another way: What are the DL research questions that industry is not going to explore and where university researchers could play an important role?

Guy Lebanon


It is difficult to analyze DL theoretically in the same way that one can analyze linear regression, logistic regression, or SVM. One reason for this is the lack of convexity and another is the deviation from linearity.

University researchers are more concerned with theory than industry scientists and engineers, which is why you see a lot of papers on linear regression and such in statistics and ML literature.


Adding neurons is a way of avoiding local minima: at least one neuron should be initialized in the right direction to avoid them :-) It is hard to formalize, but this principle makes sense, I guess. There are also studies on the expressive power of neural networks as a function of size (separating the effect of the number of layers from that of the number of neurons). I am not involved enough in this literature to point out references, but I remember this kind of work from a long time ago.

Many of the deep learning techniques that are responsible for the success of the field were developed by accident or by trying things that intuitively made sense. After they are shown to work, some sort of theoretical justification is loosely formed to say why doing these things made sense. The theory is still very informal and intuition-based. I wonder if the lack of academic interest is more because theoretical justification is uninteresting for things that already work and were developed without a theoretical basis, or if it's just that the analysis of such networks is just too dang hard.

Yann LeCun


Every new learning technique is developed through a combination of intuitive insight and guidance from theoretical insight. A few (very few) come up through inspiration from biology (also combined with intuitive and theoretical insight). Many come about from the need to solve a new practical problem.

In my experience, it never occurs "by accident".

In all cases that I know, the theory always comes after the insight/intuition. But that very much depends what you mean by "theory".

If by "theory" you mean a generalization bound, then every deep learning algorithm in wide use has one by default, because every fixed-size deep learning system has a finite VC dimension. The general VC bounds apply. It's only for things like SVM that you need special bounds because they are non-parametric (the number of parameters grows with the number of samples). Even the maximum margin regularizer comes from a very natural intuition (it's basically L2 regularization). Incidentally, the ideas of L2 and L1 regularization on parameters are much, much older than most people think (way older than any of the theoretical papers written about them).
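For readers who want the bound spelled out, one classical textbook form of the general VC bound is sketched below. It is not taken from this thread, and the notation is an assumption introduced here: $h$ is the VC dimension of the (fixed-size) model class, $n$ the number of training samples, $R(f)$ the expected 0-1 error, $\hat{R}_n(f)$ the training error, and $\delta$ the allowed failure probability. With probability at least $1-\delta$, simultaneously for every $f$ in the class:

$$
R(f) \;\le\; \hat{R}_n(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h} + 1\right) + \ln\frac{4}{\delta}}{n}}
$$

The bound depends on the model only through $h$, which is why it applies "by default" to any fixed-size network. It says nothing about how tight the bound is in practice.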

Interestingly, theoretical insights can also prevent you from doing the right thing. And deep learning has fallen victim to this too: for a long time, people thought that neural nets shouldn't be too big, to avoid over-fitting. But it turns out the best strategy (for speech or image recognition) is to make the network ridiculously large and regularize the hell out of it (e.g. with dropout and other methods).
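To make "regularize the hell out of it" concrete, here is a minimal sketch of two such regularizers: dropout and an L2 penalty. It is an illustration under stated assumptions, not the code used by any of the systems mentioned in this thread. The function names `dropout` and `l2_penalty`, the "inverted" rescaling convention, and the plain-list representation are all choices made here for brevity.

```python
import random


def dropout(activations, drop_prob, rng, training=True):
    """Inverted dropout (hypothetical helper): zero each unit with
    probability drop_prob and rescale survivors by 1 / (1 - drop_prob),
    so the expected activation is unchanged."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        # At test time the full network is used, with no masking.
        return list(activations)
    keep_prob = 1.0 - drop_prob
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]


def l2_penalty(weights, weight_decay):
    """L2 regularization term (hypothetical helper):
    0.5 * weight_decay * sum of squared weights."""
    return 0.5 * weight_decay * sum(w * w for w in weights)


if __name__ == "__main__":
    rng = random.Random(0)
    hidden = [1.0] * 8
    print(dropout(hidden, drop_prob=0.5, rng=rng))
    print(l2_penalty([3.0, 4.0], weight_decay=0.1))
```

In a real network the mask would be drawn afresh for every training example and applied to a layer's hidden activations, while the L2 term would be added to the training loss.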

The main problem with being too enamored with theory is that it can restrict your thinking to models that you can analyze. This pretty much limits you to generalized linear models and convex losses.

For theorists, deep learning is a wide open field. There are huge opportunities for theory to analyze and understand what goes on in deep learning systems. Now that deep learning has been shown to work very well, and now that there is large commercial interest in it, there might be enough of a motivation for theorists to crack that nut.


About DL: I'm not good at it yet. I talked to my friends (most of them are industry engineers): "It's so difficult!" Because if one wants to understand DL thoroughly, one has to know a great many things. (These are all listed well in Bengio's review paper.) Personally, I am moved by the fact that many achievements can now converge into DL, and that it does indeed show good performance.

Yoshua Bengio


+Chuck Wooters You can find several review papers on DL on my web page.

+Thomas Dietterich You don't need huge means to make a big difference. Witness the neural net trained by Alex Krizhevsky et al. on his laptop, which has made a big splash in the last few months.

Eric Battenberg


+Yoshua Bengio Are you talking about his ImageNet entry? Wasn't that two GTX 580's strapped together?

Thanks for the pointer +Yoshua Bengio, via your G+ page I also discovered http://deeplearning.net/ which seems to have a lot of great stuff!!

Yes, +Eric Battenberg and +Yoshua Bengio: Alex used a desktop machine with two GTX580 GPU cards in it. It wasn't a laptop.
