I've recently been working on a project to train very large neural networks with a bunch of colleagues.  In addition to building infrastructure to train very large networks, we've been working to apply them to various application domains.  As part of that work, we've been collaborating with +Vincent Vanhoucke and other members of the speech team at Google to use deep networks in speech recognition.   The blog post linked below gives some details about the system.  

When you use the voice recognition on your Android phone, you're using the results of this collaboration.

We're also applying these networks to other application domains, like image recognition, language modeling and machine translation.  You can read more about our use of these kinds of networks for image recognition here:

http://research.google.com/archive/unsupervised_icml2012.html

http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?_r=1
 
If you train the speech recognizer on thousands of hours of YouTube, does it infer knowledge of meows?
 
Did Apple already do this with Siri?
 
Hurray for Deep Learning! It even works weez my French accent.
 
This is a fantastic paper, Jeff! It seems like it confirms an important hypothesis about how the human brain learns, and represents a big step in the direction of human-level AI. Congratulations!

I didn't quite understand this passage though:

----
As reported above, the best neuron achieves 81.7% accuracy in classifying faces against random distractors. What if we remove all images that have faces from the training set? We performed the control experiment by running a face detector in OpenCV and removing those training images that contain at least one face. The recognition accuracy of the best neuron dropped to 72.5% which is as low as simple linear filters reported in section 4.3.
----

If you remove all faces from the training dataset, what is referred to by "recognition accuracy"? Presumably there was no training on faces, and as such there could be no recognition?

Sorry if my neural network knowledge is a bit rusty...
 
It would be fun to look at all the other concepts it learned... 
 
+Artem Boytsov Even if we remove faces from the training set, there could be other characteristics of the remaining images that correlate with whether or not an image contains a face.  For example, images of indoor scenes are probably more likely to contain faces than natural outdoor scenes, so a neuron that detects whether or not an image is likely to be an indoor scene might be somewhat better than random as a face detector.  However, the purpose of that control experiment is to show that it really is the faces in the training set that help the system learn high-level features like faces.
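
For a rough idea of how a control like that can be set up, here is a minimal sketch of filtering a training set with OpenCV's stock Haar-cascade face detector (this is just illustrative, not our actual pipeline; the paths and helper names are hypothetical):

  # Drop any training image in which the Haar cascade finds at least one face.
  # Illustrative sketch only; not the actual experimental code.
  import os
  import cv2

  # OpenCV ships a pre-trained frontal-face Haar cascade.
  cascade_path = os.path.join(cv2.data.haarcascades,
                              "haarcascade_frontalface_default.xml")
  face_detector = cv2.CascadeClassifier(cascade_path)

  def contains_face(image_path):
      """Return True if the Haar cascade detects at least one face."""
      gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
      if gray is None:
          return False  # unreadable image; keep it out of the decision
      faces = face_detector.detectMultiScale(gray, scaleFactor=1.1,
                                             minNeighbors=5)
      return len(faces) > 0

  def face_free_training_set(image_dir):
      """Yield only those training images that contain no detected faces."""
      for name in os.listdir(image_dir):
          path = os.path.join(image_dir, name)
          if not contains_face(path):
              yield path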

We also have a UI that lets us examine the top stimuli for all the neurons in the system.  You can see a few examples in a talk I gave recently, around slides 104 to 106 in this PDF:

http://cra.org/uploads/documents/resources/snowbird2012_slides/dean.pdf
 
Closer to home, there is also Kalanit Grill-Spector at Stanford, who worked with Nancy.
 
What was the architecture for building the DBN? Was it Google Compute Engine?

I was very impressed by the unsupervised learning of higher-level features described in this paper. We'd like to apply this to our own work in genomics; what are your recommendations on the practicality of building a large network like this? We have experience with other ML methods, but nothing of this size.
 
+Michael Barton We built our own custom software system for training large networks on large clusters of machines.  

We're hoping to have a paper published in the near future that gives more details on the training system we've built, but the two main principles we use are

  (a) partitioning a single model across multiple machines (model-level parallelism),
  (b) data parallelism for training, by stamping out multiple copies of these multi-machine models, all sharing a set of parameters over the network through a centralized parameter server service, which serves fresh parameter copies and applies gradient updates sent to it by the model replicas.

A few diagrams of this are shown starting about halfway through the following slide deck (start with the slide "Scaling Deep Learning"): 

http://cra.org/uploads/documents/resources/snowbird2012_slides/dean.pdf

Given enough training data and computational cycles, it's definitely practical.  I think these sorts of techniques would be very good in the genomics domain, because of their ability to automatically identify complicated, high-level features/interactions from the raw data.
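
To make (b) a bit more concrete, here is a toy, single-process sketch of the parameter-server pattern (the class names, the linear-model stand-in, and the plain SGD update are just illustrative assumptions, not our actual system): replicas pull fresh parameters, compute gradients on their own shard of data, and push updates back to a central server.

  # Toy single-process illustration of the parameter-server pattern.
  import numpy as np

  class ParameterServer:
      """Holds the shared parameters and applies incoming gradient updates."""
      def __init__(self, num_params, learning_rate=0.01):
          self.params = np.zeros(num_params)
          self.learning_rate = learning_rate

      def get_params(self):
          # Serve a fresh copy of the current parameters to a replica.
          return self.params.copy()

      def apply_gradient(self, grad):
          # Apply a gradient update sent back by a model replica.
          self.params -= self.learning_rate * grad

  class ModelReplica:
      """One copy of the model, training on its own shard of the data."""
      def __init__(self, data_shard, labels):
          self.data = data_shard
          self.labels = labels

      def compute_gradient(self, params):
          # Squared-error gradient for a linear model -- a stand-in for
          # backprop through a multi-machine deep network.
          preds = self.data @ params
          return self.data.T @ (preds - self.labels) / len(self.labels)

  # Driver loop: each replica pulls parameters, computes a gradient on its
  # shard, and pushes the update to the parameter server.
  rng = np.random.default_rng(0)
  true_w = rng.normal(size=5)
  server = ParameterServer(num_params=5)
  replicas = []
  for _ in range(4):
      X = rng.normal(size=(100, 5))
      replicas.append(ModelReplica(X, X @ true_w))

  for step in range(200):
      for replica in replicas:
          params = server.get_params()
          server.apply_gradient(replica.compute_gradient(params))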
 
A lot of people would be grateful (and would pay you) if you released this as a utility-computing-style API.
 
+Jeff Dean Very useful to know, thank you. I would be very interested in your publication when it becomes available.

As you wrote, I agree genomics is ripe for the application of DBNs. At present the number of features, such as genes, far exceeds the number of records available for training, e.g. genomes. However, I think the increasing proliferation of desktop sequencers means that within the next few years the number of records will reach a size where training becomes realistic.

We're in the process of writing a grant for this application of DBNs to microbial genomics. Would it be possible to contact you to discuss our ideas so far?
 
i wish you guys so much good luck!!!!!