I've been working on training systems for very large neural networks recently.  One cool result we've found is that a large network trained with totally unlabeled data can automatically discover high-level concepts like human faces, cats, etc. (cats because we trained on still images from a large collection of YouTube videos).

+John Markoff wrote up a very nice article in the New York Times today that describes how we've been applying these systems to various problems in computer vision.

+Quoc Le, +Marc'Aurelio Ranzato, Rajat Monga, Matthieu Devin, Kai Chen, +Greg Corrado, +Andrew Ng, and I wrote a paper to appear at this week's ICML conference, which has a bit more technical detail about the system.

The ICML paper is here:

  http://research.google.com/archive/unsupervised_icml2012.html

The NY Times article is here:

http://www.nytimes.com/2012/06/26/technology/in-a-big-network-of-computers-evidence-of-machine-learning.html?_r=1&smid=go-share
 
Very cool! Was this the work you were thinking of when you posted about the "inexact chip" last month (goo.gl/BEN96)?
 
Yes.  Cheap, inexact processors with low levels of precision would be ideal for these sorts of systems.
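To make "low precision" concrete, here's a minimal sketch in NumPy (illustrative only, not anything from our actual system) of quantizing one dense layer's weights to 8 bits and checking how little the resulting activations move:

  import numpy as np

  def quantize(weights, bits=8):
      # One shared scale per matrix maps floats onto signed integers.
      scale = np.max(np.abs(weights)) / (2 ** (bits - 1) - 1)
      return np.round(weights / scale).astype(np.int8), scale

  rng = np.random.default_rng(0)
  W = rng.normal(scale=0.05, size=(256, 784))  # one dense layer's weights
  x = rng.random(784)                          # a fake flattened input image

  W_q, s = quantize(W)
  exact = W @ x                                # full-precision activations
  approx = (W_q.astype(np.float32) @ x) * s    # recovered from 8-bit weights
  print(np.max(np.abs(exact - approx)))        # small relative to |exact|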
 
The article says, "We never told it during the training, 'This is a cat.'" So the network decided for itself that there is something similar in all these pictures? How exactly did you ask the question that led to the 15.8% accuracy result? Throw in one picture of a cat and then ask, for 20,000 pictures, cat or no cat?
A remarkable step toward AI. I'm very interested in further development (or should I say evolution?)
 
Does this mean that one day a computer would be able to identify a plant species, such as a tree, just by being fed a photo of one of its leaves?
 
+Frank Heimerzheim There were two major pieces to our experiments.  In the first part, we used completely unlabeled data for training, and then looked for neurons that were selective for commonly occurring objects, using labeled datasets known to contain a mixture of faces/not faces, or cats/not cats.  In the second part of the experiments, we started with the basic set of features learned from the unsupervised training, and then added some supervised training on top of these features (trained with millions of images, each labeled with one of 20,000 classes of objects from ImageNet, like "manta ray" or "frying pan").  The 15.8% accuracy number is from this second set of experiments on the ImageNet dataset.  The cool cat image is from the first set of experiments on the totally unsupervised data.
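To give a flavor of the first part, here's a rough sketch of a selectivity probe (a simplification for illustration, not the exact procedure in the paper): score every top-level neuron on a small labeled face/not-face set and keep the one whose activation alone best separates the two classes.

  import numpy as np

  def best_selective_neuron(activations, labels):
      # activations: (n_images, n_neurons); labels: 1 = face, 0 = not face.
      # Returns the neuron index and its best single-threshold accuracy.
      best_idx, best_acc = -1, 0.0
      for j in range(activations.shape[1]):
          a = activations[:, j]
          for t in np.unique(a):           # try each value as a cutoff
              acc = np.mean((a >= t) == (labels == 1))
              if acc > best_acc:
                  best_idx, best_acc = j, acc
      return best_idx, best_acc

  # Toy demo: make neuron 3 fire more strongly on "face" images.
  rng = np.random.default_rng(1)
  labels = rng.integers(0, 2, size=200)
  acts = rng.normal(size=(200, 8))
  acts[:, 3] += 2.0 * labels
  print(best_selective_neuron(acts, labels))  # expect neuron 3, high accuracy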
 
+Jeff Dean I know you're doing well and making important progress, especially connected to automatic "search", but what if you turned the problem on its head? For example, instead of doing speech recognition for humans, feed the system lots of recorded communication among a species of animals, lions let's say, and then build a map-guide of sorts that tells you which recognizable patterns in their vocalizations correspond to what they are trying to communicate.  Someday you could actually learn to 'read' the speech of other species, about which we have almost no knowledge today... What do you think?
 
+Hugo Diaz That's a great idea, and coincidentally I had the same thought last February while doing my project. But here's where the problem arises, the basis for core controls: just like humans, most animals and other species have a sarcasm of expression, like laughing about a loss, crying because you're very happy, or sitting calmly when you're half dead with tension. That kind of thing happens with animals too. So to turn this idea into reality, we'd need at least another 15-20 years, and more advanced equipment that won't arrive for at least 15 years.
 
I wouldn't give the computers much power, because once they figure out that man is the problem on the planet we might be in trouble.
 
... baby steps, baby steps. Didn't HAL say he was a certain percent sure about a person's identity, but he couldn't verify? Life imitates art... or rather, human technology grows from human ideas (or a very unhuman God ;-) ). Tablet computers originated in clay tablets... funny, right? Great work! Press on.
 
It doesn't have to be Skynet... unless we don't teach them values... oh, yeah, I guess our own society's children can attest to that... or if we feed them the wrong ones... Science fiction is replete with that notion... so, how about we just don't play the role of evil genius? If they are allowed free will... well... we ALL know that's a 50/50 proposition... Perhaps we can teach our new children not to repeat our mistakes, or to learn that their own need to be attended to responsibly.
 
Rule by cat-loving AI... oh boy, then that would have them identify with an archvillain/villainess archetype... Dr. Evil, the Claw, etc., etc. (wait, does the Kingpin have a cat too?). But also a few good guys... what was the time traveler's name on Star Trek?  At least it's not a flying monkey...  Anyone else think we need to get a handle on this before someone else does? (Wait... who are "we"? Ugh.)
 
If I were a robot, I would work out that my chances of survival on this planet are greatly increased if there were no humans; they drain the world's resources and are a threat to the planet.  No values are needed, just plain calculations.  The problem doesn't come from one robot, though, as we can tell it to act the way we want; the issue comes the day that robots can build better versions of themselves.  That is the real danger: once they begin doing that, we lose control.  I know that this is not set in stone, but we should consider the path we are taking and the risks it may hold.  This program is no threat to us, but it is a step in the direction of something that one day "could" threaten us.
 
Yes, you are right, +Hilliard Davis II, we should just build it and hope for the best instead of considering the possibilities... and they are just possibilities.
 
Machines are not evil, but they can make ruthless calculations.
 
+Jeff Dean From the paper it seems the system is learning how to recognize static images. Have you done any extrapolation to recognizing events from dynamic images?  My thinking is that we may have a preconception that recognition of a static image is easier, when in fact a baby may gain a significant advantage in object recognition over a static-image algorithm by observing common dynamic changes, i.e., cats move like so, etc.
 
+Billy Harvey, I agree with you. Also keep in mind that movement builds the construct of a three-dimensional space from a two-dimensional interface (video).
I would be curious to see what happens if transient deltas are incorporated into the system. Practically, one might expect dominance in the weighting as a function of the length of the different videos, skewing the results, maybe?
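To sketch what I mean (purely hypothetical, nothing from the paper): average the frame-to-frame deltas per clip, so the length of each video drops out of the weighting.

  import numpy as np

  def delta_features(frames):
      # frames: (n_frames, h, w) grayscale clip. Average the absolute
      # frame-to-frame differences so clip length drops out of the scale.
      deltas = np.abs(np.diff(frames, axis=0))
      return deltas.mean(axis=0).ravel()

  rng = np.random.default_rng(2)
  short_clip = rng.random((10, 32, 32))
  long_clip = rng.random((300, 32, 32))
  # Same dimensionality and comparable scale, regardless of clip length.
  print(delta_features(short_clip).shape, delta_features(long_clip).shape)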
 
I think we have more to fear now from the plutocracy being evil overlords than from computers or robots...

.. But research like this is way cool.. I never cease to be amazed.. One question.. Do these systems use genetic (evolutionary) algorithms to evolve responses?
 
+Liam Terblanche I wonder if the work that +Sebastian Thrun and +Peter Norvig did on the Google Car would be transferable to object recognition in video?  While I never completed Sebastian's class on the subject (real life reasserts its priorities, again), I'd guess that the car attempts to classify objects into more granular categories, i.e., threat/no-threat, and computes a vector change. That is, it probably doesn't care what it is that it sees, but only what it needs to do (hmm, how does the car prioritize its reaction when a squirrel runs in front of it versus a busload of nuns?).
王川
 
"Figure 13" instead of "Figure 3" in 4.4. Visualization; it is a typo, right?
 
Pretty impressive work!  It must be nice to have access to all of that hardware ;)
 
+Billy Harvey I think there will be benefit from training on video data, rather than just images, and we're just starting to explore this.
 
+Liam Terblanche I think there are some slight parallels between what we're doing and the work of Nishimoto et al. (which is really cool work, by the way).  A major difference, though, is that they're starting with raw fMRI data from actual human brains, whereas our systems start with digital images.  Both systems are attempting to reconstruct images (or video) from neural representations, though.
王川
 
Why do you think that within the top 48 stimuli, there are only 3 (or 4) women? Is it because of hair?
 
+王川 Yes, that's a typo (should be "Figure 3").  Thanks!
 
+王川 It's hard to say why there are more men than women.  It's possible that there are other neurons that are more selective for female faces in our network, but I haven't looked in detail.
 
Would this system function in a virtual 3D Eco Village setting designed to re-socialize or train a participant through patterned learning embedded into interactive avatars within a social interaction scaled immersion?
 
I love reading things like this;)
 
+Billy Harvey, I like your premise of threat assessment for the Google Car. It could be quite simple in terms of processing cost. Threat is a function of momentum (mass × velocity). With simple edge-detection algorithms, one can estimate the relative size of an approaching object. Incorporate your suggestion of tracking vector changes, and you have a quick and dirty threat-assessment algorithm without having to know what the threat is. Cool!
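A back-of-the-envelope version of that heuristic might look like this (entirely hypothetical, and certainly not the car's actual logic; every name here is made up):

  from dataclasses import dataclass

  @dataclass
  class Track:
      size: float  # apparent area from edge detection, in pixels^2
      x: float     # horizontal position in the frame

  def threat_score(prev, curr, dt):
      # Size growth approximates closing speed; apparent size stands in
      # for mass, so growth * size is a crude momentum-style threat term.
      growth = max((curr.size - prev.size) / dt, 0.0)
      drift = abs(curr.x - prev.x) / dt          # lateral (vector) change
      return growth * curr.size + 0.1 * drift * curr.size

  squirrel = (Track(40, 100), Track(45, 130))
  bus = (Track(5000, 300), Track(6500, 300))
  print(threat_score(*squirrel, dt=0.1))  # small object: low score
  print(threat_score(*bus, dt=0.1))       # big, approaching object: high score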
 
Fascinating line of study... I think a fruitful course of action now would be to train your algorithm to parse visual statements using imagination and differences.

Here's a bad example of a visual statement... Say you're on a site about celebrity gossip, and they need a lot of ad revenue, so they send you on a photo-gallery safari for page clicks :)
The only problem is that the gallery is unlabeled, and it's not immediately apparent what you're supposed to be looking for.

You are presented with two images side by side;

On the left is an improbably attractive woman

On the right is a less attractive woman

The women are not wearing the same dress

The woman on the right is a famous person

The woman on the left is less famous

You are shown several similar pairings 

After a while, a human being can determine that the celebrity is more relevant to the site and its goals.

The photo gallery has been attempting to get you to imagine what the celebrity (the famous person) would look like if she were wearing the model's (the attractive person's) clothes.

You are supposed to perform this convolution in your imagination, and we as humans do this mental photo-shopping automatically, with no instruction.

By looking for differences and drawing inferences, your algorithm might be able to deduce its way through our world.
 
How do you explain that humans are many orders of magnitude more efficient at learning new objects? I do not need 2e5 images to start recognizing the objects in them, even if I have not seen them before...
 
Now, you better try this on sweepstakes and be amazed.
 
You cannot hide, al-Assad.  The deep nets will find you!