Frank Rusch
Frank's posts

Post has shared content
All of these images were computer generated!

For the last few weeks, Googlers have been obsessed with an internal visualization tool that Alexander Mordvintsev in our Zurich office created to help us visually understand some of the things happening inside our deep neural networks for computer vision.  The tool essentially starts with an image, runs the model forwards and backwards, and then makes adjustments to the starting image in weird and magnificent ways.  

When you stare at clouds, you can convince yourself that some part of a cloud looks like a head, maybe with some ears, and then your mind starts to reinforce that opinion by seeing even more parts that fit the story ("wow, now I even see arms and a leg!"). The optimization process works in a similar manner, reinforcing what it thinks it is seeing. Since the model is very deep, we can tap into it at various levels and get all kinds of remarkable effects.
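As a rough sketch of the "run the model forwards and backwards, then adjust the image" loop described above (not the actual tool): the tiny weight matrix `W`, the single `tanh` layer, the objective, and the step normalization are all assumptions made for illustration. The idea is to nudge the input in whatever direction makes a chosen layer respond more strongly, so the model reinforces what it already "sees".

```python
import math

# Hypothetical toy "network": a single layer h = tanh(W x). Not a real vision model.
W = [[0.5, -0.2, 0.1],
     [-0.3, 0.8, 0.4]]

def forward(x):
    # Forward pass: activations of the chosen layer for input x.
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]

def objective(x):
    # How strongly the layer responds: half the squared norm of its activations.
    return 0.5 * sum(h * h for h in forward(x))

def input_gradient(x):
    # Backward pass: d(objective)/dx = W^T (h * (1 - h^2)).
    h = forward(x)
    delta = [hj * (1.0 - hj * hj) for hj in h]
    return [sum(W[j][i] * delta[j] for j in range(len(W))) for i in range(len(x))]

def dream(x, steps=20, step_size=0.1):
    # Gradient ascent on the input itself; the weights never change.
    x = list(x)
    for _ in range(steps):
        g = input_gradient(x)
        # Normalize by the mean absolute gradient (an assumption of this sketch).
        scale = sum(abs(gi) for gi in g) / len(g) or 1.0
        x = [xi + step_size * gi / scale for xi, gi in zip(x, g)]
    return x

start = [0.2, 0.1, -0.1]   # stand-in for the starting image's pixels
result = dream(start)
print(objective(start), objective(result))
```

In a deep model the same loop could target activations at a shallow or a deep layer, which is one way to read "tapping into it at various levels".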

Alexander, +Christopher Olah, and Mike Tyka wrote up a very nice blog post describing how this works:

There's also a bigger album of more of these pictures linked from the blog post:

I just picked a few of my favorites here.

Post has attachment
Today is the first PG&E SmartDay of the season.

Post has attachment
OEM-installed malware is bad... especially the kind that lets people at Starbucks MITM your HTTPS traffic.

Post has attachment
MacBook users -- it might be time to superglue your Thunderbolt port.

Post has shared content
Cool stuff... even its misfires are interesting.
“A green monster kite soaring in a sunny sky.” interpreted as:
“A man flying through the air while riding a snowboard.”
“A group of young people playing a game of frisbee.”
“A pizza sitting on top of a pan on top of a stove.”
“A person riding a motorcycle on a dirt road.”
These are automatically generated captions from a computer model that starts with just the raw pixels of an image, described in a recent research paper titled Show and Tell: A Neural Image Caption Generator that was just published on Arxiv.

+Oriol Vinyals, +Alexander Toshev, +Samy Bengio, and +Dumitru Erhan in our research group at Google have been working on automatically generating these captions. They combine an accurate convolutional neural network (similar to the one that won the 2014 ImageNet object recognition challenge) with a powerful recurrent neural network language model. The language model uses an LSTM, a particular kind of recurrent network that is good at capturing long-range dependencies in sequence data, similar to the model used in our group's recent work on LSTMs for machine translation. The system initializes the state of the language model with the features from the top of the convolutional neural network and is then trained to generate captions using a modest amount of human-labeled training data of (image, caption) pairs. The resulting system does a good job of generating captions for previously unseen images.
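To make the wiring concrete, here is a minimal sketch of "image features initialize the language model's state, then the LSTM emits words one at a time". Everything in it is an assumption for illustration: the six-word vocabulary, the layer sizes, the random untrained weights, and the greedy word choice. It shows only the data flow, so the words it produces are meaningless; it is not the system from the paper.

```python
import math
import random

random.seed(0)

VOCAB = ["<start>", "<end>", "a", "dog", "on", "grass"]  # hypothetical vocabulary
HIDDEN = 4     # size of the LSTM state (assumed)
FEATURES = 3   # size of the image feature vector (assumed)

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Untrained random weights standing in for learned parameters.
W_init = rand_matrix(HIDDEN, FEATURES)    # image features -> initial hidden state
EMBED = rand_matrix(len(VOCAB), HIDDEN)   # one embedding row per word
GATES = {g: rand_matrix(HIDDEN, 2 * HIDDEN) for g in "ifoc"}  # input, forget, output, candidate
W_out = rand_matrix(len(VOCAB), HIDDEN)   # hidden state -> word scores

def lstm_step(x, h, c):
    xh = x + h  # concatenate the word embedding and the previous hidden state
    i = [sigmoid(z) for z in matvec(GATES["i"], xh)]
    f = [sigmoid(z) for z in matvec(GATES["f"], xh)]
    o = [sigmoid(z) for z in matvec(GATES["o"], xh)]
    g = [math.tanh(z) for z in matvec(GATES["c"], xh)]
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

def caption(image_features, max_len=8):
    # Initialize the language model's state from the image features.
    h = [math.tanh(z) for z in matvec(W_init, image_features)]
    c = [0.0] * HIDDEN
    word = VOCAB.index("<start>")
    words = []
    for _ in range(max_len):
        h, c = lstm_step(EMBED[word], h, c)
        scores = matvec(W_out, h)
        word = max(range(len(VOCAB)), key=scores.__getitem__)  # greedy choice
        if VOCAB[word] == "<end>":
            break
        words.append(VOCAB[word])
    return words

print(caption([0.2, -0.1, 0.7]))  # the feature vector is a stand-in for CNN output
```

Training would adjust all of these matrices so that the emitted words match the human-written captions in the (image, caption) pairs.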

Since two of these folks sit within 15 feet of me, I've enjoyed watching their progress on this project and chatting with them over the past few weeks as it has developed. The examples in the New York Times article give a good sense of what the system can do: it doesn't always get it right, but in general, the captions it generates are very fluent, mostly relevant to the image, and sometimes show a surprising level of sophistication. Furthermore, because it is a generative model, and we're sampling from the distribution of possible captions, you can run the model multiple times, and it will generate different captions. For one image, it might generate the two different captions "_A close up of a child holding a stuffed animal_" and "_A baby is asleep next to a teddy bear._"
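The "run it again, get a different caption" behavior comes from sampling each next word rather than always taking the most likely one. A small sketch of that idea follows, with a hand-written table of next-word probabilities standing in for the model's output. The table and its numbers are made up, and it conditions only on the previous word, whereas the real model conditions on the image and on everything generated so far.

```python
import random

# Hypothetical next-word probabilities; a stand-in for the model's predictions.
NEXT_WORD = {
    "<start>": {"a": 0.9, "the": 0.1},
    "a": {"child": 0.5, "baby": 0.5},
    "the": {"child": 0.5, "baby": 0.5},
    "child": {"holding": 0.7, "sleeping": 0.3},
    "baby": {"sleeping": 0.6, "holding": 0.4},
    "holding": {"<end>": 1.0},
    "sleeping": {"<end>": 1.0},
}

def sample_caption(rng):
    # Draw one word at a time from the distribution over next words.
    word, words = "<start>", []
    while True:
        choices = NEXT_WORD[word]
        word = rng.choices(list(choices), weights=list(choices.values()))[0]
        if word == "<end>":
            return " ".join(words)
        words.append(word)

rng = random.Random(0)
captions = {sample_caption(rng) for _ in range(50)}
for text in sorted(captions):
    print(text)
```

Replacing the random draw with "always pick the highest-probability word" would make every run return the same caption.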

+John Markoff of the New York Times has written up a nice article about this work (along with some similar research out of Stanford that has been happening concurrently):

A Google Research blog post about the work has also just been put up here:

An Arxiv paper titled Show and Tell: A Neural Image Caption Generator appears here:

You can see a few more examples at the end of the set of slides from a talk I gave recently in China (pages 75 to 79 of this PDF):

[ Edited to insert the title and link to the Arxiv paper now that it made it through the Arxiv editorial review process. ]

Post has attachment
Learning how to walk...