The challenge is that, when new material shows up on the scene, you don't yet have any human interactions -- and quite often, good material, things people would love, simply goes unnoticed and never builds up the interaction signals that would help. Detecting quality in these cases requires understanding the content itself, and the aspects of it which matter to people.
There are several hard aspects to this. One is simply understanding the content at the right granularity: "the color of the top-left pixel" or "the frequency of the word 'whenever'" are too fine-grained to give us a hint about whether people will like something, so we need to be able to group the content into more meaningful structures. For images, that might be "an image of a face in 3/4-profile," a certain color balance or contrast, a perspective or a cropping, and advances in image recognition in the past few years have (finally) made it possible to reliably identify such features. For text, it's much harder: there isn't yet even a clear idea of which features of text can both be measured and predict people's tastes. (How do you measure "intellectually meaty" or "hinting at scandal"?)
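To make "meaningful structures" concrete, here is a minimal sketch of one common approach: instead of raw pixels, use the activations of a pretrained convolutional network as the image's features. The model (ResNet-18) and the choice of its penultimate layer are illustrative assumptions, not the paper's actual pipeline.

```python
# A minimal sketch: a pretrained CNN as a feature extractor.
# ResNet-18 and its penultimate layer are illustrative choices only.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

resnet = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
resnet.fc = torch.nn.Identity()  # drop the classifier head; keep the 512-d embedding
resnet.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def image_features(path: str) -> torch.Tensor:
    """Map an image file to a 512-dim vector of 'meaningful' content signals."""
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return resnet(img).squeeze(0)
```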
This paper used the recent advances in image processing, together with recent advances in AI in general, to get a sense of which pictures people will like. It started by taking several thousand images and having them rated by humans for quality; those ratings were used as "ground truth." Those thousands of images were then analyzed into meaningful features, and a neural network was trained to find patterns of image features which predict human taste.
This is what neural networks, and other kinds of "supervised" machine learning systems, do in general: they take as inputs a bunch of signals, and combine them using a large number of parameters -- the "weights" -- to produce predictions of some value that you want to measure. The weights are set by taking a large number of examples ("golden data" or "ground truth") with known values of both the signals and the target; the weights are chosen ("trained") to maximize the quality of the system's predictions on this data. To make sure that training doesn't just teach the system to recognize those specific examples, the golden data is randomly split into two groups: one is used for training, and the trained model is then tested against the other group to make sure its predictions still hold up. If they do, then you have a model which can predict the target value given any set of measured signals.
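Here's a hedged sketch of that train/test procedure, using scikit-learn with synthetic stand-in data (the features and labels here are fabricated purely for illustration):

```python
# A minimal sketch of supervised training with a held-out test split.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 512))               # measured signals per example
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # known "ground truth" values

# Randomly split the golden data: train on one group, test on the other,
# so the model can't just memorize its training examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)                    # "training" chooses the weights
print("held-out accuracy:", model.score(X_test, y_test))
```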
In this case, the signals are the image's features, measured by a second machine learning system; the quantity being predicted is whether people will like the image. Because these are all "content-based signals" -- that is, they're based on the contents of the image, and not on people's responses to it -- the resulting model can be applied to any image.
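Putting the two sketches together, scoring a brand-new image might look like this; `image_features` and `model` come from the illustrative sketches above and are assumptions, not the paper's code:

```python
# Builds on the two sketches above: image_features (content-based signals)
# and model (the trained predictor); both names are illustrative assumptions.
def predicted_appeal(path: str) -> float:
    """Score any image, even one nobody has interacted with yet."""
    signals = image_features(path).numpy().reshape(1, -1)
    return float(model.predict_proba(signals)[0, 1])  # P(people will like it)

print(predicted_appeal("some_new_photo.jpg"))  # hypothetical file
```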
The team then applied this model to a set of 9 million images from Flickr with fewer than five "favorites." They tested the quality of its picks by having human raters compare that result set with the set of popular images on Flickr; the result was excellent, with its "hidden gems" scoring statistically the same as the most popular images on the site.
I would expect a lot more work on related techniques over the next few years, and for this to have a significant impact on the way that content recommendation is done. The main upshot will be that more little-known works get the spotlight they deserve -- something critical, as more and more people are creating things of value that they want the world to see.
For the last few weeks, Googlers have been obsessed with an internal visualization tool that Alexander Mordvintsev in our Zurich office created to help us visually understand some of the things happening inside our deep neural networks for computer vision. The tool essentially starts with an image, runs the model forwards and backwards, and then makes adjustments to the starting image in weird and magnificent ways.
It works much the way staring at clouds does: you convince yourself that some part of the cloud looks like a head, maybe with some ears, and then your mind starts to reinforce that opinion by seeing even more parts that fit the story ("wow, now I even see arms and a leg!"). The optimization process works in a similar manner, reinforcing what it thinks it is seeing. Since the model is very deep, we can tap into it at various levels and get all kinds of remarkable effects.
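Here's a hedged sketch of that forwards-and-backwards loop in PyTorch: gradient ascent on the input image, reinforcing whatever a chosen layer responds to. The model (GoogLeNet), layer (inception4c), step size, iteration count, and file names are all illustrative guesses, not the internal tool's actual settings.

```python
# A minimal DeepDream-style sketch: run the model forward, measure how strongly
# a chosen layer responds, run backward to the pixels, and nudge the image to
# make that response stronger. All settings here are illustrative.
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

net = models.googlenet(weights=models.GoogLeNet_Weights.DEFAULT).eval()

activations = {}
def grab(_, __, output):            # capture the layer we want to "dream" at
    activations["layer"] = output
net.inception4c.register_forward_hook(grab)

img = T.Compose([T.Resize(224), T.ToTensor()])(
    Image.open("clouds.jpg").convert("RGB")).unsqueeze(0)  # hypothetical input
img.requires_grad_(True)

for _ in range(20):
    net(img)                                # forward: what does the net "see"?
    activations["layer"].norm().backward()  # backward: how should the pixels
                                            # change to make it "see" more?
    with torch.no_grad():
        img += 0.05 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
        img.clamp_(0, 1)                    # keep pixel values valid

T.ToPILImage()(img.detach().squeeze(0)).save("dream.jpg")
```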
Alexander, Christopher Olah, and Mike Tyka wrote up a very nice blog post describing how this works. There's also a bigger album of more of these pictures linked from the blog post; I just picked a few of my favorites here.