One of the biggest challenges in information retrieval (the branch of computer science that includes search and content recommendation) is how to find good content which humans haven't already found. To date, the most reliable signals have been other human judgments: for example, PageRank is a measure of how "good" a site is based on links people have made to that site (with the challenge being how to separate "meaningful" and trustworthy links from the rest), and collaborative filtering is based on what other users have chosen (with the challenge being how to find users with similar enough taste to be relevant).
The problem is that, when new material shows up on the scene, you don't yet have any human interactions -- and quite often, good material, things people would love, simply goes unnoticed and never builds up the interaction signals which would help it surface. Detecting quality in such material requires understanding the content itself, and the aspects of it which matter to people.
There are several hard aspects to this. One is simply understanding the content at the right granularity: "the color of the top-left pixel" or "the frequency of the word 'whenever'" are too fine-grained to give us a hint about whether people will like something, so we need to be able to group the content into more meaningful structures. For images, that might be "an image of a face in 3/4-profile," a certain color balance or contrast, a perspective or a cropping. Advances in image recognition in the past few years have (finally) made it possible to reliably identify such features. For text, it's much harder: there isn't yet even a clear idea of which features of text can both be measured and determine people's tastes. (How do you measure "intellectually meaty" or "hinting at scandal"?)
This paper uses the recent advances in image processing, together with recent advances in AI in general, to get a sense of which pictures people will like. The team started by taking several thousand images and having them rated by humans for quality; those ratings were used as "ground truth." Then those thousands of images were analyzed into meaningful features, and a neural network was trained to find patterns of image features which predict human taste.
This is what neural networks, and other kinds of "supervised" machine learning systems, do in general: they take as inputs a bunch of signals, and combine them using a large number of parameters -- the "weights" -- to produce predictions of some values that you want to measure. The weights are set by taking a large number of labeled examples ("golden data" or "ground truth") with known values of both the signals and the values to be predicted; weights are chosen ("trained") to maximize the quality of the system's predictions for this data. To make sure that the training doesn't just teach the system to recognize those specific examples, the golden data is randomly split into two groups; one is used for training, and the trained system is then tested against the other group to make sure that its predictions are good on examples it has never seen. If they are, then you have a model which can predict -- given any set of measured signals -- the values you care about.
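To make the train-then-test idea concrete, here is a minimal sketch in Python. It is not the paper's system: it swaps the neural network for a tiny logistic regression, and the "golden data" is synthetic, with two made-up signals and a made-up labeling rule. Every name in it (make_golden_data, train, accuracy and so on) is an assumption for illustration only.

```python
import math
import random

def sigmoid(z):
    # Numerically safe logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(weights, signals):
    # Combine the signals using the weights (the last weight is a bias term).
    z = weights[-1] + sum(w * s for w, s in zip(weights, signals))
    return sigmoid(z)

def train(examples, epochs=200, learning_rate=0.5):
    # Nudge the weights to reduce prediction error on the training examples.
    n_signals = len(examples[0][0])
    weights = [0.0] * (n_signals + 1)
    for _ in range(epochs):
        for signals, label in examples:
            error = predict(weights, signals) - label
            for i, s in enumerate(signals):
                weights[i] -= learning_rate * error * s
            weights[-1] -= learning_rate * error
    return weights

def accuracy(weights, examples):
    # Fraction of examples where the thresholded prediction matches the label.
    correct = sum((predict(weights, s) >= 0.5) == (y == 1) for s, y in examples)
    return correct / len(examples)

def make_golden_data(n, seed=0):
    # Synthetic stand-in for human-rated data: label is 1 when 2*x0 - x1 > 0.
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        signals = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        label = 1 if 2 * signals[0] - signals[1] > 0 else 0
        data.append((signals, label))
    return data

golden = make_golden_data(400)
random.Random(1).shuffle(golden)                      # random split
train_set, holdout_set = golden[:300], golden[300:]   # train on one group
weights = train(train_set)
holdout_accuracy = accuracy(weights, holdout_set)     # test on the other
print(f"held-out accuracy: {holdout_accuracy:.2f}")
```

The number that matters is the accuracy on the held-out group, since the weights never saw those examples during training.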
In this case, the signals are these features of the image, measured by a second machine learning system; the quantity being predicted is whether people will like it. Because these are all "content-based signals" -- that is, they're based on the contents of the image, and not on people's responses to it -- the resulting model can be applied to any image.
The team then applied this model to a set of 9 million images from Flickr with fewer than five "favorites." They tested the quality of its picks by having human raters compare that result set with the set of popular images on Flickr; the result was excellent, with its "hidden gems" scoring statistically the same as the most popular images on the site.
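The selection step can be sketched in a few lines of Python. This is only an illustration under assumptions: the Photo record, its predicted_score field (standing in for the model's output), the hidden_gems helper and the sample values are all hypothetical, and the paper's actual pipeline may differ. Only the "fewer than five favorites" cutoff comes from the description above.

```python
from typing import NamedTuple

class Photo(NamedTuple):
    photo_id: str
    favorites: int
    predicted_score: float  # hypothetical output of the content-based model

def hidden_gems(photos, max_favorites=5, top_k=2):
    # Keep only images that people have largely overlooked...
    overlooked = [p for p in photos if p.favorites < max_favorites]
    # ...then rank them by what the model predicts people would think of them.
    ranked = sorted(overlooked, key=lambda p: p.predicted_score, reverse=True)
    return ranked[:top_k]

# Made-up sample data for illustration.
photos = [
    Photo("a", 0, 0.91),
    Photo("b", 120, 0.95),  # already popular, so excluded
    Photo("c", 3, 0.40),
    Photo("d", 4, 0.88),
    Photo("e", 5, 0.99),    # exactly five favorites, so excluded
]
gems = hidden_gems(photos)
print([p.photo_id for p in gems])
```

Note that popularity is used only to decide which images count as overlooked; the ranking itself relies on the content-based score alone.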
I would expect a lot more work on related techniques over the next few years, and for this to have a significant impact on the way that content recommendation is done. The main upshot will be that more little-known works get the spotlight they deserve -- something critical, as more and more people are creating things of value that they want the world to see.
h/t +Wayne Radinsky and +Daniel Estrada