Googlebot's getting spots before his eyes

One of the issues with computers 'recognising' images from their pixel data (and translating that into indexable data) is the processing time and bandwidth required.

To date, those developing image parsing (for want of a better term) have been using programs that try to match every pixel in an image to best-guess what it is they're seeing.

But +Research at Google has recognised that humans themselves don't try to take in a landscape all at once.

They fixate on an area of topical interest and build the picture around that bit by bit.

Applying a similar approach to computer vision dramatically reduces the time it takes a program to 'see' a picture, without affecting accuracy. By picking hotspots in the image, identifying them and making stored associations, the process is becoming more refined, accurate and, dare we say it, artificially intelligent.

As images form a greater feature of search - and an ever more engaging element of social - decreasing the time it takes convolutional neural networks to process images is a major advance.

And, according to this paper exploring #GoogleDeepMind's work on the problem, it's a step we're not far from landing.

Interesting stuff; h/t +Jan-Willem Bobbink 
A Glimpse into Computer Vision

Neural networks have recently had great success in significantly advancing the state of the art on challenging image classification and object detection datasets. However, this accuracy comes at a high computational cost both at training and testing time.

But what if one takes inspiration from how people recognize objects, by selectively focusing on the important parts of an image instead of processing an entire image at once? By ignoring irrelevant noisy features in an image, fewer pixels need to be processed, substantially reducing classification and detection complexity.

Last week, during #NIPS2014, Google DeepMind presented Recurrent Models of Visual Attention, a paper which describes an “attention-based task-driven visual processing” model that is capable of extracting information from an image or video by adaptively selecting a sequence of smaller regions (glimpses), processing only the selected regions at high resolution.
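The core idea of a "glimpse" can be sketched in a few lines: rather than feeding the whole image to a network, extract a small patch around a chosen fixation point, plus progressively larger but lower-resolution patches around it, mimicking the sharp fovea and blurry periphery of human vision. The sketch below is illustrative only; the function name, parameters, and block-averaging downsampler are our own assumptions, not the paper's implementation.

```python
import numpy as np

def extract_glimpse(image, center, size=8, scales=3):
    """Illustrative multi-resolution glimpse (not the paper's exact code).

    Extracts `scales` concentric square patches centred on `center`, each
    twice the side length of the previous one, and downsamples all of them
    to the same `size` x `size` resolution by block averaging. The result
    is a small, fixed-size summary: sharp at the centre, coarse outward.
    """
    h, w = image.shape
    cy, cx = center
    patches = []
    for s in range(scales):
        half = (size * 2 ** s) // 2
        # Zero-pad where the patch falls outside the image bounds.
        patch = np.zeros((2 * half, 2 * half), dtype=float)
        y0, y1 = max(cy - half, 0), min(cy + half, h)
        x0, x1 = max(cx - half, 0), min(cx + half, w)
        patch[y0 - (cy - half):y1 - (cy - half),
              x0 - (cx - half):x1 - (cx - half)] = image[y0:y1, x0:x1]
        # Downsample each scale to size x size by averaging f x f blocks.
        f = 2 ** s
        patch = patch.reshape(size, f, size, f).mean(axis=(1, 3))
        patches.append(patch)
    return np.stack(patches)  # shape: (scales, size, size)

# A recurrent model would consume a short sequence of such glimpses,
# choosing the next fixation point from what it has seen so far.
image = np.arange(64 * 64, dtype=float).reshape(64, 64)
glimpse = extract_glimpse(image, center=(32, 32))
print(glimpse.shape)
```

At each step the network processes only `scales * size * size` values (here 192) instead of the full 4,096-pixel image, which is where the computational saving comes from.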

Read the full paper at