Clement Farabet

Post has attachment
Today, after a tremendous year of development and iterations, we (Madbits) are excited to announce that we are joining Twitter.

Over this past year, we’ve built visual intelligence technology that automatically understands, organizes and extracts relevant information from raw media. Understanding the content of an image, whether or not there are tags associated with that image, is a complex challenge. We developed our technology based on deep learning, an approach to statistical machine learning that involves stacking simple projections to form powerful hierarchical models of a signal.

We prototyped and tested about ten different applications, and as we’ve prepared to launch publicly, we’ve decided to bring the technology to Twitter, a company that shares our ambitions and vision and will help us scale this technology.

We are excited to join the folks at Twitter to merge our efforts and see this technology grow to its full potential.

Clément Farabet, Louis-Alexandre Etezad-Heydari
& the MADBITS team

Post has shared content
Google servers.
How many servers does Google have?

My estimate: 1,791,040 as of January 2012
And projection: 2,376,640 in early 2013

This estimate was made by adding up the total available floor space at all of Google's data centers, combined with knowledge on how the data centers are constructed. I've also checked the numbers against Google's known energy consumption, and various other snippets of detail revealed by Google themselves.

Satellite imagery:

Google doesn't publicly say how many servers they have; they keep the figure secret for competitive reasons. If Microsoft overestimates and invests in more servers then they'll waste money - and this would be good for Google. Conversely, if Microsoft builds fewer servers then they won't match Google's processing power - again, good for Google. Nevertheless, from the limited amount of information available, I've attempted a rough estimate.

First of all, here's some background on how Google's data centers are built and organised. Understanding this is crucial to making a good estimate.

Number and location of data centers

Google build and operate their own data centers. This wasn't always the case. In the early years they rented colocation space at third-party centers. Since the mid-2000s, however, they have been building their own. Google currently (as of January 2012) has eight operational data centers. There are six in the US and two in Europe. Two more are being built in Asia and one more in Europe. A twelfth is planned in Taiwan but construction hasn't yet received the go-ahead.

Initially the data center locations were kept secret. Google even purchased the land under a false company name. That approach didn't quite work, however: information always leaked out via the local communities. So now Google openly publishes the info:

Here are all 12 of Google's self-built data centers, listed by year they became operational:

2003 - Douglas County, Georgia, USA (container center 2005)
2006 - The Dalles, Oregon, USA
2008 - Lenoir, North Carolina, USA
2008 - Moncks Corner, South Carolina, USA
2008 - St. Ghislain, Belgium
2009 - Council Bluffs, Iowa, USA
2010 - Hamina, Finland
2011 - Mayes County, Oklahoma, USA

2012 - Profile Park, Dublin, Ireland (operational late 2012)
2013 - Jurong West, Singapore (operational early 2013)
2013 - Kowloon, Hong Kong (operational early 2013)
201? - Changhua Coastal Industrial Park, Taiwan (unconfirmed)

These are so-called “mega data centers” that contain hundreds of thousands of servers. It's possible that Google continues to rent smaller pockets of third-party colocation space, or has servers hidden away at Google offices around the world. There's online evidence, for example, that Google was still seeking colocation space as recently as 2008. Three of the mega data centers came online later that year, however, and that should have brought the total capacity up to requirements. It's reasonable to assume that Google now maintains all its servers exclusively at its own purpose-built centers - for reasons of security and operational efficiency.

Physical construction of data centers

Although the locations are public knowledge, the data center insides are still fairly secret. The public are not allowed in, there are no tours, and even Google employees have restricted access. Google have, however, revealed the general design principles.

The centers are based around mobile shipping containers. They use standard 40' intermodal containers, which are ~12 m long and ~2.5 m wide. Each container holds 1,160 servers. The containers are lined up in rows inside a warehouse, and are stacked two high.
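The per-container figures above reduce to two handy constants - floor footprint and servers per stacked pair - which the rest of the estimate relies on. A minimal sketch:

```python
# Back-of-the-envelope numbers for one Google shipping container,
# using only the figures quoted above (40' container, 1,160 servers,
# stacked two high).
CONTAINER_LENGTH_M = 12.0   # ~40 feet
CONTAINER_WIDTH_M = 2.5
SERVERS_PER_CONTAINER = 1160
STACK_HEIGHT = 2

footprint_m2 = CONTAINER_LENGTH_M * CONTAINER_WIDTH_M    # floor area per footprint
servers_per_footprint = SERVERS_PER_CONTAINER * STACK_HEIGHT

print(footprint_m2)           # 30.0 m² of warehouse floor per container footprint
print(servers_per_footprint)  # 2320 servers per stacked pair
```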

See the video Google released in 2009: Google container data center tour

Are all of Google's data centers now based on this container design? We don't know for sure, but assume that they are. It would be sensible to have a standardised system.

As for the servers themselves - they use cheap, low-performance, open-case machines. The machines only contain the minimal hardware required to do their job, namely: CPU, DRAM, disk, network adapter, and on-board battery-powered UPS. Exact up-to-date specifications are not known, but in 2009 an average server was thought to be a dual-core dual-processor (i.e. 4 cores) with 16 GB RAM and 2 TB disk.

The containers are rigged to an external power supply and cooling system. Much of the space inside a warehouse is taken up with the cooling pipes and pumps. The cooling towers are generally external structures adjacent to the warehouse.

Counting servers based on data center floor space

This is by no means a precise method, but it gives us an indication. It works as follows.

First we determine the surface area occupied by each of Google's data center buildings. Sometimes this information is published. For example the data center at The Dalles is reported to be 66,000 m². The problem with this figure, however, is we don't know if it includes only the warehouse building itself or the whole plot of land including supporting buildings, car parks, and flower beds.

So, to be sure of getting the exact size of only the buildings, I took satellite images from Google Maps and used those to make measurements. Due to out-of-date imagery some of the data centers are not shown on Google Maps, but those that are missing can be found on Bing Maps instead.

Having retrieved the satellite imagery of the buildings, I then superimposed rows of shipping containers drawn to scale. Care was taken to ensure the containers occupied approximately the same proportion of total warehouse surface area as seen in the video linked above - that is, well under 50% of the floor space, probably closer to 20%. An example of this superimposed imagery is attached to this post; it shows one of the warehouses in Douglas County, Georgia, USA.

All floor plan images:

Having counted how many container footprints fit inside each warehouse, I then doubled those figures. This is because I assume all containers are stacked two high. Quite a large assumption, but hopefully a fair one.
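The whole counting step boils down to one multiplication: footprints × stack height × servers per container. A sketch of that calculation (the 88-footprint example is hypothetical, chosen because it reproduces one of the per-center figures listed below):

```python
def servers_from_footprints(container_footprints: int) -> int:
    """Estimate servers in a warehouse from counted container footprints.

    Assumes containers are stacked two high and each holds 1,160 servers,
    per the construction details described above.
    """
    SERVERS_PER_CONTAINER = 1160
    STACK_HEIGHT = 2
    return container_footprints * STACK_HEIGHT * SERVERS_PER_CONTAINER

# A warehouse with 88 visible container footprints:
print(servers_from_footprints(88))  # 204160
```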

It turns out that in general the centers house around 200,000 servers each. Douglas County is much larger at about twice that figure. Meanwhile Lenoir, Hamina, and Mayes County are smaller. Mayes County is due to be doubled in size during 2012. The sizes of the future data centers in Singapore and Hong Kong have not been measured. Instead I assume that they'll also host around 200,000 servers each.

This results in the following totals:

417,600 servers - Douglas County, Georgia, USA
204,160 servers - The Dalles, Oregon, USA
241,280 servers - Council Bluffs, Iowa, USA
139,200 servers - Lenoir, North Carolina, USA
250,560 servers - Moncks Corner, South Carolina, USA
296,960 servers - St. Ghislain, Belgium
116,000 servers - Hamina, Finland
125,280 servers - Mayes County, Oklahoma, USA

Sub-total: 1,791,040

Future data centers that'll be operational by early 2013:

46,400 servers - Profile Park, Dublin, Ireland
200,000 servers - Jurong West, Singapore (projected estimate)
200,000 servers - Kowloon, Hong Kong (projected estimate)
139,200 additional servers - Mayes County, Oklahoma, USA

Grand total: 2,376,640
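The sub-total and grand total above can be checked directly from the per-center figures. Note that every measured figure is an exact multiple of 1,160 (one container's worth of servers), as the counting method implies:

```python
# Per-center estimates as listed above (January 2012).
current = {
    "Douglas County": 417_600, "The Dalles": 204_160,
    "Council Bluffs": 241_280, "Lenoir": 139_200,
    "Moncks Corner": 250_560, "St. Ghislain": 296_960,
    "Hamina": 116_000, "Mayes County": 125_280,
}
# Centers expected online by early 2013 (Singapore and Hong Kong are
# round projections, not measurements).
future = {
    "Profile Park, Dublin": 46_400, "Jurong West, Singapore": 200_000,
    "Kowloon, Hong Kong": 200_000, "Mayes County expansion": 139_200,
}

subtotal = sum(current.values())
grand_total = subtotal + sum(future.values())
print(subtotal)     # 1791040
print(grand_total)  # 2376640

# Sanity check: each measured figure is a whole number of containers.
assert all(n % 1160 == 0 for n in current.values())
```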

Technical details revealed by Google

A slide show published in 2009 by Google Fellow +Jeff Dean reveals lots of interesting numbers. In particular it mentions "Spanner", which is the storage and computation system used to span all of Google's data centers. This system is designed to support 1 to 10 million globally distributed servers.

Given that this information was published over two years ago, it's likely the number of servers is already well into that 1-to-10 million range. And this would match with the floor space estimation.

Slide show:

Counting servers based on energy consumption

Last year +Jonathan Koomey published a study of data center electricity use from 2005 to 2010. He calculated that the total worldwide use in 2010 was 198.8 billion kWh. In May of 2011 he was told by +David Jacobowitz (program manager on the Green Energy team at Google) that Google's total data center electricity use was less than 1% of that worldwide figure.

From those numbers, Koomey calculated that Google was operating ~900,000 servers in 2010. He does say, however, that this is only "educated guesswork". He factored in an estimate that Google's servers are 30% more energy efficient than conventional ones; it's possible that 30% is an underestimate - Google does pride itself on energy efficiency.

If we take Koomey's 2010 figure of 900,000 servers, and then add the Hamina center (opened late 2010) and the Mayes County center (opened 2011) that brings us to over a million servers. The number would be ~1,200,000 if we were to assume all data centers are the same size.
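One way to reproduce the ~1,200,000 figure, assuming (per the list above) six centers open for most of 2010, with Hamina and Mayes County bringing the count to eight:

```python
# Extrapolating Koomey's 2010 estimate under the equal-size assumption.
servers_2010 = 900_000
centers_2010 = 6                             # open for most of 2010

per_center = servers_2010 // centers_2010    # ~150,000 servers per center
estimate_2011 = per_center * (centers_2010 + 2)  # add Hamina + Mayes County

print(estimate_2011)  # 1200000
```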

Koomey's study:


The figure of 1,791,040 servers is an estimate. It's probably wrong. But hopefully not too wrong. I'm pretty confident it's correct within an order of magnitude. I can't imagine Google has fewer than 180,000 servers or more than 18 million. This gives an idea of the scale of the Google platform.


YouTube videos:
- Google container data center tour
- Google Data Center Efficiency Best Practices. Part 1 - Intro & Measuring PUE
- Continual improvements to Google data centers: ISO and OHSAS certifications
- Google data center security

Other links:
- Google patent for container-based data centers
- Standard container sizes
- +Jeff Dean's slideshow about Google platform design
- “In the Plex” book by +Steven Levy
- +Jonathan Koomey's data center electricity use

Articles by +Rich Miller of Data Center Knowledge:

Original copy of this post:

Attached image below is one of Google's data warehouses in Douglas County, Georgia. Photo is from Google Maps, with an overlay showing the server container locations.

Post has shared content
Our new paper on scene parsing.
We have a new paper out on scene parsing:
Clément Farabet, Camille Couprie, Laurent Najman, Yann LeCun "Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers":

Scene parsing, or semantic segmentation, consists in labeling each pixel in an image with the category of the object it belongs to. It is a challenging task that involves the simultaneous detection, segmentation and recognition of all the objects in the image.
The scene parsing method proposed here starts by computing a tree of segments from a graph of pixel dissimilarities. Simultaneously, a set of dense feature vectors is computed which encodes regions of multiple sizes centered on each pixel. The feature extractor is a multiscale convolutional network trained from raw pixels. The feature vectors associated with the segments covered by each node in the tree are aggregated and fed to a classifier which produces an estimate of the distribution of object categories contained in the segment. A subset of tree nodes that cover the image are then selected so as to maximize the average "purity" of the class distributions, hence maximizing the overall likelihood that each segment will contain a single object. The convolutional network feature extractor is trained end-to-end from raw pixels, alleviating the need for engineered features. After training, the system is parameter free.
The system yields record accuracies on the Stanford Background Dataset (8 classes), the Sift Flow Dataset (33 classes) and the Barcelona Dataset (170 classes) while being an order of magnitude faster than competing approaches, producing a 320×240 image labeling in less than 1 second.

Post has shared content
Originally shared by ****
US students drop out of science after they face the initial coursework: I blame educators. It is time to change the curriculum and stop teaching dry math and physics when you can apply the material actively - teach a few concepts well, not a vast corpus that nobody ends up using. We need to teach for application, and teach students to teach themselves when they need to. We need more beginner programming and computer classes; math should be taught there. We need more beginner freshman classes with circuits and embedded boards - those are fun and enticing and teach you physics right away. We need more active mechanical engineering classes building things. Then students will WANT to learn math and physics. I will write more on this and start the revolution. It's time to get not 10,000 engineers but 10 million. We can do it. I know how.

Comment on a Marco Scoffier post about the recent NYT article:

Anyone felt this earthquake?
