How many servers does Google have?

My estimate: 1,791,040 as of January 2012
And projection: 2,376,640 in early 2013

This estimate was made by adding up the total available floor space at all of Google's data centers, combined with knowledge of how the data centers are constructed. I've also checked the numbers against Google's known energy consumption and various other snippets of detail revealed by Google themselves.

Satellite imagery: j2p2.com/google-data-center-floor-plans

Google doesn't publicly say how many servers they have. They keep the figure secret for competitive reasons. If Microsoft over-estimates Google's capacity and invests in too many servers, they'll waste money - and this would be good for Google. Conversely, if Microsoft under-estimates and builds fewer servers, they won't match Google's processing power, and again, this would be good for Google. Nevertheless, from the limited amount of information that is available I've attempted to make a rough estimate.

First of all, here's some background on how Google's data centers are built and organised. Understanding this is crucial to making a good estimate.

--------------------------------------------------
Number and location of data centers

Google build and operate their own data centers. This wasn't always the case. In the early years they rented colocation space at third-party centers. Since the mid-2000s, however, they have been building their own. Google currently (as of January 2012) has eight operational data centers. There are six in the US and two in Europe. Two more are being built in Asia and one more in Europe. A twelfth is planned in Taiwan but construction hasn't yet received the go-ahead.

Initially the data center locations were kept secret. Google even purchased the land under a false company name. That approach didn't quite work however. Information always leaked out via the local communities. So now Google openly publishes the info: google.com/about/datacenters/locations

Here are all 12 of Google's self-built data centers, listed by year they became operational:

2003 - Douglas County, Georgia, USA (container center 2005)
2006 - The Dalles, Oregon, USA
2008 - Lenoir, North Carolina, USA
2008 - Moncks Corner, South Carolina, USA
2008 - St. Ghislain, Belgium
2009 - Council Bluffs, Iowa, USA
2010 - Hamina, Finland
2011 - Mayes County, Oklahoma, USA

2012 - Profile Park, Dublin, Ireland (operational late 2012)
2013 - Jurong West, Singapore (operational early 2013)
2013 - Kowloon, Hong Kong (operational early 2013)
201? - Changhua Coastal Industrial Park, Taiwan (unconfirmed)

These are so-called “mega data centers” that contain hundreds of thousands of servers. It's possible that Google continues to rent smaller pockets of third-party colocation space, or has servers hidden away at Google offices around the world. There's online evidence, for example, that Google was still seeking colocation space as recently as 2008. Three of the mega data centers came online later that year, however, and that should have brought the total capacity up to requirements. It's reasonable to assume that Google now maintains all its servers exclusively at its own purpose-built centers - for reasons of security and operational efficiency.

--------------------------------------------------
Physical construction of data centers

Although the locations are public knowledge, the data center insides are still fairly secret. The public are not allowed in, there are no tours, and even Google employees have restricted access. Google have, however, revealed the general design principles.

The centers are based around mobile shipping containers. They use standard 40' intermodal containers which are ~12m long and ~2.5m wide. Each container holds 1,160 servers. The containers are lined up in rows inside a warehouse, and are stacked two high.
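
As a rough sanity check on that density, here's the arithmetic in Python, using only the approximate container dimensions, the 1,160 servers per container, and the two-high stacking mentioned above:

# Rough density check: servers per square metre of container footprint.
# All inputs are the figures quoted in the paragraph above.
container_length_m = 12.0      # standard 40' intermodal container
container_width_m = 2.5
servers_per_container = 1160
stack_height = 2               # containers stacked two high

footprint_m2 = container_length_m * container_width_m          # ~30 m^2
density = servers_per_container * stack_height / footprint_m2  # ~77 servers/m^2

print(f"~{footprint_m2:.0f} m^2 per container footprint, "
      f"~{density:.0f} servers per m^2 of footprint when stacked two high")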

See the video Google released in 2009: Google container data center tour

Are all of Google's data centers now based on this container design? We don't know for sure, but assume that they are. It would be sensible to have a standardised system.

As for the servers themselves - they use cheap, low-performance, open-case machines. The machines only contain the minimal hardware required to do their job, namely: CPU, DRAM, disk, network adapter, and on-board battery-powered UPS. Exact up-to-date specifications are not known, but in 2009 an average server was thought to be a dual-core dual-processor (i.e. 4 cores) with 16 GB RAM and 2 TB disk.
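
Taking that 2009-era spec at face value and applying it across this post's overall estimate of 1,791,040 servers, the aggregate capacity works out roughly as follows. This is purely illustrative - the real hardware mix is unknown and has certainly changed since 2009:

# Back-of-envelope aggregate capacity, assuming every server matches the
# 2009-era spec above (4 cores, 16 GB RAM, 2 TB disk). The fleet size is
# this post's floor-space estimate, not a published figure.
servers = 1791040

cores = servers * 4                      # ~7.2 million cores
ram_pb = servers * 16 / 1024 / 1024      # GB -> PB, ~27 PB of RAM
disk_eb = servers * 2 / 1024 / 1024      # TB -> EB, ~3.4 EB of disk

print(f"~{cores / 1e6:.1f} million cores, ~{ram_pb:.0f} PB of RAM, ~{disk_eb:.1f} EB of disk")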

The containers are rigged to an external power supply and cooling system. Much of the space inside a warehouse is taken up with the cooling pipes and pumps. The cooling towers are generally external structures adjacent to the warehouse.

--------------------------------------------------
Counting servers based on data center floor space

This is by no means a precise method, but it gives us an indication. It works as follows.

First we determine the surface area occupied by each of Google's data center buildings. Sometimes this information is published. For example, the data center at The Dalles is reported to be 66,000 m². The problem with this figure, however, is that we don't know whether it covers only the warehouse building itself or the whole plot of land, including supporting buildings, car parks, and flower beds.

So, to be sure of getting the exact size of only the buildings, I took satellite images from Google Maps and used those to make measurements. Due to out-of-date imagery some of the data centers are not shown on Google Maps, but those that are missing can be found on Bing Maps instead.

Having retrieved the satellite imagery of the buildings, I then superimposed rows of shipping containers drawn to scale. Care was taken to ensure the containers occupied approximately the same proportion of total warehouse surface area as seen in the video linked above. That is, well under 50% of the floor space, probably closer to 20%. An example of this superimposed imagery is attached to this post; it shows one of the warehouses in Douglas County, Georgia, USA.

All floor plan images: j2p2.com/google-data-center-floor-plans

Having counted how many container footprints fit inside each warehouse, I then doubled those figures. This is because I assume all containers are stacked two high. Quite a large assumption, but hopefully a fair one.
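
Put as a formula, each warehouse estimate is simply: counted container footprints × 2 (stacking) × 1,160 servers per container. A minimal Python sketch; the 88-footprint figure in the example is inferred by working backwards from The Dalles total below, not something Google has published:

SERVERS_PER_CONTAINER = 1160  # from the container video
STACK_HEIGHT = 2              # containers assumed stacked two high

def estimate_servers(container_footprints):
    """Servers in one warehouse, given container footprints counted from imagery."""
    return container_footprints * STACK_HEIGHT * SERVERS_PER_CONTAINER

# Worked example: 88 footprints (inferred) -> 204,160, The Dalles figure below.
print(estimate_servers(88))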

It turns out that in general the centers house around 200,000 servers each. Douglas County is much larger at about twice that figure. Meanwhile Lenoir, Hamina, and Mayes County are smaller. Mayes County is due to be doubled in size during 2012. The sizes of the future data centers in Singapore and Hong Kong have not been measured. Instead I assume that they'll also host around 200,000 servers each.

This results in the following totals:

417,600 servers - Douglas County, Georgia, USA
204,160 servers - The Dalles, Oregon, USA
241,280 servers - Council Bluffs, Iowa, USA
139,200 servers - Lenoir, North Carolina, USA
250,560 servers - Moncks Corner, South Carolina, USA
296,960 servers - St. Ghislain, Belgium
116,000 servers - Hamina, Finland
125,280 servers - Mayes County, Oklahoma, USA

Sub-total: 1,791,040

Future data centers that'll be operational by early 2013:

46,400 servers - Profile Park, Dublin, Ireland
200,000 servers - Jurong West, Singapore (projected estimate)
200,000 servers - Kowloon, Hong Kong (projected estimate)
139,200 additional servers - Mayes County, Oklahoma, USA

Grand total: 2,376,640
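
As a consistency check, every measured site above is an exact multiple of 2,320 servers (one two-high stack of containers), and the totals add up as stated. A quick Python verification of the figures listed above:

SERVERS_PER_STACK = 2 * 1160  # 2,320 servers per two-high container stack

current = {
    "Douglas County": 417600,
    "The Dalles": 204160,
    "Council Bluffs": 241280,
    "Lenoir": 139200,
    "Moncks Corner": 250560,
    "St. Ghislain": 296960,
    "Hamina": 116000,
    "Mayes County": 125280,
}
future = {
    "Dublin": 46400,
    "Singapore": 200000,            # round projection, not measured
    "Hong Kong": 200000,            # round projection, not measured
    "Mayes County expansion": 139200,
}

# Every measured site is an exact multiple of one two-high container stack.
assert all(n % SERVERS_PER_STACK == 0 for n in current.values())

subtotal = sum(current.values())
grand_total = subtotal + sum(future.values())
print(subtotal, grand_total)        # 1791040 2376640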

--------------------------------------------------
Technical details revealed by Google

A slide show published in 2009 by Google Fellow +Jeff Dean reveals lots of interesting numbers. In particular it mentions "Spanner", which is the storage and computation system used to span all of Google's data centers. This system is designed to support 1 to 10 million globally distributed servers.

Given that this information was published over two years ago, it's likely the number of servers is already well into that 1-to-10 million range. And this would match with the floor space estimation.

Slide show: www.odbms.org/download/dean-keynote-ladis2009.pdf

--------------------------------------------------
Counting servers based on energy consumption

Last year +Jonathan Koomey published a study of data center electricity use from 2005 to 2010. He calculated that the total worldwide use in 2010 was 198.8 billion kWh. In May of 2011 he was told by +David Jacobowitz (program manager on the Green Energy team at Google) that Google's total data center electricity use was less than 1% of that worldwide figure.

From those numbers, Koomey calculated that Google was operating ~900,000 servers in 2010. He does say, however, that this is only "educated guesswork". He factored in an estimate that Google's servers are 30% more energy efficient than conventional ones. It's possible that this is an underestimate - Google does pride itself on energy efficiency.

If we take Koomey's 2010 figure of 900,000 servers, and then add the Hamina center (opened late 2010) and the Mayes County center (opened 2011) that brings us to over a million servers. The number would be ~1,200,000 if we were to assume all data centers are the same size.
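
To show roughly how an energy-based estimate works, here is a back-of-envelope version in Python. The ~250 W average per server (including cooling and other facility overhead) is my own illustrative assumption, not a figure from Koomey's study; it is chosen only to show that the published energy numbers are consistent with a fleet of this size:

# Energy-based sanity check. The worldwide figure and the "<1%" share are
# quoted above from Koomey; the 250 W average per server (including facility
# overhead) is an assumed illustrative value, not Koomey's published model.
worldwide_kwh_2010 = 198.8e9
google_share = 0.01                      # "less than 1%"
google_kwh = worldwide_kwh_2010 * google_share

avg_power_kw = google_kwh / (365 * 24)   # ~227,000 kW of continuous draw
watts_per_server = 250                   # assumed, incl. cooling overhead

servers = avg_power_kw * 1000 / watts_per_server
print(f"~{servers:,.0f} servers")        # ~908,000, close to Koomey's ~900,000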

Koomey's study: www.koomey.com/post/8323374335

--------------------------------------------------
Summary

The figure of 1,791,040 servers is an estimate. It's probably wrong. But hopefully not too wrong. I'm pretty confident it's correct within an order of magnitude. I can't imagine Google has fewer than 180,000 servers or more than 18 million. This gives an idea of the scale of the Google platform.

--------------------------------------------------
References

YouTube videos:
- Google container data center tour
- Google Data Center Efficiency Best Practices. Part 1 - Intro & Measuring PUE
- Continual improvements to Google data centers: ISO and OHSAS certifications
- Google data center security

http://www.google.com/about/datacenters/
http://www.j2p2.com/google-data-center-floor-plans/

http://goo.gl/vkjWu - Google patent for container-based data centers
http://goo.gl/G4aMK - Standard container sizes
http://goo.gl/rfPMa - +Jeff Dean's slideshow about Google platform design
http://goo.gl/DcjJB - “In the Plex” book by +Steven Levy
http://goo.gl/JYXbx - +Jonathan Koomey's data center electricity use

Articles by +Rich Miller of Data Center Knowledge:
- http://goo.gl/nfjvW
- http://goo.gl/K5MDW
- http://goo.gl/rGNy7

Original copy of this post:
https://plus.google.com/114250946512808775436/posts/VaQu9sNxJuY

Attached image below is one of Google's data warehouses in Douglas County, Georgia. Photo is from Google Maps, with an overlay showing the server container locations.
--------------------------------------------------
Comments (26)
 
Apparently Google's plan is to have your CV stored in two data centers in the US, one in Europe, and one in Asia. Redundancy in case of hardware failure.
 
+James Pearn This is truly impressive! I've done small amateur research projects somewhat like this (including one on Google Search), but this is way better sleuthing work than anything I've achieved. These are the same kinds of methods used during the Cold War to analyze warheads, for example. Good work!
 
Thanks, Jakke. This is indeed amateur research. I'm hoping a proper computer scientist, or maybe even an ex-Googler, will read it and point out some errors - so the estimate can be refined.
 
+James Pearn Ex-Googlers will, I think, be bound by their NDAs, but the people to reach out to might actually be retired Cold War analysis types. Of course they have even tighter NDAs, but some of these methods are ancient and bound to be in the public domain already. At the very least they might be able to say whether the research makes any sense at all... :-)
 
The data center in Belgium can be seen quite well with Bing images, if you're interested.
 
Getting an idea of the impact, I wonder what amount and kind of security it takes... I mean, who needs twin towers or nuclear plants to focus on in the future? Any information on that?
 
Hi +James Pearn, thank you for your detailed investigation and the useful information you give the community.

It's good to understand a bit about the technology platform we are using here. Honest respect to the Google team for giving people and businesses the chance to use their platforms...
 
This appeals to the Savant in me! Excellent work! Looking through your posts... I'm now following you.
 
Great post, James. When it comes to the server number I think the power method is a good approach. I recently listened to a podcast from +Horace Dediu and +Randy Bias (The Critical Path #14), and they mentioned that for 25 MW - 50 MW you could have 200k to 500k servers. They also threw out the stat that up to 30% of Google's servers could be down at any one time; they've built in massive redundancy so that the cheap parts they use can fail. It's a different mindset from the high-availability, high-quality, stable server architecture.
 
Thank you for answering one of the questions I've been wondering about for years.
 
It's a nice writeup and gives a few numbers. However, the number of servers doesn't really give an impression on its own. To me, it doesn't matter at all how many servers Google operates, as long as their efficiency is high.

Efficiency as in "number of servers operated by a single system administrator". A single system administrator can usually handle roughly 20 to 30 different servers (from operating system to service, delegating hardware requests to onsite staff).
As soon as you stop playing the game of "everybody gets his individual piece of cake" and standardize both server hardware and operating software, you can multiply this number tenfold. Make the OS install a unified platform for your software developers and multiply it by a factor of 5 (less confusion about "how do you do this on that distribution", less trouble with "but on my testbox, there's a different version running with its own featureset"). And with the right amount of automation, the number may be multiplied tenfold again. The further you proceed, the more you need tight coupling between software development and system administration to make it work.

Efficiency is important for any company that wants to spend less money on staff, but for Google and others it's crucial simply in order to handle that many systems. Efficiency also reflects the level of system administration: the difference between a time switch resetting some device every hour, a button-punching monkey, and a team of wizards in their respective fields; the difference between casual outages and really reliable services.


In my own experience, a few unlucky system administrators barely managed a ratio of 1:40 (lots of individual servers with "unique" installs, a low level of automation, "compliance requirements" demanding intensive paperwork, lots of time-intensive support requests from users), while others achieve upwards of 1:5000 (a very strict application and OS stack, an extremely high level of automation, proper unit tests in both system administration and software development, and administrators whose word carries weight during product design).
The higher this ratio, the smaller the perceived difference between system administrators and high-level software developers.
At a ratio of 1:50, your admins usually waste a lot of time on routine jobs or spend much of their time on low-level support requests. At a ratio of 1:300, most "everyday" routine jobs and recurring error handling have been automated, but a lot of "minor" things are still left to be done.

At a ratio of 1:500, there's still room for improvement, but your admins start working at a very different level; some of their techniques may bedazzle non-senior software developers. Replacing admins with button-punching monkeys is obviously out of the question, and it becomes very hard to find competent admins. Your admins need the skills of experienced software developers as well as system administrators. Training less-experienced admins or developers to do the job properly is possible, but may easily take half a year.


Roughly at a ratio of 1:1000 and beyond, that's where the fun begins. Your admins have seen at least one case where different onboard network adapters had the same ("globally unique") MAC address, resulting in network trouble. And in the following year, your hardware inventory toolchain automatically checks for this error and rejects a new server for its duplicate MAC address, stopping similar network trouble cold.

Around 1:2000, your admins have successfully automated just about every routine job, have both received training from and given training to your software developers, and still find time to play a round of MarioKart or Guitar Hero.

At that ratio of servers per admin, your admins barely see "routine" errors anymore. One of their most important skills becomes the analysis of errors almost unknown to casual users. However, the sheer number of servers makes those very "rare" errors happen a few times a day.

Your admins witness network switches broadcasting unicast traffic to completely wrong VLANs, but only when the Ethernet frame has a certain size and CRC. Software writes checksummed data onto a RAID, but the RAID returns an 8 KB chunk of zeroes instead - under very specific and rare conditions. Your admins may see double failures multiple times a year (redundancy failing just when you need it) and checksum algorithms not discovering errors. Your admins can tell apart minor revisions of NICs by their respective network errors. Your bug reports are addressed by custom firmware. A software changelog may state "you have to be Google-sized to experience this bug" - and you've been bitten by that bug up to four times a day.

Your hardware admins have seen pink smoke coming out of a Fibre Channel switch, or one-off hardware assembly errors like an onboard D-sub serial port mounted in such a way that you can't plug in a serial cable (http://w.sysiphus.de/pics/DSCF0002-large.JPG).
You've seen X-rays of backplanes and circuit boards, taken by your hardware vendor in order to investigate some very obscure issue you're not permitted to talk about.


And somewhere around the ratio of 1:4000, you know when you can trust your vendor's claim that "you're the first one reporting this issue, we've never heard of this before".


According to http://www.catonmat.net/blog/wp-content/uploads/2008/11/thatcouldnthappentous.pdf, Google's ratio of servers per admin is >1:4000, so they are operating at that level. Google is driven by engineering, so I guess about 1-2%, maybe up to 3%, of the staff there are admins. If an anticipated 2% of Google's 32,467 employees (http://investor.google.com/earnings/2011/Q4_google_earnings.html) are admins and you multiply that by the ratio of 1:4000, you get close to 2,600,000 servers. I guess we're both in the right range (I don't know their figures either).
 
Awesome comment, +Anders Henke, thanks for that! I see the PDF you linked to was uploaded just over three years ago, so it's possible the admin:server ratio is now even lower. I mean, the trend has always been in that direction. As I understand it, the on-site staff are hardware technicians only, and all software admin is done entirely from Mountain View. As to how many admins they might have, I've absolutely no idea.
 
Back in 2008, Google operated 5 data centers - today, they're operating 8. In 2008, they had 20,222 employees; today they have 32,467 employees.
So in the end, the ratio of total employees per data center stayed about the same, which is some indication that Google "just" managed to keep up its high standards.

Of course, that ">1:4000" is likely to mean "some teams are running at 1:100, but are aiming at the 1:15000 other teams are achieving". In the end, the average and the trend stay the same.

While there is always room for improvement, it's a tremendous job to keep up such a high standard in a fast-moving business.
 
Am I the only one who has noticed how much this building/parking lot looks like a video card with a PCIe connector?
 
Adding 600k servers in 2012 would make Google - if they were a server vendor - the 4th biggest server vendor in worldwide unit marketshare. 1/4 the volume of HP, 1/3 the volume of Dell, 1/2 the volume of IBM (see here for a Gartner press estimate of unit market share in a recent quarter: http://www.gartner.com/it/page.jsp?id=1859415 ). Impressive.
 
this is just nutty:

"If Microsoft over-estimates and invests in more servers then they'll waste money - and this would be good for Google. Conversely, if Microsoft builds fewer servers then they won't match Google's processing power, and again, this would be good for Google."

Microsoft doesn't try to match Google's server count ... they deploy servers to match user/processing demand, like everyone else!

Not to mention the reverse logic ... i.e. if Google has deployed more servers than Microsoft, they are wasting money, and if fewer, then Google has less processing power ... both good for Microsoft!
 
So what is the performance of all Google servers (in teraflops)?
 
The question should be: how much power do Google's servers consume per year?
 
+Tobby Buu Averaged out over the number of people who use Google services, I suspect the answer, statistically speaking, is 0 W/year.
 
+Nasser Taghavi, to answer your question about the performance of all Google servers, I estimate it's around 40 petaflops. Details in my follow-up post here: goo.gl/cdJ4b
 
Looking at your picture, you've labelled as 'server containers' what is more likely to be the evaporative cooling units. The roof has two colors; you can see that the shadow line extends evenly, and the computers, which are in containers, are inside the building.

Google container data center tour shows their first container data center.
 
There used to be videos of Google engineering tech talks on the 'Google Platform' which talked about the software that manages this massive information machine spread over several continents. Unfortunately they've disappeared for the 'competitive' reasons you cite above. It's fascinating stuff. I wonder if they still use this as the basic building block? http://i.i.com.com/cnwk.1d/i/bto/20090401/GoogleServerLarge.jpg
On-board gel cell UPS is genius.
 
Google makes their own servers on site, too.