Google published a paper today about the evolution of our networking infrastructure over the past decade. I've gotten to be quite intimate with these systems, thanks to spending much of that time working on our search and storage infrastructure, and the differences are profound, to say the least.
The article quotes +Amin Vahdat
as describing what life was like in the earlier era: "there were painful tradeoffs with careful data locality and placement of servers connected to the same top of rack switch versus correlated failures caused by a single switch failure." I cannot begin
to describe the pain in the ass that this was: our high-capacity search stack (the part I was responsible for) was by far the most aggressive user of the network, and each cluster's deployment had to be carefully planned on a per-rack basis, both to manage our bandwidth use within and across racks and to keep the cluster survivable if a single rack switch failed.
To give you a sense of the PITA (this will make sense only to computer scientists): I wrote our standard implementation of simulated annealing... because I needed it for the software that figured out which tasks to put on which machines.
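For the curious, the shape of the idea looks roughly like this. This is a toy sketch, not the actual placement software: the cost function and every parameter are my inventions for illustration. Here the "objective" is just spreading replicas across racks so that a single switch failure can't take out too much capacity at once; the real cost models were far richer.

```python
import math
import random

def anneal(num_tasks, num_racks, cost, steps=20000, t0=10.0, seed=0):
    """Minimize cost(assignment) by randomly moving one task at a time,
    accepting worse assignments with a probability that shrinks as the
    'temperature' cools. Purely illustrative."""
    rng = random.Random(seed)
    assignment = [rng.randrange(num_racks) for _ in range(num_tasks)]
    best = list(assignment)
    cur = best_cost = cost(assignment)
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling schedule
        task = rng.randrange(num_tasks)
        old_rack = assignment[task]
        assignment[task] = rng.randrange(num_racks)
        new = cost(assignment)
        # Always accept improvements; accept regressions with probability
        # exp(-delta / temp), which approaches zero as we cool down.
        if new <= cur or rng.random() < math.exp((cur - new) / temp):
            cur = new
            if cur < best_cost:
                best_cost, best = cur, list(assignment)
        else:
            assignment[task] = old_rack  # revert the move
    return best, best_cost

def spread_cost(assignment):
    # Toy objective: quadratic penalty for piling tasks onto one rack,
    # so the minimum is a perfectly even spread across racks.
    counts = {}
    for rack in assignment:
        counts[rack] = counts.get(rack, 0) + 1
    return sum(c * c for c in counts.values())

placement, score = anneal(num_tasks=12, num_racks=4, cost=spread_cost)
```

With 12 tasks over 4 racks, the even spread of 3 per rack has cost 4 × 9 = 36, and the annealer settles at or near it. The real problem, of course, had to juggle bandwidth within and across racks on top of failure domains, which is exactly why a general-purpose optimizer was worth writing.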
And even with those amazing systems, we could knock clusters over in a moment. Doing a full restart of a search cluster (re-reading the index from in-datacenter storage from scratch) in under 48 hours required shutting down all other jobs in the datacenter, because the restart would consume all of the internal bandwidth.
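The arithmetic behind that is simple: restart time is just data to read divided by the bandwidth you can claim. A back-of-envelope sketch with invented numbers (these are not the actual figures) shows how a multi-petabyte index against a shared fabric lands you in that painful tens-of-hours regime:

```python
# All figures invented for illustration; restart time = bytes to read / bandwidth.
index_petabytes = 2.0                 # assumed size of one cluster's index copy
fabric_terabits_per_sec = 0.1         # assumed share of fabric the restart can claim

bytes_to_read = index_petabytes * 1e15
bytes_per_sec = fabric_terabits_per_sec * 1e12 / 8
hours = bytes_to_read / bytes_per_sec / 3600
print(f"{hours:.1f} hours")  # → 44.4 hours
```

Note how the answer scales linearly with index size and inversely with bandwidth: doubling the index, or halving the fabric share you're allowed, puts you well past the 48-hour mark.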
If you ever wondered why search over giant corpora is a hard business...
Cross-datacenter networking remains a hard problem, even with these advances, because while long-haul bandwidth has grown tremendously over the years, storage capacity has grown even faster. This is why a good mental analogue for the design of planet-scale storage systems is freight logistics: even with 747s crossing the globe, warehouses are still much bigger.
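You can see the analogy in numbers. A back-of-envelope sketch with invented figures (not real fleet or link data): physically shipping a pallet of drives between datacenters can deliver an "effective bandwidth" that embarrasses a long-haul link, at the price of a day of latency.

```python
# Invented numbers, purely illustrative: effective bandwidth of shipping drives.
drives = 1000
terabytes_per_drive = 10
transit_hours = 24                    # assumed door-to-door shipping time

shipped_bits = drives * terabytes_per_drive * 1e12 * 8
effective_gbps = shipped_bits / (transit_hours * 3600) / 1e9
print(f"{effective_gbps:.0f} Gb/s")  # → 926 Gb/s
```

That's the warehouse-versus-747 point in miniature: the "link" made of freight has enormous throughput and terrible latency, which is roughly the trade planet-scale storage systems have to design around.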
I obviously can't tell all the stories, but these papers are a remarkable chance to see what the cutting edge of networking infrastructure actually looks like. Those who are interested in such matters, enjoy!