Spanner: Google's Globally-Distributed Database

I and many others have been working for the last few years on building a large-scale storage system that can manage data across all of Google's datacenters.  This system underlies Google's advertising system, among other products.  We'll be presenting a paper describing the system (with 26 co-authors!) at OSDI 2012 next month.  We've now put up a web page with a link to the PDF of the final version of the paper.

Feedback is welcome, of course.

Here's the abstract of the paper:

Spanner is Google's scalable, multi-version, globally-distributed, and synchronously-replicated database. It is the first system to distribute data at global scale and support externally-consistent distributed transactions. This paper describes how Spanner is structured, its feature set, the rationale underlying various design decisions, and a novel time API that exposes clock uncertainty. This API and its implementation are critical to supporting external consistency and a variety of powerful features: non-blocking reads in the past, lock-free read-only transactions, and atomic schema changes, across all of Spanner.
Spanner: Google's Globally-Distributed Database James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, JJ Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter ...
Wow, how time flies! When I joined Google, Spanner was still what I might call "the impossible dream", and now it's not only real and serving important work in production, but we're already talking about it externally. :)
Even as I click on the link, I have a sad feeling that I will not understand all of it. Even worse, I will never get a chance to use anything like this at my current job. Personal limitations aside, this is why I like +Google: they worked on something that was meant to solve their own problem, but once they had a solution they decided to share it. This is not something that Apple would ever do with their current mentality.
Just as Bigtable energized the NoSQL revolution, I think that Spanner heralds a new revolution in data management. It embraces all the learning of global-scale data management, provides necessary transactions and raises the level of interaction right to where modern web apps need it. I look forward to the open source implementations that will implement this and power the web over the next several years. Well done, Jeff, Sanjay, Andrew and team!
Thank you for the R&D. Keep it up, thanks.
Speaking of open source implementations, I am in for doing one in Java or Go. If there are more takers, let's get it started!!
Well done, Google. Love the work you guys do.
Awesome to see the Spanner team open its kimono. It's an amazing design, and hopefully it will kick off a revolution amongst the current NoSQL platforms and convince more people to leave behind their monolithic databases and embrace a new, distributed future.
+Jeff Dean Thanks for posting this interesting paper! TrueTime is a really nice API. I hope it becomes more common, like the leap-second smearing Google uses.
Jeff Du
Can not live without Google....
Hi +Jeff Dean, I'm wondering whether Spanner uses the storage replication and de-duplication mechanisms that are the foundation of storage products from companies like Data Domain and Quantum. I'm not aware of the internals, but I'm thinking that de-duplication and replication might reduce network traffic and load on peer systems.
+Amit Warke We have a variety of compression algorithms in place for storing the raw data on disk, and use some forms of compression for our on-the-wire formats, as well.  Higher level de-duplication can be done on top of Spanner.
Hi +Jeff Dean, that sounds great. I hope the compression and decompression don't get in the way of performance. But then again, you're Google; you guys consider all the corner cases before using any technique. I'm aware of some companies using hashing mechanisms to perform de-duplication, though: if two blocks of data get the same hash, just store one block and then figure out the inode mapping. I'll be reading the entire paper soon :) . On the OSDI website it is listed as Elmo and not Spanner :) . By the way, I'm a big fan of you and Sanjay; I read your stuff while I was studying in grad school. Hope to have the honor of bumping into you one day. Have a great weekend.
I am not a guru, but is a hash data? And is an inode infinity?
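To make the hash-based de-duplication idea in the comment above concrete, here is a minimal sketch in Python. It says nothing about how Spanner, Data Domain, or Quantum work internally: the `DedupStore` class, its method names, the fixed block size, and the choice of SHA-256 are all assumptions made for illustration, and a plain dictionary stands in for the inode mapping.

```python
import hashlib


class DedupStore:
    """Toy block store (hypothetical) that keeps one copy of each distinct block."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes, stored once
        self.files = {}   # file name -> ordered list of digests (stand-in for an inode map)

    def put(self, name, data, block_size=4):
        """Split data into fixed-size blocks and store only the unseen ones."""
        digests = []
        for i in range(0, len(data), block_size):
            block = data[i:i + block_size]
            digest = hashlib.sha256(block).hexdigest()
            # setdefault leaves an existing block untouched, so duplicates cost nothing extra
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        """Reassemble a file from its list of block digests."""
        return b"".join(self.blocks[d] for d in self.files[name])


store = DedupStore()
store.put("a.txt", b"abcdabcdabcd")  # three identical 4-byte blocks
store.put("b.txt", b"abcdwxyz")      # shares one block with a.txt
```

In this toy the two files reference five blocks in total, but only two distinct blocks are kept. A real system would also have to consider hash collisions, variable-size chunking, and garbage collection of unreferenced blocks, none of which are shown here.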
Could you give a quick overview of why you are going through all the trouble of trying to use (GPS-)walltime instead of other physics/relativity-based abstract clocks (e.g., Lamport clocks and extensions thereof)?
Hello +Jeff Dean, I was confused by the names Spanner and Elmo. At OSDI, "Elmo: Building a Globally Distributed, Highly Available Database" will be presented. From the title and the authors, I guess these two names describe the same system. But what is the difference?
+Jeff Dean I was just reading about this, and about you, on Wired. Amazing work!
Would it please be possible to explain at a simpler level how the timing system can guarantee total global ordering in a distributed system? After reading the paper I'm not getting it. We've had NTP with network access to stratum 1 clocks for some time. What you quickly learn is that it was never really accurate enough to be reliably useful in determining ordering. How is this different? Thanks for any help.
+Todd Hoff As far as I understand it, it's basically the same idea as manually handling replication lag in a MySQL master/slave setup. The TrueTime API isn't about knowing exactly what came before or after without any delay; by tracking the clock error explicitly, you can implement cluster-wide after() and before(). But that also means your system gets slower as your clock synchronization starts to drift.
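As a rough illustration of the point in the comment above, here is a sketch in Python of a clock that reports an interval instead of a single instant. The paper describes `TT.now()`, `TT.after(t)`, and `TT.before(t)`; everything else here (the `TrueTimeSketch` class, the fixed `epsilon`, the injectable `clock` argument) is an assumption for illustration. In particular, the real system derives its uncertainty from GPS and atomic-clock time masters, while this toy simply assumes a constant error bound.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class TTInterval:
    earliest: float  # true time is assumed to be no earlier than this
    latest: float    # true time is assumed to be no later than this


class TrueTimeSketch:
    """Hypothetical stand-in for a TrueTime-like API with a fixed error bound."""

    def __init__(self, epsilon, clock=time.time):
        self.epsilon = epsilon  # assumed worst-case clock error, in seconds
        self.clock = clock      # injectable so the sketch can be tested

    def now(self):
        t = self.clock()
        return TTInterval(t - self.epsilon, t + self.epsilon)

    def after(self, t):
        """True only if t has definitely passed."""
        return self.now().earliest > t

    def before(self, t):
        """True only if t has definitely not arrived yet."""
        return self.now().latest < t


tt = TrueTimeSketch(epsilon=0.005)
stamp = tt.now().latest
print(tt.after(stamp), tt.before(stamp))  # both False until the uncertainty has passed
```

Note that for a timestamp inside the current uncertainty window both `after` and `before` return False. That "don't know yet" zone is where a system has to wait, which is why a larger clock error makes things slower.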
+Jeff Dean I've been waiting for this publication, it's fantastic work.  Can you comment on Section 5?  After a few readings I still don't quite understand how to map Paxos groups <==> directories <==> servers <==> "participants".    Now back to the lab at UW to see if we can cheaply build an atomic clock on a chip...
guo tie
What language is used to implement Spanner?
As I understand it, the TrueTime masters are Google's private NTP stratum 1 clocks in a data center, and the TrueTime API returns a timestamp together with a tracked time error. Using the timestamp and the time error, each node around the world can implement after() and before() by waiting, in the spirit of Lamport. The only reason for not using general NTP is to reduce the time error.
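The "waiting" mentioned in the comment above can be sketched as follows. This is only an illustration of the commit-wait idea from the paper (pick a timestamp no earlier than the latest possible current time, then wait until that timestamp has definitely passed). The `commit_wait` function, its parameters, and the polling loop are hypothetical and are not Spanner's implementation.

```python
import time


def commit_wait(now_interval, sleep=time.sleep, poll=0.001):
    """Pick a commit timestamp and block until it is definitely in the past.

    now_interval is an assumed callable returning (earliest, latest) bounds
    on the current time; sleep and poll are injectable for testing.
    """
    _, s = now_interval()          # commit timestamp: latest possible "now"
    while now_interval()[0] <= s:  # s may not have passed yet on every clock
        sleep(poll)
    return s


EPSILON = 0.004  # assumed clock error bound, in seconds


def local_interval():
    # Hypothetical interval source: local wall clock plus or minus EPSILON
    t = time.time()
    return (t - EPSILON, t + EPSILON)


start = time.time()
commit_ts = commit_wait(local_interval)
print(f"waited about {time.time() - start:.4f}s for timestamp {commit_ts:.4f}")
```

With a fixed error bound the wait comes out to roughly twice that bound, so tighter clock synchronization translates directly into shorter waits.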
I hope it will be available as a service for Compute Engine or App Engine.
Hi everybody
I went to the Google meetup in NYC and I'm very excited about Spanner. I'm in if anyone wants to start working on an open source project in Java!