Amazon Glacier is slow data, cheap. That's cool, icy cool.

What if I told you that deferrable data access could be making the rest of Amazon's hard drives far faster?
http://www.quickmeme.com/meme/3ql803/

It could be a colossal gain in data-center performance & efficiency for the rest of Amazon: an old technique revived, with its old weakness paved over by data-center-scale sizing & elastic coordination. What if Amazon Glacier's data is stored in the offline areas of short-stroked drives?
http://en.wikipedia.org/wiki/Disk-drive_performance_characteristics#Short_stroking

One of the most remarkable things about hard drives is that they work much, much better if you don't use all of them. That needle, the drive's read/write head, can read and write a lot more data when there is less distance to travel across the platter. Short stroking works by putting data in the densest, fastest spot: the outer tracks. (Explanation: the platter spins at constant angular velocity, which implies two things. One, an outer track passes more data under the head per rotation because its circumference is bigger, so sequential read speed is highest at the edge. Two, because each outer track holds more data, the same amount of data spans fewer tracks, so the head covers it with shorter seeks and access times improve. Throughput & latency, both.)
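To make that concrete, here's a rough back-of-the-envelope sketch in Python, with made-up platter radii and a deliberately crude seek model; none of these numbers come from a real drive. It just shows how confining data to the outer tracks trades capacity for worst-case throughput and seek span.

# Back-of-the-envelope model of short stroking, assuming a platter with
# constant angular velocity and constant linear bit density (so a track
# at radius r holds data proportional to r). All numbers are illustrative
# assumptions, not specs of any real drive.

R_INNER = 20.0   # mm, innermost usable radius (assumed)
R_OUTER = 48.0   # mm, outermost usable radius (assumed)

def throughput_ratio(r):
    """Sequential throughput at radius r relative to the outer edge:
    data per revolution scales with circumference, i.e. with r."""
    return r / R_OUTER

def short_stroke(fraction_of_capacity):
    """Use only the outermost `fraction_of_capacity` of the drive's data.
    Capacity between r and R_OUTER scales with (R_OUTER^2 - r^2), so we
    solve for the inner cutoff radius that holds exactly that fraction."""
    full = R_OUTER**2 - R_INNER**2
    r_cut = (R_OUTER**2 - fraction_of_capacity * full) ** 0.5
    # Worst-case sequential throughput inside the stroked region.
    worst_throughput = throughput_ratio(r_cut)
    # Max seek distance shrinks with the radial span the head must cover;
    # treat seek time as roughly proportional to that span (crude).
    seek_span = (R_OUTER - r_cut) / (R_OUTER - R_INNER)
    return r_cut, worst_throughput, seek_span

for frac in (1.0, 0.5, 0.25):
    r_cut, tput, span = short_stroke(frac)
    print(f"use outer {frac:>4.0%} of capacity: cutoff radius {r_cut:4.1f} mm, "
          f"slowest track at {tput:.0%} of edge speed, "
          f"seek span {span:.0%} of full stroke")

In this toy model, keeping only the outer quarter of the capacity holds every track near edge speed and cuts the head's seek range to a small fraction of the full stroke, which is the whole appeal of short stroking.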

The downside is that you're artificially restricting the drive to use only part of itself. Whether that part is the outer edge, the middle, or the inner tracks, you're giving up the rest of the drive's capacity to keep the head focused on one region.

Glacier allows (could allow) Amazon to
a) keep drives operating short-stroked under normal conditions, and
b) decide, top-down, the windows during which a given drive does not need to run super fast.
There's long-term "cold storage" on the drive, the glacier partition, sitting in a region we never talk to or think about, except that every so often we schedule a window when we know we're going to need to go access it. Since we have lots of computers, we can provision another machine to take over for us during that period of heavy usage. (A toy sketch of this scheduling follows below.)
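Here's that toy sketch, purely my own speculation about the shape such coordination might take; the Drive/Fleet names and the batching threshold are invented for illustration, not anything Amazon has described. Drives stay in short-stroked hot mode, cold requests queue up, and a window is opened to drain them while a peer absorbs the hot traffic.

# Toy sketch of the scheduling idea above: drives normally serve only their
# short-stroked "hot" region; cold (Glacier-style) requests are deferred,
# batched, and drained during scheduled windows while a peer drive covers
# the hot traffic. Names (Drive, Fleet, _drain_cold_window) are hypothetical.

from collections import deque

class Drive:
    def __init__(self, name):
        self.name = name
        self.cold_queue = deque()   # deferred requests for the cold partition
        self.serving_hot = True     # short-stroked fast mode

    def defer_cold_read(self, obj_id):
        self.cold_queue.append(obj_id)   # don't touch the cold region now

class Fleet:
    def __init__(self, drives, cold_batch_threshold=3):
        self.drives = drives
        self.threshold = cold_batch_threshold

    def maybe_schedule_windows(self):
        for drive in self.drives:
            if len(drive.cold_queue) >= self.threshold:
                peer = self._pick_peer(drive)
                self._drain_cold_window(drive, peer)

    def _pick_peer(self, busy):
        # Any other drive still in hot mode can absorb the redirected load.
        return next(d for d in self.drives if d is not busy and d.serving_hot)

    def _drain_cold_window(self, drive, peer):
        drive.serving_hot = False          # hot reads now go to the peer
        print(f"{drive.name}: cold window open, {peer.name} covers hot traffic")
        while drive.cold_queue:
            obj_id = drive.cold_queue.popleft()
            print(f"  {drive.name} reads cold object {obj_id}")
        drive.serving_hot = True           # back to short-stroked fast mode
        print(f"{drive.name}: cold window closed")

fleet = Fleet([Drive("drive-a"), Drive("drive-b")])
for i in range(3):
    fleet.drives[0].defer_cold_read(f"glacier-obj-{i}")
fleet.maybe_schedule_windows()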

Short-stroked spinning-rust drives + cold storage, a winning data-center combo.

I'm somewhat skeptical Amazon is actually this cool (that it has implemented, or is implementing, this), but this is where the big data-center efficiency gains will be made: knowing your loads and tuning to them. In this case, that means realizing that most data doesn't get accessed very often, then segmenting and tiering that data into a new deferrable service classification.
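A minimal sketch of what that segmentation might look like, with an assumed "untouched for ~90 days means deferrable" rule that is purely illustrative and not any real service's policy:

# Classify objects by how recently they were read, and mark rarely-touched
# ones as deferrable so they can migrate to cold (Glacier-like) storage.
# The threshold and the catalog entries are made-up examples.

import time

COLD_AFTER_SECONDS = 90 * 24 * 3600   # untouched for ~90 days -> deferrable

def classify(objects, now=None):
    """objects: dict of object id -> last access time (unix seconds)."""
    now = now or time.time()
    tiers = {"hot": [], "deferrable": []}
    for obj_id, last_access in objects.items():
        tier = "deferrable" if now - last_access > COLD_AFTER_SECONDS else "hot"
        tiers[tier].append(obj_id)
    return tiers

now = time.time()
catalog = {
    "user-uploads/recent.jpg": now - 3600,               # read an hour ago
    "backups/2011-archive.tar": now - 400 * 24 * 3600,   # untouched ~400 days
}
print(classify(catalog, now))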

Doing everything, all the time, reliably, under unknown loads is hard. It's why data centers need to overprovision (as Amazon must) or keep elastic resources on call (via giants like Amazon), and it leaves little ability to plan, stage, and specialize. Operating in this steady-state chaos mode is how we run servers today: sometimes we quarantine our analytics chaos from our production chaos with separate clusters, but we've lost something from the past.

I like to think that in the future we'll head back to the origins of computing, with more coordination, control, and sentience over where processes are run; that we'll reconsider batch processing and give our nodes dynamic specialization rather than static assignments. It comes back to ambient computing: monitoring the system's runtime, allowing mobility of code & data, and giving man and machine the faculties and consideration to decide where each miniature world of computing ought to be happening now, and where it ought to draw its resources from.