I hadn't heard about this: an 11-hour Microsoft Azure cloud service outage that affected just about everyone using it worldwide, including internal users like MSN.com and Xbox Live. It seems embarrassing, since the root cause looks like it should have been preventable ("an infinite loop", perhaps between services; details not provided), and the length of the outage was largely due to difficulty rolling back the deployed code quickly. I'm surprised this isn't getting more press coverage given how much damage it caused.
- Azure has had repeated reliability problems. So have Google and Amazon, but at a lower rate. Cloud is really hard software to run well. (Nov 20, 2014)
- Ed Chi: This stuff is very hard, and you need a ton of experience to get it right. Sigh... (Nov 20, 2014)
- Running cloud systems requires huge amounts of automation. When the automation is wrong, it systemically screws up everything. It's kind of amazing, actually: sometimes you do a push and it's like the ending of Arthur C. Clarke's "The Nine Billion Names of God" (http://downlode.org/Etext/nine_billion_names_of_god.html). (Nov 20, 2014)
- Ars has a couple more details: http://arstechnica.com/information-technology/2014/11/azure-went-down-and-people-actually-noticed/ (Nov 21, 2014)
- And a 2-hour outage on AWS CloudFront: http://www.geekwire.com/2014/amazons-cloudfront-hits-snag-causing-problems-across-web/ (Nov 30, 2014)