Just read paperplanes: Web Operations 101 For Developers
http://www.paperplanes.de/2011/7/25/web_operations_101_for_developers.html
Thanks +John Barton
Creating resilient systems means expecting things to fail and planning accordingly. There's an increasing expectation that services on the web should behave like utilities, e.g. always available, cheap, etc. Anything that frustrates a user counts as a failure. As Mathias Meyer mentions in his article, the system should tell the user clearly what went wrong and, if possible, recover automatically.
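To make the "recover automatically, otherwise tell the user" idea concrete, here is a minimal Python sketch. It is my own illustration, not something taken from the article: `TransientError`, `call_with_recovery`, the retry count and the wording of the message are all hypothetical.

```python
import time


class TransientError(Exception):
    """Hypothetical error type for failures that are worth retrying."""


def call_with_recovery(operation, retries=3, delay=0.1, sleep=time.sleep):
    """Run `operation` up to `retries` times.

    Returns (True, result) on success, or (False, message) with a
    user-facing explanation once every attempt has failed.
    """
    for attempt in range(1, retries + 1):
        try:
            return True, operation()
        except TransientError:
            if attempt < retries:
                # Wait a little longer before each new attempt.
                sleep(delay * attempt)
    # Automatic recovery did not work: say so plainly instead of failing silently.
    return False, (
        "Sorry, this is taking longer than expected. "
        "Please try again in a few minutes."
    )
```

A caller would show the returned message to the user when the first element is `False`. The `sleep` parameter is only there so the wait can be swapped out, for example in tests.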
However, the simplest approach to building resilient systems is to reduce their complexity. Development tends to accumulate complexity that adds no value. The first step should be to remove unnecessary components. What's not there can't fail.