Nitin Borwankar
5,230 followers
Data/math/science geek

Post has attachment
thought this was pretty funny - a meerkat getting drowsy and falling asleep

Post has shared content
Part of +Neal Ford's new functional programming video series for O'Reilly

Post has attachment
An interesting conversation has been evolving on Twitter about the antifragility of server systems, DevOps, etc.
The link below is one thread, but there are a number of others, so please don't take it as the only one.

https://twitter.com/nntaleb/status/368820666767650816

It was getting too long for Twitter, so I'm expanding on some thoughts here.

The desirable situation is similar to what happens in a human body, where weak or bad cells die off, leaving the stronger ones. This is an antifragile system, in which stressors make the system stronger.

The discussion is about analogies in the server and DevOps space.
Current server systems are robust but not antifragile. Robust systems resist external force up to a point but then collapse completely.

Antifragile systems adaptively get better with each external stressor.
The problem with applying this to server systems is twofold:

a) There is no underlying statistical model, or underlying set of properties admitting a statistical distribution, under which some part of the population may be considered "good" and the rest "bad". In fact, in a server farm (whether in the cloud or not) all servers are intentionally started up with identical properties.

There is therefore, IMHO, no way to assign a "goodness" measure to a server and have it "killed" off to improve the health of the larger population of servers.

b) Servers themselves do not adapt in the presence of external stress in a way that changes their future responses. This is not how they are currently designed. They are much more like bricks than human cells.

IMHO, there needs to be a layer in addition to existing server architecture that uses machine learning to do two things. But before I go there I want to say that the goal of this is not "finding bugs" as Netflix's ChaosMonkey does.

It is to adaptively change parameters of the system so that with each stressor it becomes more resilient to future stressors.

The problem, IMHO, is that we don't think of server populations as a set of properties that are random variables in the stochastic process sense; we think of each server as an identical copy of the others. And a server is either up or down. In our mental model there is no smooth transition where the server gracefully degrades, learns from the stress, and degrades less the next time. We want this second model, where we think in terms of graceful degradation (a smooth metric) rather than up/down (a discrete metric).
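
To make the up/down versus graceful-degradation distinction concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the Sample fields, the 100 ms latency target, and the scoring formula are hypothetical choices, not something taken from the Twitter thread or from any real monitoring tool.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One hypothetical measurement of a server under load."""
    latency_ms: float   # observed latency for the measurement window
    error_rate: float   # fraction of failed requests, 0.0 to 1.0


def is_up(sample, max_error_rate=0.9):
    """Discrete metric: the server is either up or down, nothing in between."""
    return sample.error_rate < max_error_rate


def health(sample, target_latency_ms=100.0):
    """Smooth metric in [0, 1]: 1.0 is fully healthy, 0.0 is unusable.

    Assumed formula: the success fraction, scaled down when the observed
    latency exceeds the (hypothetical) target.
    """
    latency_factor = min(1.0, target_latency_ms / max(sample.latency_ms, 1e-9))
    return (1.0 - sample.error_rate) * latency_factor


calm = Sample(latency_ms=100.0, error_rate=0.0)
stressed = Sample(latency_ms=200.0, error_rate=0.5)

# Both servers count as "up" under the discrete metric,
# but the smooth metric tells them apart.
print(is_up(calm), is_up(stressed))
print(health(calm), health(stressed))
```

The point of the sketch is only that the second function gives a quantity that can be tracked and compared across stressors, which the up/down flag cannot.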

The question is then threefold:

a) What knowledge (attributes) do we want to extract from the current stressor, and from the current response, that might be useful in creating an adaptive loop?

b) What parameters in the server do we adjust after a stressor event, so as to degrade less in future?

c) How do we assign degrees of goodness (a continuous metric) so that we can use certain servers preferentially over others?
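
One possible shape for such a loop, sketched in Python purely as an illustration. All names here are hypothetical: StressEvent stands in for (a), the max_concurrency knob for (b), and the goodness score for (c). The exponential moving average is a simple stand-in for the machine learning layer, not a proposal for what that layer should actually be.

```python
import random
from dataclasses import dataclass


@dataclass
class StressEvent:
    """(a) Attributes extracted from one stressor and the server's response."""
    peak_load: int      # concurrent requests at the peak of the stressor
    error_rate: float   # fraction of requests that failed during the event


@dataclass
class Server:
    name: str
    max_concurrency: int = 100   # (b) hypothetical tunable parameter
    goodness: float = 1.0        # (c) continuous score in [0, 1]


def adapt(server, event, smoothing=0.3, tolerable_error=0.05):
    """Update a server after a stressor so that it degrades less next time."""
    # (c) Blend the latest observation into the running goodness score.
    observed = 1.0 - event.error_rate
    server.goodness = (1 - smoothing) * server.goodness + smoothing * observed
    # (b) If the server degraded badly, lower its concurrency limit to the
    # load it actually served, so it sheds excess load earlier in future.
    if event.error_rate > tolerable_error:
        served = int(event.peak_load * (1.0 - event.error_rate))
        server.max_concurrency = max(1, min(server.max_concurrency, served))
    return server


def pick(servers, rng=random):
    """(c) Prefer servers with higher goodness, without starving the others."""
    weights = [max(s.goodness, 1e-6) for s in servers]
    return rng.choices(servers, weights=weights, k=1)[0]


a, b = Server("a"), Server("b")
adapt(a, StressEvent(peak_load=120, error_rate=0.5))   # a coped badly
adapt(b, StressEvent(peak_load=120, error_rate=0.0))   # b coped well
print(a, b)
print(pick([a, b]).name)
```

Note that nothing is "killed off" here; the weaker server is only used less often and given a lower limit, which sidesteps rather than answers the objection in (a) above.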

Nicholas, Adrian, et al. - hope this makes sense. Hard to do this on Twitter.

Nitin.

P.S.
I am not sure the "killing off bad cells" analogy holds in a cloud, but it may hold in a non-virtualized hardware server farm where some servers may have worse hardware. The virtualization in a cloud, IMHO, spreads "badness" in ways that might make the above discussion inapplicable, because "badness" cannot be localized to a particular server even if said server is made to look more like a cell than a brick.

Post has attachment
Leo, our 6 month old German Shepherd