Did Microsoft lose a decade because of its staff evaluation system?
My 2 cents on this popular article

Microsoft's lost decade has to be a management issue: its staff is top notch, and the company has key competitive advantages and loads of cash.
 
Vanity Fair's Eichenwald argues that "stack ranking", i.e. forcing every group at the company to identify a somewhat fixed proportion of low, medium and high performers, explains in large part MSFT's lost decade.  Stack ranking pitted team members against each other, hampering collaboration and crippling innovation.

It is worth noting that most large companies use "stack ranking" in some way, shape or form. It is widely perceived as a good practice, as it forces rigor in staff evaluation and development, and helps with "rating fairness" among other things.  But like every "good practice", it often fails to achieve its purpose because of poor execution.  

There are many questions to consider when thinking about implementing "stack ranking".  Here are two, specifically focused on the "bottom" quota:
 
1. Is my bottom quota aligned with my forced attrition expectations?
If, as in the article, you expect 10% of low performers, you should be ready to ask 10% of your staff to leave.  This percentage comes on top of voluntary departures (medium- to high-performing employees who will leave you anyway).  If you are not actually following through and letting people go, why are you telling these low performers that they are bad?  In practice, except in high "forced attrition" industries, you are often much better off with a bottom quota below 10% (see the first sketch after this list).

2. Am I enforcing the distribution at the right level?  This is basic statistics: if you expect 10% of your staff to be low performers on average, you need about 1,000 employees to be reasonably confident that the share of low performers will indeed be close to 10%.  In a group of 10 people, there is a very high likelihood (about three chances out of five) that the right number of low performers is NOT 1 but rather 0, 2 or more (see the second sketch below).  This problem is more complicated than it looks, as it is quite hard to stack rank 1,000 individuals (most likely managed directly by over 100 managers), but getting that piece right is key to useful stack ranking.
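
A back-of-the-envelope Python sketch for point 1. Both rates below are illustrative assumptions, not figures from the article; the point is simply that a 10% bottom quota does not mean 10% attrition, because voluntary departures stack on top of it.

```python
# Illustrative back-of-the-envelope: how a bottom quota compounds with
# voluntary departures. Both rates are assumptions for the example.
forced_rate = 0.10     # bottom quota you actually act on
voluntary_rate = 0.10  # assumed rate of people leaving on their own

# If voluntary departures come out of the remaining 90%, total attrition is:
total = forced_rate + voluntary_rate * (1 - forced_rate)
print(f"Total annual attrition: {total:.0%}")  # 19%, nearly one in five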
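
And a sketch for point 2, using exact binomial arithmetic from the Python standard library (the 10% low-performer rate is the same illustrative assumption as above):

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k low performers among n employees,
    if each one independently has probability p of being a low performer."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p = 0.10  # assumed true rate of low performers

# Small team: how often is "exactly 1 out of 10" actually the right answer?
p_one = binom_pmf(1, 10, p)
print(f"P(exactly 1 in 10)  = {p_one:.3f}")      # ~0.39
print(f"P(NOT exactly 1)    = {1 - p_one:.3f}")  # ~0.61

# Large group: the observed share lands between 8% and 12% most of the time.
p_close = sum(binom_pmf(k, 1000, p) for k in range(80, 121))
print(f"P(8-12% in 1,000)   = {p_close:.3f}")    # ~0.97
```

So in a team of 10 the "right" quota of exactly one low performer is wrong about three times out of five, while across 1,000 employees the 10% expectation is almost always approximately right.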

Back to MSFT: while I can easily see how stack ranking can go wrong, I believe that the root cause of MSFT's lost decade has to be more than a poor performance evaluation system.  I would not be surprised if the key issue was a poor decision-making/accountability system instead.

Let me know if you would like me to write more about Management in general or decision making and accountability in particular.

Source: extract on Vanity Fair's website: http://www.vanityfair.com/online/daily/2012/07/microsoft-downfall-emails-steve-ballmer
(note: the full article will be in the August print edition)
27 comments
 
We used forced rankings extensively at GE, and it is an effective system if executed properly.  Yes, it can exert some stress on lower performers, but that stress usually manifested either in improved performance or in a mutual decision to seek a position, inside or outside the company, that was a better fit.  You have to be prepared to make the tough decisions and have the tough discussions if you are going to implement this process.
 
It is a given that a company's most valuable asset is its manpower. How to optimize this manpower calls for great management decisions. Blunders are too costly and derailing.
 
Thanks for this reply, you make a good point. The way I think about it is that an organization has a culture that is largely shaped by 'what does it take to get ahead'. On top of this sits what you point out (I think): management culture. Sometimes success is the platform of future failure - e.g. Xerox not seeing the future of information processing in the mouse and GUI because it only focused on 'copies'.

The problem is not the 10% you expect to lose - it's why you hired people in the first place. The assumption of the 'bell curve' automatically discounts the possibility that all the people you hire could be extraordinary. A cannibalistic or 'Klingon culture' assumes that progress can only happen in a 'dog-eat-dog' environment. But serendipity is the overwhelming source of innovation, and what we need to leverage serendipity is more like trust and cooperation. For that you need an incentive system that includes group contribution, not just individual performance.

You also need some process that rewards management risk-taking in experimentation and tolerates failure. MS could have produced a tablet, just as Xerox could have produced a GUI computer interface (rather than showing the GUI to Apple and MS).

All this to say: management decision making and the HR evaluation system are intertwined.
 
While the boss expects everybody to perform at an outstanding level, he then uses this kind of stacking when evaluating performance. With a bad application of the system, who will be rated an outstanding performer becomes obvious in advance. This takes away the enthusiasm of top performers and average ones alike. This is cruel.
 
+John Verdon you are correct that the original hiring decision is the most important process. But even with the utmost attention, hiring managers occasionally make bad calls and bring weak players in. People can also lose energy and vitality for a variety of reasons over time. Giving people at the bottom the chance to be more successful somewhere else is important.
 
If you have something new to say about "management" I would love to read it.  I often think that all "management talk" is a re-hash of work already developed by "management" gurus.
 
+Steve Wilkerson there are very few original ideas, just effective repackagings and recombinations of existing thoughts
 
+Jeffrey J Davis I couldn't agree with you more.  That's why really insightful commentary is so valuable.  If by "effective" you mean actually drawing attention to facets of existing thought that are both valuable and not obvious, that is exciting and fine.  My point is simply that I don't find much in all I read on the topic that meets the standard of originality or of "effective" repackaging and recombination.
 
I agree. Just a few years ago Steve Ballmer was saying how stupid touch screen phones were, how stupid the iPad was and how great Vista was.
 
I am with you +Steve Wilkerson and +Jeffrey J Davis.  It is not about new ideas, but rather the repackaging of existing ones.
One of my mentors, Yves Morieux, developed a deceptively simple set of management ideas at the crossroads of sociology and management.  One great one is: put individuals in situations where they have to (1) resolve your conflicting objectives and (2) receive direct feedback on how well they do.  One good way to visualize that is cleaning planes: instead of a flight crew and a cleaning crew, have the flight crew clean the plane. This produces amazing results.
 
+Jeffrey J Davis If I think of the long-term consequences of the stack ranking system, I can't see how it improves the collaborative or team functioning of an organization - in a team of a league's 'All Stars', this system would demand that you get rid of the bottom 10%, despite the fact that that 10% is probably part of the league's top 10%.

Sure, it creates incentives for individual performance - but I can't see the logic that would make such a system 'greater than the sum of its parts'. It would simply be a great aggregation of individuals, each with an incentive to be ready to shove one of their own under the bus to preserve their standing. High individual human capital but low social capital.
 
+John Verdon a key issue in large organizations is the feeling among employees that the system is not meritocratic: that it is unfair, about who you know and where you sit rather than about your actual performance. Stack ranking, when implemented right, addresses this to some extent.
Also, as Jeffrey mentioned, the key with the all-star comparison is to account for the fact that we don't know beforehand who the all stars are, or whether that will change over time. If one follows the major league analogy, one needs to give players some certainty that they are assumed to be all stars for a while (e.g., for a season), but then revisit whether the decision is still the right one.
 
+Antoine Carriere Agree that merit is fundamental to a sense of fairness, which in turn is fundamental to developing the trust necessary for an economic system or a collective effort that can harness what makes it greater than the sum of its parts (rather than simply an aggregation of individuals - a mob). And you are right on the money - just because an organization can stack rank a bottom 10% doesn't mean that the bottom is unworthy. To know that, the organization would have to compare its people with those from other organizations.
I'm not arguing that stack ranking won't let you develop excellent individuals - it will; what I'm arguing is that it doesn't let an organization develop excellent 'teams', or 'social capital'. There will always be a need to make tough decisions about letting go of people who aren't making worthy contributions. But if an organization wants to truly harness 'intrinsic motivation' - the capacity of people to give more of themselves for the collective project - stack ranking may have more negative consequences than positive ones. It seems to me that stack ranking essentially encourages more of the negative dimensions of 'selfishness' than the positive dimensions of self-actualization, and inhibits the development of a more social or holistic effort.
So many of the metrics for performance focus primarily on individual rather than group capabilities, to the detriment of both the group and the individual.
 
+John Verdon +Antoine Carriere I cannot offer any empirical evidence other than my own experience. In over 15 years practicing a pretty rigorous 9-block forced ranking as part of our annual Session C process, I never saw any detrimental impact on our team collaboration processes. I may be fully misguided, but I would like to think I was pretty clued in to the organizational currents. This isn't blind hatchet wielding; it's meritocracy.

Unacknowledged tumors are a huge detriment to team dynamics, especially when everyone can see the lump with the naked eye. 
 
As a guy who preaches and practices organizations of the future, and has written extensively about them for three years, I can say from knowing +Jeffrey J Davis that he "gets it".
Every single system conceived of can be misinterpreted, mis-used, abused, etcetera. 
There is ultimately a "natural selection" to any system, and if people can make any one system work and prosper in any one (or multiple) environment, there is a 'proof in the pudding' metric in effect. 
 
Well, as Ronald Coase and Bill Joy both pointed out, the overwhelming majority of smart, talented people will always be outside the organization. The very concept of the organization is changing. There is a lot to learn from the way Massive Multiplayer Online Games provide incentives and harness the intrinsic motivation of all their participants.

I agree that there is no system that can't be gamed, mis-used or abused at some time in some way. At heart, the aim of performance evaluation has to be aligned with the culture, values and outcomes of an organization. I am sure there are many corporations whose values are completely consistent with a stack rank system.
 
There is another, perhaps more profound, problem with stack ranking - especially if it runs on an annual cycle or the time frame of the evaluation is the past year.

Daniel Kahneman has rightly pointed out (see Thinking, Fast and Slow) the difficulty that even experienced statisticians have in applying the very well known concept of 'regression to the mean'.

(quoted from the Wikipedia article - see it for the rest) "This is the phenomenon that if a variable is extreme on its first measurement, it will tend to be closer to the average on a second measurement, and—a fact that may superficially seem paradoxical—if it is extreme on a second measurement, will tend to have been closer to the average on the first measurement. To avoid making wrong inferences, the possibility of regression toward the mean must be considered when designing experiments and interpreting experimental, survey, and other empirical data in the physical, life, behavioral and social sciences."

So what this means is that one year's top performers are not a reliable indication of average performance, and it is very likely that in another year the top performers may come out near the bottom.

If this is to be at all effective in determining excellence as consistent year-after-year performance, then the evaluation time frame must incorporate rolling multi-year performance statistics. If it is carried out annually, then evaluators really won't know what the performance is based on - competencies or luck (or both, of course).
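
A minimal Python simulation of this point, under the common 'performance = stable skill + yearly luck' model (the model and all the numbers here are illustrative assumptions, not data from any real evaluation):

```python
import random
random.seed(42)

N = 10_000  # employees; large so the averages are stable

# Illustrative model: observed yearly performance = stable skill + yearly luck.
skill = [random.gauss(0, 1) for _ in range(N)]
year1 = [s + random.gauss(0, 1) for s in skill]
year2 = [s + random.gauss(0, 1) for s in skill]

# Take the top decile of year 1 and see where they land in year 2.
cutoff = sorted(year1, reverse=True)[N // 10]
top = [i for i in range(N) if year1[i] > cutoff]

mean_y1 = sum(year1[i] for i in top) / len(top)
mean_y2 = sum(year2[i] for i in top) / len(top)
# Year-2 mean is roughly half the year-1 mean: the luck half regresses away.
print(f"year-1 top decile: {mean_y1:.2f} in year 1, {mean_y2:.2f} in year 2")
```

Averaging several years before ranking shrinks the luck term, which is exactly the rolling multi-year statistic suggested above.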
 
+John Verdon I think "regression to the mean" is not highly applicable here.  

For regression to the mean to be applicable you need observations that are to a significant extent random.

That is true, for example, of investment fund performance.  Overperformance (relative to a reference index) in a given year could be luck or skill (most likely a bit of both).  The fact that past relative fund performance is not a good indication of future fund performance leads one to believe that luck, not skill, is the key driver of over- or underperformance. That is why you get a strong regression to the mean in this context.

You may argue that people performance assessment is largely random.  Data actually shows the opposite: performance last time is a strong predictor of performance next time.  This is because another bias is at play: our craving for internal consistency.  Think about it: if you rated someone highly last time, it is actually very likely that you will rate that person highly next time.  Doing the opposite is, to a large extent, admitting to yourself that you were wrong.
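
The disagreement can be made concrete: how predictive last year's score is depends on the skill-to-luck ratio. A small sketch under the same illustrative 'skill + luck' model as above (the variances are assumptions chosen to show the two regimes; the rater-consistency bias is a separate effect this simulation does not model):

```python
import random
random.seed(0)

def year_over_year_r(skill_sd, luck_sd, n=10_000):
    """Correlation between two yearly scores when
    score = persistent skill + independent yearly luck."""
    skill = [random.gauss(0, skill_sd) for _ in range(n)]
    y1 = [s + random.gauss(0, luck_sd) for s in skill]
    y2 = [s + random.gauss(0, luck_sd) for s in skill]
    m1, m2 = sum(y1) / n, sum(y2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(y1, y2)) / n
    v1 = sum((a - m1) ** 2 for a in y1) / n
    v2 = sum((b - m2) ** 2 for b in y2) / n
    return cov / (v1 * v2) ** 0.5

# Luck-dominated (like relative fund returns): strong regression to the mean.
print(f"luck-dominated:  r = {year_over_year_r(0.3, 1.0):.2f}")  # ~0.08
# Skill-dominated: each score regresses, but the ranking stays stable.
print(f"skill-dominated: r = {year_over_year_r(1.0, 0.3):.2f}")  # ~0.92
```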
 
Daniel Kahneman uses the example of professional golfers to explain regression to the mean, and applies this to most performances of professional athletes. Even among the best experts in any field there are really good performances and really bad performances. These variations average out into a mean performance. When a really good performance by one professional athlete is compared to a really bad performance by another athlete in order to predict the next performance, he points out that it is a mistake to think the 'really good' or the 'really bad' are indicative of the next performance - the next performances of both professionals will in all probability regress to the mean.
 
+John Verdon I believe the confusion comes from the fact that the "mean" in your golfer example is the average performance of that golfer (e.g. his or her handicap).
The "mean" in stack ranking is the average performance of a given group of people.  Tiger Woods' really bad performance is still far above the performance of an average golfer.

Yes, there is variation around your own mean, but the ranking of golfers is pretty stable, exactly because of that regression to the mean (i.e. the tendency to perform at your own average level).

And when you transpose this to performance measurement in large corporations, the psychological bias that pushes your manager to reinforce his first impression makes things even more stale :)
 
I suppose that holds if work in large organizations is like athletic performance: a stable game with little change in the rules. But what if the company's work changes, demands change, and the life circumstances of the people change? Stack-ranking employees in order to know which bottom 10% to get rid of creates a culture of sharks, oriented toward not letting the personal interfere with the professional. This is a traditional value that is increasingly outmoded, because it serves neither the people nor the company.

Now, Valve's Handbook for New Employees - which is a wonderful frame for the creative and knowledge worker - outlines the use of 360-degree stack ranking for determining compensation rather than who gets fired. Maybe I've misunderstood how Microsoft used it, but the article described a culture that did not seem to harness the best from teamwork, only the ambition of individuals.
 
Thanks for these additional pointers +John Verdon.

I agree with you that implementation is key.  It is really difficult to form a definitive opinion on "stack ranking" generally speaking.  One can more easily say that one implementation is good and another one is poor. 

I also agree with you that the promise of fairness is much harder to fulfill in the context of interactions not governed by a static set of rules and a single KPI.  Most sports only have one single KPI (the score) and a static set of rules that precisely describe what can and cannot be accomplished to score. 
 
It is a good conversation when participants can achieve a common understanding. Thank you for persisting with your valuable insights. :)