
Laurent Bossavit

Shared publicly  - 
 
But talented programmers DO exist!

Below is my reply to a reader of Leprechauns, who said they liked the book but thought I was in the wrong on "10x programmers" - they'd actually met one.

It would be silly to deny the existence of talent. And it would be just as silly to lump the world into such broad categories that we couldn't distinguish between concepts as widely separated as "talent" on the one hand, and "productivity" on the other.

Some people are talented. They approach their art with a style which is uniquely and recognizably theirs; part of the trace they leave upon the world is that their art is forever changed after them; everything that follows gets compared to what they did.

Some people are "productive", in the vulgar sense of there being many works attributed to them. (We may prefer the word "prolific" here.)

Some people are talented but not productive: Kubrick comes to mind. Some are productive, and can be called talented, but not everything they did shows the same talent: I'd put Woody Allen in that category. Few shine both long and bright.

There are programmers who are both talented in the above sense, and "productive" in the vulgar sense that many works can be attributed to them. Fabrice Bellard is one example. (Perhaps not all shine as bright as the talented people we can name in other arts, possibly because programming is yet only on its way to becoming a major art: few people study the works of Fabrice Bellard in the same way that people study the works of Mozart. Few people, alas, study the work of any programmer - perhaps least of all programmers themselves.)

With all of the above I have no problem.

Where I start having a problem is when the above senses of "talented" or "productive" become lumped in with a second sense of "productive": the sense in which you can measure the productivity of industrial apparatus, or of industrial systems in whole or in part, as in the phrase "the productivity of a worker". We have to decide what we are talking about - industrial economics, or the works of creative individuals.

It would be silly to say that Kubrick is 10x or 2x or 0.5x the filmmaker that Allen is. This is not the sense of "productive" that lends itself to comparison on a numerical scale.

Every time someone points to a "study" supposedly supporting the concept of highly productive programmers, it turns out to rest on measuring some equivalent of the number of lines of code written per unit of time; that is, the narrowly economic sense of "productivity". This might be a valid construct, but it should not be lumped in with the other sense in which some talented individuals are "productive" - that is, "prolific".

And lump them together is precisely what "10x programmer" discourse encourages doing. It presupposes that you can hire a talented programmer to work on what you want done, and that they will turn out ten times the "amount of work" (fungible work, not individual works) that a run-of-the-mill programmer would.

This is silly, because these talented programmers, if you ask them to work on your thing, will tell you what Kubrick or Allen would have said if you'd asked them to produce a movie on commission. They would have told you, perhaps even politely, to stuff it.

Further, the "10x programmer" concept presupposes that the production of one can be compared to the production of another, on a single scale, in precisely the sense that Kubrick and Allen's works cannot be compared.

This is silly, because a program is not a bunch of lines of code cranked out, machine-like; it is a socio-technical object existing within a broader context. To be valuable it must be used; to be used it must be distributed, its users somehow trained, and so on. You can no more numerically compare the contribution of different programmers to different programs than you can numerically compare Nicole Kidman's "productivity" in Eyes Wide Shut to Scarlett Johansson's in Scoop.

I hope this clarifies why I do not feel that acknowledging the existence of talented or prolific individuals is incompatible with my critique of the concept of "10x programmer", and the mythology that has grown around that concept.

I don't feel that dismantling that mythology belittles the work of talented programmers; my inclination would be to magnify that work - by highlighting their creative individuality.

Laurent Bossavit

Shared publicly  - 
 
Forecasting the Future of Employment

(Follow-up on https://plus.google.com/u/1/+LaurentBossavit/posts/is8vMdyXbuU)

So how would a superforecaster think about issues like the risk of job loss to computerisation?

The first thing I would do is fix the timeframe and pin down the exact meaning of the claim. The objective is to remove ambiguity, at the cost of accepting that the resulting question may no longer be exactly what we started with.

The Oxford study computes a .99 probability that "Telemarketers" will be replaced by automated technology. "That sounds plausible," you might be thinking. After all, robocalling is on the rise, and seemingly inexorable.

As we saw in the previous instalment, the first thing we need to do is specify a time frame. Instead of "the next decade or two", let's go with "by 2025". Instead of the vague "replaced by technology", let's stipulate that the question will be answered Yes if and only if a reliable source indicates a tenfold reduction in that part of the workforce.

(This is generous towards the Oxford study. Jobs disappear for reasons other than automation, such as being offshored to cheaper countries, and we would count those as a win for the automation study. That is one of the ways in which we'll accept a slightly different question than the one we started out with, for the sake of precision.)

We will even pin down what reliable source. Since the study relies on BLS statistics, we'll use the BLS page: http://www.bls.gov/oes/current/oes419041.htm

So, our revised claim is:

There is a 99% probability that by 2025, the BLS will report fewer than 23,452 people employed in the "Telemarketer" category.

What do we mean exactly by "99% probability"? It means that out of 100 judgements expressed at this level of probability, you expect to be wrong about once.

Let's put it another way. If the BLS reports more than 23,452 telemarketers in the US in 2025, you pay me $100. If it reports fewer (or stops reporting the category altogether), I pay you $1.01. (All sums adjusted for inflation.)

Mathematically, the expected value of this bet is zero - if you are correct. If you are estimating the probability conservatively (rounding off from 99.9%, say) then this is a winning bet for you. If you are overestimating the probability, then this is a good bet for me.
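To make that concrete, here's a minimal sketch in Python, using the stakes above; the probabilities in the loop are just illustrative, showing how the expected value of the bet moves with the probability you actually assign to the claim:

```python
# Stakes of the telemarketer bet described above: I pay you $1.01 if the
# BLS count falls below 23,452 by 2025; you pay me $100 if it does not.
WIN, LOSS = 1.01, 100.00

def expected_value(p_claim_true: float) -> float:
    """Expected gain, to you, of taking the bet at the probability you assign."""
    return p_claim_true * WIN - (1 - p_claim_true) * LOSS

for p in (0.999, 0.99, 0.95, 0.90):
    print(f"p = {p:.3f} -> EV = {expected_value(p):+.2f} $")
```

At 99% the bet is essentially a wash; if you were rounding down from 99.9% it pays you, and any overconfidence turns it into a losing proposition.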

Would you bet $100 to $1 that the number of "Insurance Underwriters", estimated by the Oxford study to be at 99% risk of being automated away, will go down to about ten thousand from today's count of 106,300 by 2025? The BLS itself projects an outlook of a 6% decrease by 2022; you would be betting against the BLS, which presumably knows what it's talking about.

A 99% probability in this kind of domain strikes me as waaaaay overconfident. There are a dozen occupations like this in the Oxford study; taken at face value, that implies roughly an 89% probability (.99 raised to the 12th power) that all of them will have been lost to automation by 2025. This means you should be willing to take a bet to pay me $10K if any of these jobs is still shown by the BLS to employ more than 10% of the current counts, and I'll pay you a round $1000 if and only if all of these jobs are gone:

- Data Entry Keyers
- Library Technicians
- New Accounts Clerks
- Photographic Process Workers and Processing Machine Operators
- Tax Preparers
- Cargo and Freight Agents
- Watch Repairers
- Insurance Underwriters
- Mathematical Technicians
- Sewers, Hand
- Title Examiners, Abstractors, and Searchers
- Telemarketers

Would you take that bet?
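Here's the back-of-the-envelope behind that compound bet, taking the study's .99 figures at face value for the twelve occupations listed above:

```python
# Joint probability that all twelve occupations above are "gone" by 2025,
# if each independently carries the study's .99 probability.
p_each, n_occupations = 0.99, 12
p_all_gone = p_each ** n_occupations
print(f"P(all {n_occupations} gone) = {p_all_gone:.3f}")        # ~0.89

# Probability at which the proposed $10K-vs-$1K stakes would break even:
pay_if_wrong, receive_if_right = 10_000, 1_000
break_even = pay_if_wrong / (pay_if_wrong + receive_if_right)
print(f"Break-even probability: {break_even:.3f}")              # ~0.91
```

If your honest probability that all twelve categories shrink tenfold is anywhere below ninety-ish percent, taking the bet is a mistake.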

Would you still take that bet, assuming you said "yes" the first time around, after I told you that hand sewers ("Sewers, Hand") saw their numbers only halved between 2005 and 2015?

(If I were making an actual forecast, I'd certainly look at this kind of evolution - I'd take the assumption that the next decade is likely to be much like the past decade as a starting point, and adjust according to current information. It's kind of weird that the authors of the Oxford study didn't even mention, as far as I can see, this kind of simple cross-check against their algorithmic model.)
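As a sketch of that kind of cross-check - purely a baseline extrapolation, assuming an occupation keeps shrinking at the same rate as over the past decade:

```python
import math

# If an occupation's headcount was roughly halved over the past decade
# (as with "Sewers, Hand" between 2005 and 2015), how long would it take
# to shrink tenfold at that same rate?
decade_survival = 0.5          # fraction remaining after ten years
target_survival = 0.1          # the "tenfold reduction" criterion
decades_needed = math.log(target_survival) / math.log(decade_survival)
print(f"Decades needed at that rate: {decades_needed:.1f}")   # ~3.3
```

Even an occupation declining that fast would need roughly three more decades at the same rate to meet the criterion - which is the kind of base rate a .99 has to overcome.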

I'm willing to put my money where my mouth is, by the way. I'm not a gambler, or willing to keep track of a bunch of bets, so I'd only take the bet once... but I would take it to show I'm serious about this kind of thing.

Laurent Bossavit

Shared publicly  - 
 
Destroying the entire US economy

I've been trying to wrap my head around a concept in economics, and you should know that I've no background in economics. Like, at all. My econ teacher back in high school would have been unanimously voted Worst Teacher if we'd had an election; he was so bad we skipped his classes with total impunity.

Anyway, my question was: when you hear "X costs the economy N billions of dollars per year", what specifically do you take that to mean? X is variously given as "disengaged employees", "preventable heart disease", "software bugs", and so on. It's entirely possible that claims of that sort make sense for some X, and not for others.

Does it mean, for instance, "in the absence of X there would be Y $Bn more wealth to share around"? That doesn't quite compute for me, because (in some of the cases I gave, such as software bugs) those Y billions are salaries or fees paid out to people, so are in the economy.

Someone on Twitter suggested it means "people/companies/the gov't spend Y but it doesn't produce useful returns". But then what, specifically, is gained by not saying that, and by saying instead that it's "a cost to the economy"?

Alternately, can anyone provide an example of an X that was eliminated and we were able to measure the costs of X recovered to the economy?

Being who I am, my hunch was that "X costs the economy Y" is actually a snowclone, meaning "X is bad" for any value of Y, and otherwise empirically meaningless. What you do when you find a snowclone is look for examples, and I was able to find plenty.

What I found was interesting. I tabulated the results in a spreadsheet. If you sum all "costs to the US economy" you get a number larger than the economy is to start with.
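The exercise itself is trivial; here's the shape of it, with made-up placeholder figures rather than the ones actually collected in the spreadsheet:

```python
# Illustrative only: placeholder figures, not the spreadsheet's actual entries.
claimed_costs_bn = {                     # "costs to the US economy", $Bn/year
    "disengaged employees": 500,
    "preventable heart disease": 300,
    "software bugs": 60,
    "routine weather variability": 500,
    # ...and a few dozen more of the same kind...
}
us_gdp_bn = 18_000                       # rough order of magnitude

total = sum(claimed_costs_bn.values())
print(f"Sum of claimed costs: ${total:,}Bn ({total / us_gdp_bn:.0%} of GDP)")
```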

Of course there's no sensible reason to count down from the total, and subtract these costs. It's obvious that a bunch of these are counterfactual: "if we stopped X it would add Y dollars to the overall bottom line". But just as obviously that framing is far less potent, because it makes clear its estimate is counterfactual, speculative, uncertain; whereas a "cost" is implied to be a solid figure.

There's no sensible reason for calling these things "costs" either; I don't go around bemoaning the cost I suffered by not becoming President of France, and thus not being able to get paid $100K for speaking at a conference - a total "loss" to me, so far, of over $6M.

Anyway, if you happen to make up a number for what your favorite problem costs the economy, I've got a handy spreadsheet of items to compare it to. For instance, it's more urgent to address "routine weather variability" than software bugs. You can all relax about using PHP or whatever.

Here's the link, for your convenience or amusement.
Drive
Costs to the US economy - Feuille 1. Wherein various "costs to the US economy" (as found via the helpful Google) are found to sum to more than the economy itself. Not included: temporary costs such as the Iraq war, etc.; estimated future costs (e.g. climate change). Disclaimer: yes, this is rough work, and no, I'm not tak...
9 comments
 
Here's another.  Sleep problems -
http://www.ncbi.nlm.nih.gov/books/NBK19958/#a2000f7efddd00177
looks to be in the same order as software bugs.

Laurent Bossavit

Shared publicly  - 
 
 
I received notification yesterday that two of my abstracts were accepted to the Toward a Science of Consciousness 2015 conference:

* Sentient companions predicted and modeled into existence: explaining the tulpa phenomenon. Accepted as a contributed poster. http://kajsotala.fi/Papers/Tulpa.pdf

Takes a stab at trying to explain so-called "tulpas", or intentionally created imaginary friends, based on some of the things we know about the brain's cognitive architecture.

* Coalescing Minds and Personal Identity. Accepted as contributed paper; co-authored with Harri Valpola. http://kajsotala.fi/Papers/CoalescingPersonalIdentity.pdf

Summarizes our earlier paper, Coalescing Minds (2012), which argued that it would in principle not require enormous technological breakthroughs to connect two minds together and possibly even have them merge. Then says a few words about the personal identity implications. Due to the word limit, we could only briefly summarize those implications: will have to cover the details in the actual talk.
1 comment on original post

Laurent Bossavit

Shared publicly  - 
 
Check out the new Rationalists in Tech podcast - I was one of the first interviewees.
I'll appreciate feedback on a new podcast, Rationalists in Tech.  I'm interviewing founders, executives, CEOs, consultants, and other people in the tech sector, mostly software. Thanks to Laurent Bossavit, Daniel Reeves, an...

Laurent Bossavit

Shared publicly  - 
 
Old but good paper on abuse of p-values.
 
One of the advantages of reading old papers is that you can find some hilarious insults. Here's one from Bakan, David, "The test of significance in psychological research," Psychological Bulletin, Vol. 66 (1966), pp. 423-437:

"I playfully once conducted the following "experiment": Suppose, I said, that every coin has associated with it a "spirit"; and suppose, furthermore, that if the spirit is implored properly, the coin will veer head or tail as one requests of the spirit. I thus invoked the spirit to make the coin fall head. I threw it once, it came up head. I did it again, it came up head again. I did this six times, and got six heads. Under the null hypothesis the probability of occurrence of six heads is (1/2)^6 =.016, significant at the 2% level of significance. I have never repeated the experiment. But, then, the logic of the inference model does not really demand that I do! It may be objected that the coin, or my tossing, or even my observation was biased. But I submit that such things were in all likelihood not as involved in the result as corresponding things in most psychological research."

This is an even better burn than it looks because Bakan is also illustrating optional stopping (he would have broken off the flipping if he hadn't kept getting heads), which is routine among psychologists and makes his p-value incorrect; naturally, no one computes their p-value correctly to account for optional stopping...
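Here's a minimal simulation sketch of the optional-stopping point (a toy illustration, not anything from Bakan's paper): a fair coin, tested for a "heads bias" after every flip, stopping as soon as the result looks significant.

```python
import math
import random

def p_value(heads: int, flips: int) -> float:
    """One-sided binomial tail: P(at least `heads` heads in `flips` fair flips)."""
    return sum(math.comb(flips, k) for k in range(heads, flips + 1)) / 2 ** flips

def stops_early(max_flips: int = 30, alpha: float = 0.05) -> bool:
    """Flip a fair coin, testing after each flip; stop at the first 'significant' run."""
    heads = 0
    for n in range(1, max_flips + 1):
        heads += random.getrandbits(1)
        if p_value(heads, n) < alpha:
            return True        # this is where the coin-spirit experimenter stops
    return False

random.seed(0)
trials = 10_000
rate = sum(stops_early() for _ in range(trials)) / trials
# With a fixed sample size the false-positive rate stays at or below alpha;
# peeking after every flip pushes it noticeably higher.
print(f"False-positive rate with optional stopping: {rate:.1%}")
```

The nominal 5% only holds if the number of flips is fixed in advance; decide when to stop by looking at the data, and the error rate quietly inflates.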
3 comments on original post

Laurent Bossavit

Shared publicly  - 
 
The Future of Employment?

Here's a study which has been making the rounds recently. It seems destined for Leprechaun status.

Its claimed bottom line: "47% of US employment is at risk of computerisation in the next two decades".

So, earlier today, when I saw yet another tweet about this without qualifying language or critical examination, I pushed back. This was met with almost a textbook case of the Leprechaun Objection: "It was the 'best' [study] we saw, but would love to hear of better ones! Any references?"

The usual answer applies: there surely is a "best" study out there on the ecology of leprechauns. But leprechauns still don't exist.

Before I go into some specific criticism of the study - or more accurately, of how the abstract of the study, and hence the news, frame its conclusions - I would like you to pause and think for a few minutes about two questions.

First, what does it mean to you that a given job is "at risk of being computerised in the next few decades"?

Second, if it was up to you to measure the probability that a given job would be computerised in that time frame, how would you go about it?

The latter question is a matter of forecasting. This is the topic of Phil Tetlock's latest book, Superforecasting (warmly recommended). The "super" in the title refers to the fact that people can be trained, apparently, to be much better at forecasting (accurately assessing the probability of specific future events) than the rest of the population. Also, some personality traits seem to predict "super" forecasting skills.

I happened to be among the top 2% of the participants in Tetlock's studies, hence a "superforecaster". I mention all this to establish that I know a thing or two about forecasting, and spent some time thinking seriously about "probabilities" and what the word means.

Now, back to the Oxford study.

The "two decades" thing is largely made up. The study is about occupations that "are potentially automatable over some unspecified number of years". The text goes on to say "perhaps a decade or two", but only by way of illustrating this vague timeframe. It could also be a century or two.

This is one of the things I learned about forecasting - unless you're specifying a well-defined time frame, it's close to impossible to assess the accuracy of forecasts.

But now the meat of the thing - what they mean by "probability". It turns out that the study didn't measure probability at all. The study was based on a set of subjective, binary assignments by the researchers of whether a job was "computerisable". They coded these jobs as 0 (can't be computerised) or 1 (is certain to be computerised).

But the study also admits: "We thus acknowledge that it is by no means certain that a job is computerisable given our labelling." So... the study coded as 0 and 1 judgments that were both subjective and uncertain. (This breaks yet another tenet of good forecasting: there are no blacks and whites - no 0s and 1s - but only shades of gray.)

The study started from subjective assessments of the research team over a small sample of occupations, and asked whether these assessments correlated in any way with "official" characteristics of the jobs in question, such as the job's requirements for manual, cognitive or social skills.

The study authors call these characteristics "objective" because they were given to them by the Bureau of Labor Statistics. It would be more honest to say that these characteristics are also "subjective" - merely tainted by someone else's subjectivity rather than the researchers' own.

What the study did apparently demonstrate (I might yet look more closely into that part, the math-heavy part) is that the subjective assessments correlated rather well with the job characteristics; that is, once you know how much a given job relies on manual, social and cognitive skills respectively, you can predict reasonably well whether the researchers will think it is computerisable.

To which my reaction is: "Well, d'oh!".

The term "computerisable" reflects the anxieties of the age. We have seen jobs disappear and others be created. We also have our subjective but socially informed notions of what jobs require what kinds of skills. It's interesting, but not overly surprising, that these two sets of prejudices match up with each other.

But really, we haven't learned much about "what will happen in the next decade or two". And the study's predictions are so vague that I forecast a very low probability that they will ever be properly tested. To a very high likelihood (take it from a superforecaster) they will remain empty punditry.

Laurent Bossavit

Shared publicly  - 
 
Musings on the Cone of Uncertainty

A comment on a previous post objects to my comparing the weather version of the "cone of uncertainty" to the well-known one in software development that I've attempted to debunk for a while: "As well-managed projects progress, the number of variables is reduced [...] weather is a poor analogy [because] weather systems are full of uncontrolled, poorly understood variables, without progression towards a 'finish point'".

Does the notion of a "well-managed project" have any kind of predictive validity? Or is it something we assess after the fact?

It's easy to observe an effort that has little residual uncertainty, for instance because it's shipped to production or has become a commercial success, then turn around and see in its past a "well-managed project" or a nicely shaped cone of uncertainty. But this might well be due to survival bias and selective attention.

The question I'm asking is: if we attempt to draw cones while a project is ongoing, are there any project characteristics that let us reliably anticipate a steady narrowing of the uncertainties?

Software projects also are "full of uncontrolled, poorly understood variables". We call them "people". Each of these people has his or her own "finish point" that they are striving for, and they're often poorly aligned.

Irrespective of how well the analogy with the weather holds up, the weather cone is at least drawn in the direction that makes more sense to me: with the experienced present as a known point and the uncertain future as a range of possibilities.

The way the Boehm-McConnell cone is drawn has narrative fallacy written all over it. Its future finish point is really someone's present, when the project is delivered successfully and they look back at what a wild ride it has been. It's always going to look like a cone because the further back into their own past they look, the harder it was back then to imagine reaching this particular "finish point".

Two years ago, I could not possibly have imagined that I would end up, today, working for the French government, helping them transform project management practice towards Agile. Even a few months ago the prospect felt like a weird gamble. Yet here I am, doing that. My own sense of purpose dictates that I construct some kind of retrospective consistency: I must yield to the temptation of reinterpreting the past few years as inexorably leading up to this point in my life.

Tempting and even useful as that view is, it's still false. That's what the Cone feels like, to me.
2 comments
 
the funny thing about survival bias is that in most companies the ones that tend to "survive" long enough to tell the story are managers or stakeholders who probably have a 10000 foot view of the project when it happened.

In that particular case, both survival bias and selective attention are one and the same.

Nice post Laurent. 

Laurent Bossavit

Shared publicly  - 
 
The Myth of the Myth of the Myth of 10x

Alan, over at Tooth of the Weasel, has a blog post on "The Myth of the Myth of 10x", defending the old idea of "10x programmers". (It's not recent, but it's new to me; I was pointed to it recently after Steve McConnell commented to say, basically, "Hell yeah.")

As anyone who knows me a bit can tell you, I don't think the 10x concept has any credibility. But I'm open to new data and reasoning on the topic.

Also, Alan's post gave me a good opportunity to write up a bit of old history, as I like to do, that few people are aware of. So even if you're down with the whole "someone's wrong on the Internet" thing, read on for that juicy tidbit at least.

Alan's reasoning, as far as I could tell, appears to be "the 10x concept is not a myth because I define it differently from the way it was defined in the studies that are claimed to support the 10x concept".

I don't think this works. What would work for me: Alan's describing someone he's actually met (or has reliable information about), who fits his definition of "someone whose aptitudes allow them to deliver significantly higher output and quality", and some explanation of why they are that way. That would still be anecdotal evidence, but better than no evidence at all.

In a blog post a few years back (http://www.construx.com/10x_Software_Development/Chief_Programmer_Team_Update/) Steve has described what some might call "the original 10x programmer", Harlan Mills. According to Steve, "Harlan Mills personally wrote 83,000 lines of production code in one year" on a project for the New York Times in the early 70s.

I think this qualifies Mills as a "very prolific" programmer. One issue with that descriptor is that, as Alan acknowledged, "prolific" isn't the same as "productive" (and it's one of the tragedies of our profession that we consistently fail to distinguish the two). We all know people who churn out reams of code that turns out to be worthless.

It turns out Mills was one of those people.

At least he was on the particular project Steve describes as "one of the most successful projects of its time". By the way, you don't have to claim that "all programmers are about the same" to make a counter claim to the 10x concept; you can for instance merely point out that if programmers are extremely inconsistent in their performance, that would explain the data in the 10x studies just as well.

Maybe Mills was a 10x on some other project, but my research suggests he wasn't a 10x in Alan's sense of "significantly higher output and quality" on the Times project.

Stuart Shapiro, in his 1997 article "Splitting the Difference", described the same project somewhat differently:

"As evidence, the authors pointed to the development of an information bank for the New York Times, a project characterized by high productivity and very low error rates. Questions were raised, however, concerning the extent to which the circumstances surrounding the project were in fact typical. Moreover, it seems the system eventually proved unsatisfactory and was replaced some years later by a less ambitious system."

Source: http://sunnyday.mit.edu/16.355/shapiro-history.pdf

Shapiro is quoting from a much, much older article that appeared in Datamation in May 1977, "Data for Rent" by Laton McCartney:

"Unfortunately for The Times, the IBM designed system didn't prove to be the answer either. 'They touted us on top down structured programming', says Gordon H. Runner, a VP with The Information Bank, 'but what they delivered was not what they promised.' When the FSD system proved unsatisfactory, the TImes got rid of its IBM 370/148 and brought in a 360/67 and a DEC PDP-11/70. Further, Runner and his staff designed a system that was less ambitious than its predecessor but feasible and less costly. [...] 'With the new approach we're not trying to bite off the state of the art,' Runner explains. 'We're trying to deliver a product.'"

(The PDF for the Datamation article isn't available online, but I'm happy to provide it upon request.)

I find it ironic and funny that "the original 10x programmer" left behind such a bitter taste in his customer's mouth. It reminds me of the ultimate fate of the Chrysler C3 project that was the poster boy for Extreme Programming.

Our profession has long been driven by fad and fashion, with its history written not by the beneficiaries or victims of the projects on which we try new approaches, but by the people most biased to paint those projects and approaches in a good light. Our only way out of this rut is to cultivate a habit of critical thinking.

(I've written a lot more about the 10x myth, and my reasoning for branding it a myth, in my book: https://leanpub.com/leprechauns - if you found the above informative, check it out for more of that.)
I first heard of “10x” software development through the writings of Steve McConnell. Code Complete remains one of my favorite books about writing good software (The Pragmatic Programmer, Writing So...

Laurent Bossavit

Shared publicly  - 
 
Can't we all just get along? A fable on ISO 29119

In an alternate world, a different group of testers petitioned ISO first, and formed WG666.

A few years later, the ISO 29666 standard was published, which, among other things, stipulated that the testing of software was to be accomplished by sweeping a recently deceased member of the species Gallus gallus domesticus in a roughly circular motion over any physical embodiment of the software in question.

Although this approach, derided by one faction as "waving a dead chicken over it", was publicized in conference talks and articles over the span of a few years, most of the testing community remained poorly informed as to its advantages, owing to the fees legitimately charged by ISO to cover the high costs of developing the standard. (In this alternate world, unlike in the real world, the working group had decided to use their own procedures to test the standard itself, and thus a great many fowl went into making the standard.)

One PhD level thesis that had examined how software testing was actually done at a number of corporations, and noted some passing similarities with chicken waving, was widely quoted as providing empirical support for the effectiveness of the concepts in the standard. (Clearly this alternate world was not much removed from our own.)

Some testers refused to sign the Stop29666 petition, on the basis of "keeping an open mind to ALL approaches" in software testing.

Some further reproached their testing colleagues who did sign the petition, because their opposition to waving dead chickens could, after all, only be explained as a result of knee-jerk political affiliation (rather than because, well, dead chickens).

And further insisted that the opposition of a vocal minority to the ISO29666 standard was damaging the testing community by "polarizing" it, and called for all sincere professionals to try to "relax a bit" and "get along with each other".

All of these judgments were wrong. Even though this was an alternate world, it was still one where dead chickens didn't help much with testing.

Back in the real world...

On either side of the "contentious rift" opened up by ISO 29119 are human beings. One of the things we tend to do is justify our own beliefs using different standards than we apply to other people - particularly when they disagree with us.

It's too easy to think, when you disagree with someone, "My beliefs are grounded in facts and observations, but your beliefs are only to score points with your social circle and feel good about yourself."

Various commenters on the ISO 29119 debate, both for and against, are guilty of this, some more egregiously than others. Appeals to "take the right attitude" or "just relax a bit" are transparent attempts at painting the opposition as partial and subjective.

One might argue that the accusations of "rent seeking" leveled at the authors of the standard are of a similar kind, insofar as they frame the debate as a matter of intent (the authors of the standard, the argument goes, want to secure revenue through regulation rather than through providing superior service). However, the argument based on "rent seeking" is eminently more testable than one based on "not having the right attitude": there is, factually, such a thing as regulatory capture; there is such a thing as manipulation of the ISO processes for private gain, as became painfully apparent in the case of the Microsoft-backed Office Open XML standard.

The point of the above fable is to encourage anyone reading up on the debate to apply a "dead chicken test". Cross out anything that you read which does not refer to a verifiable fact; anything that speculates on someone's intent or frame of mind, or expresses motherhood-and-apple-pie sentiments such as "we would like everyone to get along".

Does the approach to testing outlined in the standard yield better results than waving dead chickens around? Does any testing approach demonstrably work better than dead chickens, and what yardstick is appropriate to you when answering that question? Anything that doesn't contribute to answering these, either at the scale of an individual tester or at broader scales (company-wide, industry-wide), you can safely ignore.

For instance, the article at this URL: http://xbosoft.com/iso-29119-useful/ boils down to the following:

 The standard "gives a starting point to add context and customization".

Pretty much everything else is a red herring, or a manifestly false statement. For instance, the claim that "all ISO standards explicitly state that they need to be tailored to the situation and organization". There is an ISO standard determining the paper sizes for A and B series paper; you can bet that this one doesn't "state that it needs to be tailored". Or "usually standards are born from nebulous concepts that we need to try to understand better" - again a completely baseless generalization. Paper size isn't a nebulous concept; it is simply a matter of reaching agreement, even a somewhat arbitrary one, on something where the details don't matter. In software development, not only do the details matter, they sometimes seem to be all that does.

Does the standard provide a useful starting point? I've actually read the document, dissected it, and from my perspective as an expert software developer but, technically, a newbie to the world of professional testing, I find it worse than useless. The parts on "dynamic testing processes" - the parts that touch on actually getting on with the testing itself, as opposed to burying it under layers of managing or planning or documenting - are a thicket of confusing terminology. Where a simple notion of "test idea" would have sufficed, they introduce "test conditions", "test coverage items" and "test sets". The only purpose these appear to serve is to generate copious amounts of documentation, essentially for the purpose of management oversight.

If you are determined, there are ways of finding the actual text of the standard. It can be a matter of finding yourself in the right place: for instance, universities and large corporations that subscribe to IEEE's digital library on an "all you can eat" basis. If you happen to be connected to the wireless network of one such institution, as I was while attending the CAST conference, you'll be able to download the documents at no charge.

Get the facts, judge on the facts, ignore as best you can whoever does otherwise.
19 comments
 
When you're lawfully inside a university library, you can read any book on the shelves. There's nothing unethical about that, it's what a library is for.

If you'd rather the market decide, then you should oppose the standard, since it will skew the market, not to the companies delivering the most valued results, but to those with a say in how the standard is defined. That is in a nutshell the "regulatory capture" argument.

Finally, you're raising the attitude objection again; "upset" is a red herring. ISO can standardize paper sizes because nobody cares about differences of a fraction of an inch in a sheet of paper, and everyone benefits from agreeing upon a size. Procedural "consensus" among a technical committee may then happen to reflect a larger consensus among users of paper.

When ISO lets a small group of people (one with an obvious vested interest, at that) standardize, on behalf of an entire specialization in software development, on something which isn't proven better than waving dead chickens, it is abusing the commonsense meaning of "consensus". There is no well-established benefit accruing to the community from agreeing upon the particular contents of ISO 29119 as the "standard" way to test. The differences being investigated by practicing testers as they go about their jobs do matter, quite a bit.

The Humpty Dumpty Principle applies: ISO can define "consensus" to mean whatever the hell it wants; the rest of us are free to demand common sense and consistency in how the word is used.

ISO boasts in its marketing materials that its standards "are based on global expert opinion" and that "comments from stakeholders are taken into account" and other such phrases. These imply that whatever the strictly technical meaning of "consensus" (within working groups, et cetera), ISO does intend the word to carry the same connotations as the everyday meaning.

Laurent Bossavit

Shared publicly  - 
 
 
People try mailing a number of unpackaged items through the US Postal Service and record the results. Some of my favorites:

> Football. Days to delivery, 6. Male postal carrier was talkative and asked recipient about the scores of various current games. Carrier noted that mail must be wrapped.

> Pair of new, expensive tennis shoes. Strapped together with duct tape. Days to delivery, 7. When shoes were picked up at station, laces were tied tightly together with difficult-to-remove knot. Clerk noted that mail must be wrapped. [...]

> Helium balloon. The balloon was attached to a weight. The address was written on the balloon with magic marker; no postage was affixed. Our operative argued strongly that he should be charged a negative postage and refunded the postal fees, because the transport airplane would actually be lighter as a result of our postal item. This line of reasoning merely received a laugh from the clerk. The balloon was refused; reasons given: transportation of helium, not wrapped. [...]

> Box of sand. Packaged in transparent plastic box to be visible to postal employees. Sent to give an impression of potentially hiding something. The plastic box had obviously been opened before delivery and then securely taped shut again. Delivery without comment at doorstep, 7 days. [...]

> Deer tibia. Our mailing specialist received many strange looks from both postal clerks and members of the public in line when he picked it up at the station, 9 days. The clerk put on rubber gloves before handling the bone, inquired if our researcher were a "cultist," and commented that mail must be wrapped. 
1 comment on original post

Laurent Bossavit

Shared publicly  - 
 
Can we bury the NIST study once and for all now?

The NIST study concluded that "the impact of inadequate software testing infrastructure on the US economy was between 22.2 and 59.5 billion dollars".

As usual, people mention this figure as if it was undisputed fact (for instance, you can find it on a couple Wikipedia pages). It's a good bet that they haven't read the original document carefully and critically. If they had, they might have noticed some red flags in the "study" and would at the very least hedge by emphasizing that it is an estimate.

There are two important aspects to any estimate: precision and accuracy.

Precision is the size of the error bars around the estimate. "Between $50Bn and $70Bn" isn't at all the same as "somewhere between a few hundred million and a few hundred billion, with sixty billion being our best guess". With a narrow spread, it's much easier to justify investing some proportionate amount of money in attempting to solve the problem. If your uncertainty is large, there's a greater risk you'll be wasting money.

Accuracy is about whether we even have reason to believe that the estimate has landed anywhere near the "true" value. Are we over-estimating? Under-estimating? Giving an answer that doesn't have anything to do with the question being asked?

The NIST procedure, as I was able to reconstruct it, went something like this (I'm actually simplifying a bit):
- ask survey respondents the question "how much did minor bugs cost you last year"
- average this across all respondents
- divide total expense by number of employees at respondent, to get a "cost of bugs per employee"
- multiply cost of bug per employee by total employment in that sector, based on BLS employment data

(Except that to extrapolate the results of their financial services survey, instead of employees they scaled by "million dollars in transaction volume".)

Then they "normalized" all that again into a per employee cost for both automotive and financial sectors... and scaled it all up again to the entire economy, again by multiplying by X million employees.  Now, whatever one thinks of this procedure (I think the heterogenous scaling factors are at best bizarre), it can't escape the laws of physics.

Specifically, that any measurement is subject to uncertainties, including the measurement of "number of employees"; and these uncertainties compound as you add estimates together, or multiply one estimate by another.
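To make that concrete, here's a toy Monte-Carlo sketch - placeholder numbers and distributions, not NIST's - of what happens when you multiply an uncertain per-employee cost from a small survey by an uncertain sector headcount:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# Survey-based cost per employee: wide uncertainty, as you'd expect from a
# handful of educated guesses. (Placeholder distribution and figures.)
cost_per_employee = rng.lognormal(mean=np.log(2_000), sigma=0.6, size=n)  # $

# Sector headcount: comparatively well known, but not exact.
employees = rng.normal(loc=10e6, scale=0.5e6, size=n)

sector_cost_bn = cost_per_employee * employees / 1e9

lo, mid, hi = np.percentile(sector_cost_bn, [5, 50, 95])
print(f"5th / 50th / 95th percentiles: {lo:.0f} / {mid:.0f} / {hi:.0f} $Bn")
# The spread of the product is dominated by the noisiest factor: a headline
# point estimate hides an interval spanning several-fold.
```

Repeat that kind of multiplication and addition a few times, as the NIST derivation does, and the error bars on the headline figure should be very wide indeed - which is why the precision implied by "between 22.2 and 59.5 billion" deserves scrutiny.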

To get a grip on the uncertainties involved, I tried to replicate the work of the NIST authors: that is, I tried to reproduce their derivation of the final estimate based on survey responses and estimates from BLS.

For instance, about half of NIST's total estimate can be accounted for by the costs directly incurred in paying developers and testers; the other half by the cost to end users as a consequence of software bugs. These are two distinct estimates which are added up to get the final answer. The sub-estimates are further subdivided into estimates for the automotive sector and for the financial services sector (the two sectors that were surveyed), and subdivided again into estimates for the costs from "major errors" and "minor errors" and other categories, and so on.

I eventually gave up, because after a few steps I just couldn't find any way to get their numbers to add up. (A link to the spreadsheet is attached; readers are more than welcome to copy, check and improve upon my work.)

Though ultimately fruitless, insofar as I wasn't able to reproduce all the steps in the derivation of the final estimate, the exercise was worthwhile. I got quite familiar with their numbers, in the process of trying to understand their derivation. I learned new things.

For instance, the study breaks down costs incurred through bad testing into various categories, including major errors and minor errors.

Apparently, for "minor errors", and in the automotive sector, the average cost of one bug in that category was four million dollars.

(Yes, they seem to be claiming an average cost per bug of $4M. This is from Table 6-11. I'm actually hoping someone will tell me I'm interpreting it wrong; it's such an embarrassingly absurd result.)

Also, whereas "major" errors cost 16 times as much as "minor errors" in small automotive companies, this reverses in large ones, with "minor errors" having a substantially higher cost than "major errors".

So someone who believes the $60Bn number would also have to believe some very counter-intuitive things - since these numbers are inputs to the overall estimate.

The alternative is to believe there are serious problems with the study. Which opens up the question of its accuracy. On that score, two major aspects in academic research tend to be sample size and methodology. NIST's research was survey-based.

How many people did NIST ask? Paragraph 6.2.2 informs us that "four developers completed substantial portions of the entire survey". Section 7 is a bit vaguer about how many people responded for the "developer" portion of the costs, but it looks as if the total sample size was less than 15, which seems like a direly inadequate basis on which to place half of the total estimate.

The surveys of end users seem to have had a more reasonable sample size: 179 respondents in the automotive sector and 98 in financial services. (However, it must be noted that the surveys had rather dismal response rates, 20% and 7% respectively.)

What did NIST ask? They asked a few people for their opinion of how much they spent on bugs, and when. The inputs to the model are quite literally educated guesses. One survey is about 40 questions long, and respondents were told that they could answer it in 25 minutes, including time to research the data.

I would argue that most people have no idea how much bugs cost, other than the "exponential rise" model, which largely predates 2002. If you have less than a minute to answer a question about how much bugs cost, you're probably going to reach for the answer you remember from school or read in articles.

So, this "survey" about the cost of bugs would predictably be largely self-fulfilling. You get the numbers you expect to get. The numbers' connection with reality is tenuous at best.

If you are quoting the $60 billion estimates, you are basically endorsing:
- odd findings such as a cost of $4M per minor error
- the idea that minor errors may cost more than major ones
- the statistical validity of unreasonably small sample sizes
- most problematically, the validity of opinion over actual measurement

Think about this before spreading the NIST numbers any further.

-------

If you liked this post, consider supporting me by buying my book: http://leanpub.com/leprechauns
3 comments
 
+Laurent Bossavit OK, cool. I responded without reading the underlying study. :)