Degrees of intellectual dishonesty
In the previous post (https://plus.google.com/+LaurentBossavit/posts/DZyraS7bWTE), I said something along the lines of wanting to crawl into a hole when I encounter bullshit masquerading as empirical support for a claim, such as "defects cost more to fix the later you fix them".
It's fair to wonder why I should feel shame for my profession, and fair to ask whom, exactly, I feel ashamed for. So let's drill a little deeper, and dig into cases.
Before we do that, a disclaimer: I am not in the habit of judging people. In what follows, I only mean to condemn behaviours. Also, I gathered most of the examples by random selection from the larger results of a Google search. I'm not picking on anyone in particular.
The originator of this most recent Leprechaun is Roger S. Pressman, author of the 1987 book "Software Engineering: a Practitioner's Approach", now in its 8th edition and marketed as "the world's leading textbook in software engineering".
Here, in extenso, is the relevant passage (I quote from the 5th edition, but have no reason to think it has changed in any way since the 1st):
To illustrate the cost impact of early error detection, we consider a series of relative costs that are based on actual cost data collected for large software projects [IBM81]. Assume that an error uncovered during design will cost 1.0 monetary unit to correct. Relative to this cost, the same error uncovered just before testing commences will cost 6.5 units; during testing, 15 units; and after release, between 60 and 100 units.
This [IBM81] is expanded, in the References section of the book, into a citation: “Implementing Software Inspections,” course notes, IBM Systems Sciences Institute, IBM Corporation, 1981.
Am I embarrassed for Pressman, that is, do I think he's being intellectually dishonest? Yes, but at worst mildly so.
It's bothersome that for the first edition Pressman had no better source to point to than "course notes" - that is, material presented in a commercial training course, and as such not part of the "constitutive forum" of the software engineering discipline.
We can't be very harsh on 1987-Pressman, as software engineering was back then a discipline in its infancy; but it becomes increasingly problematic as edition after edition of this "bible" lets the claim stand without increasing the quality of the backing.
Moving on, consider this 1995 article: http://sci-hub.cc/10.1007/BF00402646
"Costs and benefits of early defect detection: experiences from developing client server and host applications", Van Megen et al.
This article doesn't refer to the cost increase factors. It says only this:
"To analyse the costs of early and late defect removal one has to consider the meaning and effect of late detection. IBM developed a defect amplification model (IBM, 1981)."
The citation is as follows:
"IBM (1981) Implementing Software Inspections, course notes (IBM Systems Sciences Institute, IBM Corporation) (summarised in Pressman 1992.)"
This is the exact same citation as Pressman's, with the added "back link" to the intermediate source. The "chain of data custody" is intact. I give Van Megen et al. a complete pass as far as their use of Pressman is concerned.
Let's look at a blog post by my colleague Johanna Rothman: http://www.jrothman.com/articles/2000/10/what-does-it-cost-you-to-fix-a-defect-and-why-should-you-care/
Johanna refers, quite honestly, to "hypothetical examples". This means "I made up this data", and she's being up front about it. She says:
"According to Pressman, the expected cost to fix defects increases during the product’s lifecycle. [...] even though the cost ratios don’t match the generally accepted ratios according to Pressman, one trend is clear: The later in the project you fix the defects, the more it costs to fix the defects."
I'm almost totally OK with that. It bothers me a bit that one would say "one trend is clear" about data that was just made up; we could have made the trend go the other way, too. But the article is fairly clear that we are looking at a hypothetical example based on data that only has a "theoretical" basis.
The citation:
Pressman, Roger S., Software Engineering, A Practitioner’s Approach, 3rd Edition, McGraw Hill, New York, 1992. p.559.
This is fine. It's a complete citation with page number, still rather easy to check.
I am starting to feel queasy with this 2007 StickyMinds article by Joe Marasco: https://www.stickyminds.com/article/what-cost-requirement-error
"The cost to fix a software defect varies according to how far along you are in the cycle, according to authors Roger S. Pressman and Robert B. Grady. These costs are presented in a relative manner, as shown in figure 1."
What Grady? Who's that? Exactly what work is being cited here? There's no way to tell, because no citation is given. Also, the data is presented as fact, and a chart, "Figure 1", is provided that was not present in the original.
This is shady. Not quite outright dishonest, but I'd be hard pressed to describe it more generously than as "inaccurate and misleading".
A different kind of shady is this paper by April Ritscher at Microsoft. http://www.uploads.pnsqc.org/2010/papers/Ritscher_Incorporating_User_Scenarios_in_Test_Design.pdf
The problem here is a (relatively mild) case of plagiarism. The words "the cost to fix software defects varies according to how far along you are in the cycle" are lifted straight from the Marasco article, with the "according to" clause in a different order. But the article doesn't give Marasco credit for those words.
There's also the distinct possibility that Ritscher never actually read "Pressman and Grady". Do I have proof of that? No, but it is a theorem of sorts that you can figure out the lineage of texts by "commonality of error". If you copy an accurate citation without having read the original, nobody's the wiser. But why would you go to the trouble of reproducing the same mistake that some random person made if you had actually read the original source?
So we're entering the domain of intellectual laziness here. (Again, to stave off the Fundamental Attribution Error: I am not calling the person intellectually lazy; I am judging the behaviour. The most industrious among us get intellectually lazy on occasion; that's why the profession of tester exists.)
Next is this 2008 article by Mukesh Soni: https://www.isixsigma.com/industries/software-it/defect-prevention-reducing-costs-and-enhancing-quality/
"The Systems Sciences Institute at IBM has reported that the cost to fix an error found after product release was four to five times as much as one uncovered during design, and up to 100 times more than one identified in the maintenance phase (Figure 1)."
We find the same level of deceit in a 2008 thesis, "A Model and Implementation of a Security Plug-in for the Software Life Cycle" by Shanai Ardi. http://www.diva-portal.org/smash/get/diva2:17553/FULLTEXT01.pdf
"According to IBM Systems Science Institute, fixing software defects in the testing and maintenance phases of software development increases the cost by factors of 15 and 60, respectively, compared to the cost of fixing them during design phase [50]."
The citation is missing, but that's not really what's important here. We've crossed over into the land of bullshit. Both authors presumably found the claim in the same place everyone else found it: Pressman. (If you're tempted to argue "they might have found it somewhere else", you're forgetting my earlier point about "commonality of error". The only thing the "IBM Systems Science Institute" is known for is Pressman quoting them; it was a training outfit that stopped doing business under that name in the late 1970s.)
But attributing the claim to "IBM, as summarized by Pressman" would only draw attention to the weakness of the chain of data custody; it sounds a lot more authoritative to delete the middle link.
I could go on and on, so instead I'll stop at one which I think takes the cake: "ZDLC for the Early Stages of the Software Development Life Cycle", 2014: http://sci-hub.cc/10.1109/DCABES.2014.5#
"In 2001, Boehm and Basili claimed that the cost of fixing a software defect in a production environment can be as high as 100 times the cost of fixing the same defect in the requirements phase. In 2009, researchers at the IBM Systems Science Institute state that the ratio is more likely to be 200 to 1 [7], as shown in Figure 2".
The entire sentence starting "In 2009" is a layer cake of fabrication upon mendacity upon confabulation, but it gets worse with the citation.
Citation [7] is this: "Reducing rework through effective requirements management", a 2009 white paper from IBM Rational. Available here: http://www.edn.com/Pdf/ViewPdf?contentItemId=4210043
Yes, at the century scale IBM Rational is a contemporary of the defunct IBM Systems Science Institute, but that's a little like attributing a Victor Hugo quote to Napoleon.
While Figure 2 comes straight out of the IBM paper, the reference to "IBM Systems Science Institute" comes out of thin air. And in any case the data does not come from "researchers at IBM", since the IBM paper attributes the data to Boehm and Papaccio's classic paper "Understanding and Controlling Software Costs", which was published not in 2009 but in 1988. (Both of them worked at Defense consultancy TRW.)
We've left mere "bullshit" some miles behind here. This isn't a blog post; this is a paper from an official peer-reviewed conference, with proceedings published by the IEEE, and yet right on the first page we run into stuff that a competent reviewer would have red-flagged several times. (I'm glad I let my IEEE membership lapse a while ago.)
Garden-variety plagiarism and bullshit (of which there is no short supply) make me feel icky about being associated with "software engineering", but I want to distance myself from that last kind of stuff as strongly as I possibly can. I cannot be content to merely ignore academic software engineering, as most software developers do anyway; I believe I have an active duty to disavow it.
Comments

Very good investigation! (May 19, 2016)

- Why did you not mention Pressman in the book? I thought it started with Boehm's "Software Engineering" paper. (Feb 23, 2017)
- And what does the (average) cost of 'a defect' even say? Why did no one make a distinction between conceptual thinking errors (which I assume could be more expensive) and typos or other simpler-to-solve bugs? (Feb 23, 2017)
- And yes, I am also very disappointed in software engineering as a science; currently reading your book and most papers you referenced. Great that you do this! (Feb 23, 2017)

+Sander van Hulst - good questions. As I say in the preface (I think), the book is a work in progress; I keep learning more all the time. For a while I incorporated what I learned into the book, but these days I tend to feel that I've moved on. Perhaps I ought to do at least one more revision... (Feb 28, 2017)