Software Development Costs Pyramid

See pages 42 and 43 of this presentation (pdf).

Unfortunately the situation is as Jörg depicts, in fact somewhat worse: most of the references cited in this document strike me as bogus, in the sense that the paper cited either is not original research, or does not contain words supporting the claim being made, or - in the case of the 1998 paper about Hughes (p54) - contains measurements that in fact contradict what is implied by the curve in p42 of the presentation: different shape of the curve, and a modest x5 to x10 factor of cost-to-fix between the requirements phase and the functional test phase (and actually decreasing in system test and maintenance).


The Incredible Rate of Diminishing Returns of Fixing Software Bugs

Stefan Priebsh: OOP and Design Patterns: Codeworks DC in September 2009

(Stefan Priebsh: OOP and Design Patterns: Codeworks DC in September 2009)


This is a well-known result in empirical software engineering that has been replicated and verified over and over again in countless studies. Which is very rare in software engineering, unfortunately: most software engineering "results" are basically hearsay, anecdotes, guesses, opinions, wishful thinking or just plain lies. In fact, most software engineering probably doesn't deserve the "engineering" brand.

Unfortunately, despite being one of the most solid, most scientifically and statistically sound, most heavily researched, most widely verified, most often replicated results of software engineering, it is also wrong.

The problem is that all of those studies do not control their variables properly. If you want to measure the effect of a variable, you have to be very careful to change only that one variable and that the other variables don't change at all. Not "change a few variables", not "minimize changes to other variables". "Only one" and the others "not at all".

Or, in the brilliant Zed Shaw's words: "If you want to measure something, then don't measure other shit".

In this particular case, they did not just measure in which phase (requirements, analysis, architecture, design, implementation, testing, maintenance) the bug was found, they also measured how long it stayed in the system. And it turns out that the phase is pretty much irrelevant, all that matters is the time. It's important that bugs be found fast, not in which phase.

This has some interesting ramifications: if it is important to find bugs fast, then why wait so long with the phase that is most likely to find bugs: testing? Why not put the testing at the beginning?

The problem with the "traditional" interpretation is that it leads to inefficient decisions. Because you assume you need to find all bugs during the requirements phase, you drag out the requirements phase unnecessarily long: you can't run requirements (or architectures, or designs), so finding a bug in something that you cannot even execute is freaking hard! Basically, while fixing bugs in the requirements phase is cheap, finding them is expensive.

If, however, you realize that it's not about finding the bugs in the earliest possible phase, but rather about finding the bugs at the earliest possible time, then you can make adjustments to your process, so that you move the phase in which finding bugs is cheapest (testing) to the point in time where fixing them is cheapest (the very beginning).


Note: I am well aware of the irony of ending a rant about not properly applying statistics with a completely unsubstantiated claim. Unfortunately, I lost the link where I read this. Glenn Vanderburg also mentioned this in his "Real Software Engineering" talk at the Lone Star Ruby Conference 2010, but AFAICR, he didn't cite any sources, either.

If anybody knows any sources, please let me know or edit my answer, or even just steal my answer. (If you can find a source, you deserve all the rep!)