Sustainable Test-Driven Development

Test-driven development is a very powerful technique for analyzing, designing, and testing quality software. However, if done incorrectly, TDD can incur massive maintenance costs as the test suite grows large. This is such a common problem that it has led some to conclude that TDD is not sustainable over the long haul. This does not have to be true. It's all about what you think TDD is, and how you do it. This blog is all about the issues that arise when TDD is done poorly—and how to avoid them.

Lies, Damned Lies, and Code Coverage

As unit testing has gained a strong foothold in many development organizations, many teams are now laboring under a code coverage requirement.  Typically, 75% to 80% of the code must be covered by unit tests.  Most popular Integrated Development Environments (IDEs) include tools for measuring this percentage, often as part of their testing framework.

Let’s ask a couple of questions, however:

  1. "What does code coverage actually measure?"
  2. "What does mandating a code coverage percentage get you?"

These two will yield another:

  3. "Is code coverage actually useful for anything?"


What does code coverage actually measure?

Test-related code coverage measures the percentage of code[1] that is executed when the suite of unit tests runs.  By demanding a high percentage of coverage, management is attempting to ensure quality, the premise being that if the code is invoked during the suite's execution it is therefore guaranteed to be correct.

But, consider this:

// pseudocode
class Foo {
    public ret someAlgorithm(par parameter){
        // some complex algorithm that should be tested
    }
    public ret someOtherAlgorithm(par parameter) {
        // some other complex algorithm that should be tested
    }
}

class FooTest {
    public void testOfFooBehavior() {
        Foo testFoo = new Foo();
        testFoo.someAlgorithm(Any.par());
        testFoo.someOtherAlgorithm(Any.par());
        assertTrue(true);
    }
}


Anyone want to run the code coverage on this?  It is going to clock at 100%, assuming the algorithms comprise single code execution paths.  You might need to do a bit more if the paths branch (using different parameters in the calls), or more, depending on the type of coverage you're aiming for, and the test will always pass.  It’s a test of nothing (true always being, you know, true).
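To make the gap between "executed" and "verified" concrete, here is a minimal Java sketch.  Everything in it is an assumption made for illustration: the PriceCalculator class, its deliberately planted defect, and the two hand-rolled "tests" (written as plain methods rather than with any particular testing framework).  Both tests execute exactly the same lines, so a line-coverage tool would report the same percentage for each, but only one of them can ever fail.

```java
// Hypothetical example: PriceCalculator and its planted defect are invented for illustration.
class PriceCalculator {
    // Intended behavior: take 10% off the price.
    static double applyTenPercentDiscount(double price) {
        return price * 1.10; // defect: adds 10% instead of removing it
    }
}

class CoverageDemo {
    // Executes every line of applyTenPercentDiscount, so line coverage is 100%,
    // but it states nothing about the result and can never fail.
    static boolean coverageOnlyTestPasses() {
        PriceCalculator.applyTenPercentDiscount(100.00);
        return true; // the equivalent of assertTrue(true)
    }

    // Executes the same lines, but states what the result must be.
    static boolean specifyingTestPasses() {
        double actual = PriceCalculator.applyTenPercentDiscount(100.00);
        return Math.abs(actual - 90.00) < 0.005;
    }

    public static void main(String[] args) {
        System.out.println("coverage-only test passes: " + coverageOnlyTestPasses());
        System.out.println("specifying test passes:    " + specifyingTestPasses());
    }
}
```

The coverage number is identical for both tests; only the second one notices that the discount is being applied in the wrong direction.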

Code coverage does not measure how much code is tested; it measures how many lines of code are executed. Now, I can hear you saying “yeah, but that’s a completely contrived example! Why would anyone do that?”

Even if the developers would not dare to do something so brazen, they might be tempted to write the simplest tests they could, perhaps using a tool that automatically generates a test-per-method to save time. These tests would simply reflect the current code's behavior, not the correct behavior of the system: what the system does, not what it needs to do.

Why would they do that?

Why indeed.  That gets us to question number two.

What does mandating a code coverage percentage get you?

There is an old adage in project management: “You get what you measure”[2].  Woe betide the organization that decides to pay its developers based on the number of bugs they fix per quarter. There will be a lot of bugs to fix in that code!  Or, more realistically, many teams have been compensated for the number of lines of code they generate.  Not surprisingly, they have been writing lots and lots of unnecessary code. This is just human nature.

If developers are writing unit tests because “the boss says so” then they have no real professional or personal motivation driving the activity.  They’re doing it because they have to, not because they want to.  Thus, they will put in whatever effort they have to in order to increase their code coverage to the required level and not one bit more.  It becomes a “tedious thing I have to do before I can check in my code, period.”

At a recent conference a member of the audience[3] came up after a talk we gave on TDD and shared a piece of code he had found in a code base he had inherited.  It was a class with a single method that did something legitimate (but was, apparently, difficult to write a unit test for; maybe it had a void return).  But the developer had added a second method.  This second method created an integer i, incremented the integer 700 times (not in a loop, but literally 700 “i++;” increments), and returned the result.  His unit test then called this second method and asserted the return was 700.  Because this bogus method was so lengthy he got his 75% code coverage without calling the legitimate method at all.  How had he arrived at 700?  He probably started with a smaller number and kept copying-and-pasting the “i++;” until the coverage hit 75%.

Here again, this is a rather extreme case.  What’s not so extreme is leaving code in place that is never actually used (“dead code”) simply because it has tests: removing the code would mean removing its tests, which would lower the code coverage.  Should we keep dead code just so we can keep the tests?

You get what you measure.

The only way to get developers to write the tests we really want them to write (and the only way to reliably get anyone to do anything, frankly) is to point out to them why they should care, what benefit will accrue to them if they spend time, energy, and passion to create them.  Most of the other topics we will write about in this work will, in one way or another, provide this motivation[4]. 

But... then is code coverage actually useful for anything? 

Yes, and here we will see an example of something that occurs repeatedly in TDD: using a tool for something other than it was intended for.  Something better.

Often in TDD, especially in the initial stage of developing some particular behavior of the system, we find ourselves less than certain about how to proceed, what exactly a requirement means, or just what the system’s code should do.  When we’re “in the weeds” we might choose to investigate the issue by writing a lot of small tests to work out the edges, boundaries, and permutations of a behavior in order to improve our understanding of it.  These “triangulation tests” [5] can be very useful, but they are often largely redundant.  Once we get the understanding we need, can write the proper test, and create the proper behavior in the system to get it to pass, we then will want to remove some, most, or all of the triangulation tests.

But... is it some, most, or all?  Here is where the code coverage measurement will help. Before removing a test that you believe to be redundant, run the coverage percentage and note it.  It should, in TDD, be 100% or very close to it.  Now remove the theoretically redundant test.  Finally, re-run the coverage percentage.  If it has slipped, even a little, then one of two things must be true: either the test you removed was not entirely redundant, or you have dead code somewhere in the system.  Either way, now is the time to figure it out and fix it.

Here’s another use:  In TDD we usually find that the test suite, once we’re done developing the system, will serve other purposes.  One such purpose is this: if you come back to the system six months later  the suite of tests might be the best thing to read in order to get re-familiarized with the system.  If they all compile, and pass, then they are accurate to the system [6].  However, can we be certain that no one has added to the system without adding to the test suite?  Sure.  Run the code coverage.  If it’s not 100% then someone has enhanced the system without doing TDD, and you know it.

Developers who run code coverage for these purposes love their coverage tool.  And, as we’ll see, the kind of tests we’ll be learning about in this work will be the tests that developers love, care for, and always keep current to the system.  Because they help us to succeed.

----

[1] We know there are different types and levels of code coverage; this blog is relevant to all of them. See http://en.wikipedia.org/wiki/Code_coverage for more on the subject.
[2] This is often attributed to Lord Kelvin, but he actually said “If you cannot measure it, your knowledge is meager and unsatisfactory.”  Tom Peters’s paraphrase is more to our point: “What gets measured, gets done.”  Or we can go to Albert Einstein, who wrote on his wall: “Not everything that counts can be counted, and not everything that can be counted counts.”
[3] Paddy Healey is the gentleman.
[4] This should not be read as a slam on developers, btw.  We’re often given bureaucratic tasks to complete in life, and it’s understandable that we have little energy for them.  We simply need to make sure our tests are not in that category!
[5] Much more on this in another blog.
[6] Much much more on this in another blog!

Posted on: February 11, 2021 02:45 AM

TDD and Defects

We've said all along that TDD is not really about "testing" but rather about creating an executable form of specification that drives development forward.  This is true, and important, but it does not mean that TDD does not have a relationship to testing.  One interesting issue where there is significant synergy is in our relationship to defects.

Two important issues we'll focus on are: when/how a defect becomes known to us, and the actions we take at that point.
 

Time and Development


In the cyclic nature of agile development, we repeatedly encounter various points in time when we may discover that something is not right.  First, as we are writing the source code itself, most modern tools can let us know that something is not the way we intended it to be.  For example, when you end a method with a closed-curly-brace a good IDE will underline or otherwise highlight any temporary method variables that you created but never used.  Obviously if you created a variable you intended to use it, so you must have done something other than you meant to.  Or, if you type an object reference name and then hit the dot, many IDEs will bring up a list of methods available for you to call on that type.  If the list does not appear then something is not right.

When compiling the source into the executable we encounter a number of points in time where the technology can check our work: the preprocessor (macros, if-defs, #defines), the compiler, the linker (resolving dependencies), and so forth.

And there are run-time checks too: the class loader, generic type constraints, assertions of preconditions and postconditions, etc.  Various languages and technologies provide different levels of these services and they all can be "the moment" where we realize that we made an error that has resulted in a defect.
 

Detection vs. Prevention


Defects are inevitable and so we have to take action to either detect them or to prevent them.  Let's say for example that you have a method that takes as its parameters the position of a given baseball player on a team, and his jersey number, and then adds the player to a roster somewhere.  If you use an integer to represent the position (1 = Pitcher, 2 = Catcher, and so forth) then you will have to decide what to do if another part of the system incorrectly calls this method with something below 1 or above 9.  That would be a defect that the IDE/compiler/linker/loader would not find, because an int is type-safe for all values from minint to maxint [1].  So if the method was called with a 32, you'd have to put something in the code to deal with it: 32 mod 9 to determine what position that effectively is (Third Base if you're curious), correct the data (anything above 9 is reduced to 9, below 1 becomes 1), return a null, throw an IllegalPositionException to raise the alarm... something.  Whatever the customer wants.  Then you'd write a failing test first to drive it into the code.
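As a rough sketch of what those choices might look like in Java, here are three of the policies side by side.  The class and method names (PositionPolicies, rejectIfInvalid, clamp, wrap) are hypothetical, invented for this illustration; only IllegalPositionException is named in the text, and its shape here is also an assumption.  Which policy is correct is, as always, the customer's call.

```java
// Hypothetical names throughout; only IllegalPositionException appears in the text.
class IllegalPositionException extends RuntimeException {
    IllegalPositionException(int position) {
        super("Position must be between 1 and 9, but was " + position);
    }
}

class PositionPolicies {
    static final int MIN_POSITION = 1; // Pitcher
    static final int MAX_POSITION = 9; // Right Field

    // Policy 1: raise the alarm on anything outside 1..9.
    static int rejectIfInvalid(int position) {
        if (position < MIN_POSITION || position > MAX_POSITION) {
            throw new IllegalPositionException(position);
        }
        return position;
    }

    // Policy 2: correct the data (above 9 becomes 9, below 1 becomes 1).
    static int clamp(int position) {
        return Math.max(MIN_POSITION, Math.min(MAX_POSITION, position));
    }

    // Policy 3: plain remainder, as in "32 mod 9".
    // Note that multiples of 9 yield 0 and negative inputs yield zero or a
    // negative number, so this rule alone still lets invalid values through.
    static int wrap(int position) {
        return position % 9;
    }

    public static void main(String[] args) {
        System.out.println("wrap(32)  = " + wrap(32));   // 5, i.e. Third Base
        System.out.println("clamp(32) = " + clamp(32));  // 9
        System.out.println("clamp(-4) = " + clamp(-4));  // 1
        try {
            rejectIfInvalid(32);
        } catch (IllegalPositionException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Whichever policy is chosen, it is behavior the team now has to specify, implement, and keep tested, purely because the parameter type allows values that baseball does not.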

If, however, you chose not to use an int, but rather create your own type with its own constraints... for example, an enumeration called PLAYER with members PITCHER, CATCHER, SHORTSTOP, etc... then a defect elsewhere that attempted to pass in PLAYER.QUARTERBACK would not compile and therefore would never make it into production.  We can think of this as defect prevention even though it isn't, really; it's just very early detection.  But that vastly decreases the cost of repair.
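Here is a minimal Java sketch of the enumeration approach.  The enum is spelled Player to follow Java naming conventions; the full member list, the Roster class, and its add and size methods are assumptions made for illustration.  The two commented-out calls are the interesting part: un-commenting either one stops the build.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the text names only the enumeration and a few of its members.
enum Player {
    PITCHER, CATCHER, FIRST_BASE, SECOND_BASE, THIRD_BASE,
    SHORTSTOP, LEFT_FIELD, CENTER_FIELD, RIGHT_FIELD
}

class Roster {
    private final List<String> entries = new ArrayList<>();

    // The parameter type admits only the nine declared positions (null aside),
    // so no range check is needed here.
    void add(Player position, int jerseyNumber) {
        entries.add(position + " #" + jerseyNumber);
    }

    int size() {
        return entries.size();
    }
}

class EnumRosterDemo {
    public static void main(String[] args) {
        Roster roster = new Roster();
        roster.add(Player.PITCHER, 23);
        // roster.add(Player.QUARTERBACK, 12); // does not compile: no such member
        // roster.add(32, 12);                 // does not compile: an int is not a Player
        System.out.println("players on roster: " + roster.size());
    }
}
```

The out-of-range question from the int version simply disappears: there is nothing to decide, nothing to implement, and nothing to test for it.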

 

Cost of Delays


The earlier you find the bug, the cheaper it is to fix.  First of all, the issue is fresher in your mind and thus you don't have to recapitulate the thought process that got you there.   It's less likely that you'll have more than one bug to deal with at a time (late detection often means that other bugs have arisen during the delay, sometimes bugs which involve each other) which means you can focus.  Also, if you're in a very short cycle then the defect is something you just did, which makes it more obvious.

The worst time to find out a defect exists, therefore, is the latest time.  It is when the system is operating either in the QA department's testing process or especially when actually in use by a customer.  When QA finds the bug it's a delayed find.  When a customer finds the defect it's further delayed but it also means:

  1. The customer's business has suffered

  2. The product's reputation is tarnished

  3. Your organization's reputation is tarnished

  4. It is personally embarrassing to you

  5. And, as we said, the cost to fix will be much higher

In a perfect world this would never happen, of course, but the world is complex and we are prone to errors.
 

TDD and Time


In TDD we add another point in time when we can discover an error: test time.  Not QA's testing but developer test time: tests that we run ourselves, creating our own non-delayed moment of run time.  Tests execute the system so they have the same "experience" as QA or a customer, but since we run them very frequently they represent a faster and more granular defect indication.

You would prefer to prevent all defects from making it into runtime, of course.  But you cannot.  So a rule in TDD is this: any defect that cannot be prevented from getting into production must have a specification associated with it, and thus a test that will fail if the spec is not followed.

Since we write the tests as part of the code-writing process and if we adhere perfectly to the TDD rule that says "code is never put into the source without a failing test that requires it"... and if we see the test fails until the code is added which then makes it pass... then we should never have code that is not covered (and meaningfully so [2]) by tests.  But here we're going to make mistakes too.  Our good intentions will fall afoul of the forces they always do: fatigue, misunderstandings, things we forget, bad days and interruptions, the fat-fingered gods of chaos.

With TDD as your process certainly far fewer defects will make it into the product, but it will still happen from time to time.  But what that will mean will be different.
 

TDD and Runtime Defects


Traditionally a bug report from outside the team is placed into a tracking system and addressed in order of priority, severity, in the order they are entered, something along those lines.  But traditionally addressed means fixed.  This is not so in TDD.

In TDD a bug reported from production is not really a bug... yet.  Because if all of our tests are passing and if our tests are the specification of the system, this means the code is performing as specified.  There is no bug.  But it is not doing what the customer wants so it is the specification that must be wrong: we have a missing test.

Therefore fixing the problem is not job #1; adding the missing test is.  In fact, we want the defect in place so that when we 1) figure out what the missing test was and 2) add it to the suite we can 3) run it and see it fail.  Then and only then we fix the bug and watch the new test go green, completely proving the connection between the test and the code, and also proving that the defect in question can never make it into production again. 

That's significant.  The effort engaged in traditional bug fixing is transitory; you found it and fixed it for now, but if it gets back in there somehow you'll have to find it and fix it again.   In TDD the effort is focused more on adding the test, and thus it is persistent effort.  You keep it forever.
 

Special Cases


One question that may be occurring to you is "what about bad behavior that gets into the code that really is not part of the spec and should never be?"  For example, in the case of our baseball-player-accepting method above, what if a developer on the team adds some code that says "if the method gets called with POSITION.PITCHER and a jersey number of exactly 23, then add them to the roster twice."  Let's further stipulate that no customer asked for this; it's simply wrong.

Could I write a test to guard against that?  Sure; the given-when-then is pretty clear:
 

Given: a pitcher with jersey number 23
            an empty roster

When: the pitcher is passed into method X once

Then: a pitcher with jersey number 23 will appear once in the roster


But I shouldn't.  First of all, the customer did not say anything about this scenario, and we don't create our own specifications.  Second, where would that end?  How many scenarios like that could you potentially dream up?  Combinations and permutations abound. [3]

The real issue for a TDD team in the above example is how did that code get into the system anyway?  There was no failing test that drove it.  In TDD adding code to the system without a failing test is a malicious attack by the development team on their own code.  If that's what you're about then nothing can really stop you.

So the answer to this conundrum is... don't do that.  TDD does not work, as a process, if you don't follow its rules in a disciplined way.  But then again, what process would?

-S-

[1] You might, in fact, have chosen to do this because the rules of baseball told you to:
http://en.wikipedia.org/wiki/Baseball_positions

[2] What is "non-meaningful coverage"?  I refer you to:
https://www.projectmanagement.com/blog-post/68239/Lies--Damned-Lies--and-Code-Coverage

[3] I am not saying issues never arise with special cases, or that it's wrong to speculate; sometimes we discover possibilities the customer simply didn't think of.  But the right thing to do when this happens is go back to the customer and ask what the desired behavior of the system should be under circumstance X before doing anything at all.  And then write the failing test to specify it.

Posted on: February 11, 2021 02:39 AM

Redefining Test-Driven Development, Pt. 2

In part 1 we said “How you do something new is often influenced to a great extent by what you think you are doing.”  Let’s add that, similarly, changing the way you do something you are already doing can come from a new understanding of its nature.

Something development teams already do (or, in our opinion, really should be doing) is to write a specification of the system before they create it.  This specification comes from an analysis of requirements, and reflects the development team’s understanding of the business value of the system from the customer’s perspective and the technology used to create the solution.  “The spec” is then referred to throughout the development process as fundamental guidance for everything the team does.

Specifications have great value; this value, however, is not persistent.  

Let’s say you created a specification in a traditional way: you wrote a document, embedded some design diagrams charts and graphs, and so forth.  This would form an artifact that expressed your understanding of the system.

Let’s further say that you used this specification to work from, completed the development process, released the system, and moved on.  

Now, eighteen months later, the customer wants to make changes to the system.  You’ve been away from the system for quite a while, and you’re fuzzy on the details, so job one is to re-acquaint yourself with it.  Should you re-read that specification you created way back when?  You could, but how do you know it is still accurate?  Someone could easily have made changes to the system and not updated the spec accordingly.

We all know we should not do that, but as a practical matter it happens all the time.  People make changes with limited time and resources, and under pressure... and often they simply neglect the spec entirely, or they update it incompletely or incorrectly.

And even if you don’t have any reason to suspect this has happened, how can you know, really know for sure, that it has not?  The only way is to examine the system in detail and compare it to the spec.  If you have to do this, then what good did having the written spec really do you?

So, consider this, a typical unit test:

// pseudocode
public class AccountTest {
    public void testAccountAmortizesCorrectly() {
        double value = Any.value();
        int term = Any.term();
        int yearToWriteOff = Any.yearUpTo(term);

        Account testAccount = new Account(value, term);
        double expectedAmount = min(value/term, 100.00);

        double actualAmount = testAccount.amortize(yearToWriteOff);

        assertAreEqual(expectedAmount, actualAmount, 1);
    }
}


Look closely.  What does this tell you?
 

  1. There is an object called Account that can amortize itself
  2. Account takes a value and a term via its constructor
  3. Value is double, term is int, and neither is constrained (“Any”) [1]
  4. Amortize means “write off”
  5. All years amortize in the same way (“Any” again)
  6. You call an amortize() method and pass the year to write off (an int) to it
  7. The way you know how much to write off is value/term, but no more than 100.00
  8. We do not care about pennies (the tolerance for the assertion is 1)

Would you not say that this could serve, at least for the development team, as a specification?  It tells you how the system should work, how it is structured, the API specifics (both constructor and public methods), etc... everything that a traditional spec would record.

Compare now, in the scenario where you’re coming back eighteen months later, this kind of specification to the document you would normally create.  You can run this “unit test” immediately, watch it compile (the APIs have not changed if it does), watch it pass (the behavior of the system has not changed if it does), and thus confirm that it is still accurate with no effort at all.  If we then further stipulate that every behavior of the system has a test like this, and we can run them all with a single click of the mouse, then we know our test suite is accurate to the code.  Now run your code coverage measurement... is it 100%?  Now you know that there is no additional behavior that has been added by someone else without that person adding such a unit test.

So, in TDD we do not write tests.  We write specifications.  Executable specifications.

Note that the testing framework itself (with just about every tool you’ll encounter) uses the term “assert.”  Look that one up:
 

Assert(v) to state with assurance, confidence, or force; state strongly or positively; affirm; aver: He asserted his innocence of the crime. [2]


Note this is not “check” or “examine to determine if” or “confirm”.  When we assert something we do not say “this should be true”; we say “this is true”.  It’s a statement of truth, not an investigation.  It is not a test, but a fact about the system.

This simple shift in thinking from “I am writing a test” to “I am writing a specification” changes so many things about how you’ll write them, what you write and won’t write, what qualities you will look for and emphasize, how you’ll name things... and on and on... that we won’t even try to enumerate them here.  We’ll write an entire posting just about this (Testing as Specification).

So, why do we still call them tests?  Two reasons.

  1. First, “Test-Driven Development” is the term we are stuck with.  Language is a living thing, a shared thing, and we cannot dictate on our own what things are called.  We’d love to call it what it is: “Behaviour-Based Analysis and Design”, and we think of it that way, but at the end of the day...
  2. We’re not going to throw these executable specifications away when we’re done driving our development with them.  Why would we?  It took effort to make them, and we want to be able to refer to them later.  But you know what else they magically turn into at this point?  Tests!  We can use them to test against system regression when we need to refactor it.  These are regression tests we got for no extra effort, by the way.[3]

So, does TDD add new work to the development team?  No.  We were going to write a specification anyway, we’re just doing it in a different way now.  A better way, because it will be written in cold, hard code (rather than vaguely in human language), and it will be automatically verifiable against the real system at any point we desire, with no effort on our part.

And additionally, for free, it will produce a regression suite as well.  Most teams struggle mightily and do all sorts of shenanigans (see our upcoming blog Lies, Damn Lies, and Code Coverage) to achieve 75% to 80% code coverage.  We will have 100% [4] and we don’t have to do anything additional to get it.

All this leaves is the third objection from part 1...  what about the maintenance burden we take on when we have to keep the test suite up to date?  What about new requirements that cause dozens or even hundreds of tests to break, and have to be repaired?

Yes indeed, what about that?  Must have something to do with the word... Sustainable.

Stay tuned.

----
[1] We’ll talk about Any in a future blog
[2] http://dictionary.reference.com/browse/assert
[3] Not that we are saying our test suite will replace all traditional testing.  It will not.  But as a regression test suite it has a lot of value both for developers and testers alike
[4] ...or very close to it.  Nothing is ever perfect, after all



Posted on: February 10, 2021 11:04 AM

Redefining Test-Driven Development, Pt. 1

How you do something new is often influenced to a great extent by what you think you are doing -- its precise nature, the steps and work-flows, and how it relates to other things that you already do and understand.  The term “Test-Driven Development”, while well-established in our industry, is perhaps an unfortunate choice of words to describe what we are doing, and thus how we choose to do it.  Here in part 1 we’ll examine the problem, and then later in part 2 we’ll suggest a solution.

Let’s start with the word “test”.  This is a word we already have a definition for; typically we think of a test as an evaluation of something, or a judgement of something relative to a standard, or perhaps an action that determines the correctness or incorrectness about something.  Test is a verb: “I shall test this.”  It is also a noun: “Let’s conduct a test to find out if this works.” 

In any case, the presumption is that there is something that is either correct, or operates correctly, or does not.  Clearly this is a nonsensical idea if the thing to be tested does not actually exist yet. 

In a typical TDD process, we write the test before we create the code we’re testing [1].  At the “testing point”, there is nothing to test.  Will the test fail?  Of course it will [2].  Something that does not exist can neither be right nor can it do the right thing.  So it would seem that we’re not really doing anything meaningful [3]. 

Some of you are probably thinking: “The test won’t fail.  It won’t even compile!”  Very true, but this is only because our technology (typically) works the way it does.  In a dynamically typed technology (Python, for example) a reference to something that does not exist is not checked until the code actually runs, and some technologies will simply ignore you, or return 0, or null, or something else.  This is one reason why we like statically typed languages and strict compilers.  However, note what the compiler is actually saying: “This makes no sense!  You’re trying to refer to something that does not exist!”

All of this would seem to indicate that we have to do it the other way ‘round: that we’ve got to create the thing to be tested before we can create the test.  It’s just common sense.

Then there is the notion of “driven”.  The notion of “test” is in conflict with the notion of “driven”.  If one activity drives another, then one would normally expect the driving activity to precede the driven activity, temporally.  If thing X happens which then causes thing Y, and if this causality can be proven, then we can say X drove Y.  But if the test must be created after the tested thing, then how can the test drive the tested?
 
Finally we have “development”.  Development is the creation of something, usually from a plan or goal or set of principles.  If tests are to drive development, then they must cause it.  Thus they must constitute the plan or goal or set of principles.  But tests in the traditional software sense are not plans; they are an examination of the system to determine if it meets its success criteria.

This confusion can cause lots of problems:

  1. People won’t get the point, and will reject the idea intellectually: “that makes no sense” 
  2. People will see this as “new work” for the team to do, work that will slow the team down: “that will be wasteful”
  3. People will see the product (a collection of tests) as a new maintenance burden for the team: “that cannot be sustained over time”

In other words, TDD tests would seem to constitute at best a tremendous added cost, and at worst a totally meaningless one.  This is categorically untrue, and we begin by re-defining what we’re doing.

In TDD, as it turns out, we don’t write tests first.  In fact... in TDD we don’t write tests at all. 

Stay tuned for part 2... :) 


--- 

[1] As we will see in future blogs, the test-first technique does not actually equate to TDD, but it is a very common approach, and very compatible with TDD. 

[2] ...and what if it doesn’t?  What would that mean?  That’s the subject of another blog... 

[3] I can tell you a-priori that any test written before the thing it tests exists will fail, without even knowing what the test is about.  Therefore actually writing the test and watching it fail is not going to tell me something I didn’t already know.  So why do it?

Posted on: February 10, 2021 11:02 AM

TDD and Its (at least) 5 Benefits

Many developers have concerns about adopting test-driven development, specifically regarding:

  • It's more work.  I'm already over-burdened and now you're giving me a new job to do.
  • I'm not a tester.  We have testers for testing, and they have more expertise than I do.  It will take me a long time to learn how to write tests as well as they do.
  • If I write the code, and then test it, the test-pass will only tell me what I already know: the code works.
  • If I write the test before the code the failing of the test will only tell me what I already know: I have not written the code yet.

Here we are going to deal with primarily the first one:  It's going to add work.

This is an understandable concern, at least initially, and it is not only the developers who express it.  Project managers, who are accountable for the team's productivity, will fear that it will decrease.  Project sponsors fear that the cost of the project will go up if the developers end up spending a fair amount of their time writing tests, since the primary cost of creating software is developer time.

The fact is, TDD is not about adding new burdens to the developers, but rather it is just the opposite: TDD is about gaining multiple benefits from a single activity.

In the test-first activity developers are not really writing tests.  What they write looks like tests, but it is not (yet).  It is an executable specification (this is a critical part of our "redefinition of TDD" entry).  As such, it does what specifications do: it guides the creation of the code.  Traditional specifications, however, are usually expressed in some colloquial form, perhaps a document and/or some diagrams.  Communication in this form can be very lossy and easy to misinterpret.  Missing information can go unnoticed.

For example, one team decided to create a poker game as part of their training on TDD.  Often an enjoyable project is good when learning as we tend to retain information better when we're having a good time.  Also, these developers happened to live and work in Las Vegas. :) Anyway, it was a contrived project and so the team came up with the requirements themselves; basically the rules of poker and the mechanics of the game.  One requirement they came up with was "the system should be able to shuffle the deck of cards into a reordered state."  That seemed like a reasonable thing to require until they tried to write a test for it.  How does one define "reordered"?  One developer said "oh, let's say at least 90% of the cards need to be in a new position after the shuffle completes."  Another developer smiled and said "OK, just take the top card and put it on the bottom.  100% will be in a new position.  Will that be acceptable?"  They all agreed it would not.  This seemingly simple issue ended up being more complicated than anyone had anticipated.
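The second developer's objection can be shown in a few lines. This is only a sketch in Python, not the team's actual code: the names `fraction_moved` and `rotate_top_to_bottom` are invented for illustration, and the deck is modeled as a plain list of 52 numbers.

```python
def fraction_moved(before, after):
    """Return the fraction of positions that hold a different card afterwards."""
    moved = sum(1 for b, a in zip(before, after) if b != a)
    return moved / len(before)


def rotate_top_to_bottom(deck):
    """A 'shuffle' that only moves the top card to the bottom (hypothetical)."""
    return deck[1:] + deck[:1]


deck = list(range(52))            # 52 distinct cards, identified by number
rotated = rotate_top_to_bottom(deck)

# The proposed rule ("at least 90% of the cards in a new position") passes...
assert fraction_moved(deck, rotated) >= 0.90

# ...yet almost every card is still followed by the same card as before.
still_adjacent = sum(
    1
    for i in range(len(deck) - 1)
    if rotated.index(deck[i + 1]) == rotated.index(deck[i]) + 1
)
print(f"{still_adjacent} of {len(deck) - 1} neighboring pairs are unchanged")
```

The assertion passes, which is exactly the problem: the requirement as worded accepts a deck nobody would call shuffled. Writing the check down is what exposed the gap.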

In TDD we express the specification in actual test code, which is very unforgiving.  One of the early examples of this for us was the creation of a Fahrenheit-to-Celsius temperature conversion routine.  The idea seemed simple: take a measurement in Fahrenheit (say 212 degrees, the boiling point of water at sea level), and convert it to Celsius (100 degrees).  That statement seems very clear until you attempt to write a unit test for it, and realize you do not know how accurate the measurements should be.  Do we include fractional degrees?  To how many decimal places?  And of course the real question is what is this thing going to be used for?  This form of specification will not let you get away with not knowing because code is exacting like this.

Put another way, a test would ask "how accurate is this conversion routine?"  A specification asks "how accurate does this conversion routine need to be?", which is of course a good question to ask before you attempt to create it.
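Here is roughly what that looks like when written down. This is a sketch in Python under assumptions of our own: the function name `fahrenheit_to_celsius` and the tolerance of 0.01 degrees are made up for illustration. The point is that the test cannot be written at all until somebody picks a value for that tolerance.

```python
import math

# Assumed for this sketch: the answer must be good to a hundredth of a degree.
# The real number has to come from whoever knows what the routine is for.
TOLERANCE_C = 0.01


def fahrenheit_to_celsius(degrees_f):
    """Convert a Fahrenheit reading to Celsius."""
    return (degrees_f - 32.0) * 5.0 / 9.0


def test_boiling_point_of_water():
    assert math.isclose(fahrenheit_to_celsius(212.0), 100.0, abs_tol=TOLERANCE_C)


def test_normal_body_temperature():
    # 98.6 F is 37 C; floating-point arithmetic makes an exact comparison unsafe,
    # which is one more reason the required accuracy has to be stated.
    assert math.isclose(fahrenheit_to_celsius(98.6), 37.0, abs_tol=TOLERANCE_C)


test_boiling_point_of_water()
test_normal_body_temperature()
```

Change `TOLERANCE_C` and you have changed the specification, not just the test.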

The first benefit of TDD is just this: it provides a very detailed, reliable form of something we need to create anyway, a functional specification.

Once the code-writing begins, this test-as-specification serves another purpose.  Once we know what needs to be written, we can begin to write it with a clear indication of when we will have gotten it done.  The test stands as a rubric against which we measure our work.  Once it passes, the behavior is correct.  Developers quickly develop a strong sense of confidence in their work once they experience this phenomenon, and of course confidence reduces hesitancy and tends to speed us up.

The second benefit of TDD is that it provides clear, rapid feedback to the developers as they are creating the product code.

At some point, we finish our work.  Once this happens the suite of tests that we say are not really tests (but specifications) essentially "graduate" into their new life: as tests, in the traditional sense.  This happens with no additional effort from the developers.  Tests in the traditional sense are very good to have around and provide three more benefits in this new mode...

First, they guard against code regression when refactoring.  Sometimes code needs to be cleaned up either because it has quality issues (what we call "olfactoring"), or because we are preparing for a new addition to the system and we want to re-structure the existing code to allow for a smooth introduction of the enhancement.  In either case, if we have a set of tests we can run repeatedly during the refactoring process, then we can be assured that we have not accidentally introduced a defect.  Here again, the confidence this yields will tend to increase productivity.

The third benefit is being able to refactor existing code in a confident and reassured fashion.
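As a small illustration, staying with the poker theme, here is a sketch in Python. Everything in it is hypothetical: cards are modeled as (rank, suit) pairs, and the two `is_flush_*` functions stand in for the code before and after a cleanup. The same checks are run against both versions.

```python
def is_flush_original(hand):
    """Before refactoring: compare each card's suit to the first card's suit."""
    first_suit = hand[0][1]
    for card in hand:
        if card[1] != first_suit:
            return False
    return True


def is_flush_refactored(hand):
    """After refactoring: a flush contains exactly one distinct suit."""
    return len({suit for _rank, suit in hand}) == 1


def check_flush_spec(is_flush):
    """The unchanged specification, run against whichever version is passed in."""
    all_hearts = [("2", "H"), ("7", "H"), ("9", "H"), ("J", "H"), ("K", "H")]
    one_spade = [("2", "H"), ("7", "S"), ("9", "H"), ("J", "H"), ("K", "H")]
    assert is_flush(all_hearts)
    assert not is_flush(one_spade)


check_flush_spec(is_flush_original)     # green before the change
check_flush_spec(is_flush_refactored)   # still green after it
```

In real work there is only one function, edited in place, and the suite is simply run again after each small step. The two versions sit side by side here only so the sketch runs on its own.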

But also, they provide this same confirmation when we actually start writing new features to add to an existing system.  We return to test-as-specification when writing the new features, with the benefits we've already discussed, but also the older tests (as they continue to pass) tell us that the new work we are doing is not disturbing the existing system.  Here again, this allows us to be more aggressive in how we integrate the newly-wanted behavior.

The fourth benefit is being able to add new behavior to an existing system in this same confident fashion.

But wait, there's more!  Another critical issue facing a development team is preventing the loss of knowledge.  Legacy code often has this problem:  the people who designed and wrote the systems are long gone, and nobody really understands the code very well.  A test suite, if written with this intention in mind, can capture knowledge because we can consider it at any time to be "the spec" and read it as such.

There are actually three kinds of knowledge we need to retain.

  1. What is the valuable business behavior that is implemented by the system?
  2. What is the design of the system?  Where are things implemented?
  3. How is the system to be used?  What examples can we look at? 

All of this knowledge is captured by the test suite, or perhaps more accurately, the specification suite.  It has the advantage over traditional documentation of being able to be run against the system to ensure it is still correct.

So the fifth benefit is being able to retain knowledge in a trustworthy form.

Up to this point we've connected TDD to several critical aspects of software development:

  1. Knowing what to build (test-first, with the test failing)
  2. Knowing that we built it (turning the test green)
  3. Knowing that we did not break it when refactoring it (keeping the test green)
  4. Knowing that we did not break it when enhancing/tuning/extending/scaling it (keeping the test green)
  5. Knowing, even much later, what we built (reading the tests after the fact)


All of this comes from one effort, one action.

And here's a final, sort of fun one:  Have you ever been reviewing code that was unfamiliar to you... perhaps written by someone else or even by you a long time ago, and come across a line of code that you cannot figure out?  "Why is this here?   What is it for?  What does it do?  Is it needed?"  One can spend hours poring over the system, or trying to hunt down the original author who may herself not remember.  It can be very annoying and time-consuming.

If the system was created using TDD, this problem is instantly solved.  Don't know what a line of code does?  Break it, and run your tests.  A test should fail.  Go read that test.  Now you know.

Just don't forget to Ctrl-Z. :)
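To make the trick concrete, here is a sketch in Python. All of the names are invented for illustration: `normalize` contains the puzzling line, `normalize_with_line_broken` is the same function with that line removed, and `failing_specs` is a tiny stand-in for a test runner that reports which specifications fail.

```python
def normalize(name):
    name = name.strip()      # the puzzling line: why is this here?
    return name.lower()


def normalize_with_line_broken(name):
    # The same function with the puzzling line deliberately removed.
    return name.lower()


def spec_case_is_ignored(fn):
    assert fn("Alice") == "alice"


def spec_surrounding_spaces_are_ignored(fn):
    assert fn("  alice ") == "alice"


SPECS = [spec_case_is_ignored, spec_surrounding_spaces_are_ignored]


def failing_specs(fn):
    """Run every spec against fn and return the names of the ones that fail."""
    failures = []
    for spec in SPECS:
        try:
            spec(fn)
        except AssertionError:
            failures.append(spec.__name__)
    return failures


print(failing_specs(normalize))                   # nothing fails
print(failing_specs(normalize_with_line_broken))  # one spec names the reason
```

Exactly one specification fails against the broken version, and its name says what the line was for. That is the outcome you get when each behavior is specified once and named well.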

But what if no test fails?  Or more than one test fails?  Well, that's why you're reading this blog.  For TDD to provide all these benefits, you need to do it properly...

Posted on: February 10, 2021 11:00 AM | Permalink | Comments (0)