Sustainable Test-Driven Development

Test-driven development is a very powerful technique for analyzing, designing, and testing quality software. However, if done incorrectly, TDD can incur massive maintenance costs as the test suite grows large. This is such a common problem that it has led some to conclude that TDD is not sustainable over the long haul. This does not have to be true. It's all about what you think TDD is, and how you do it. This blog is all about the issues that arise when TDD is done poorly—and how to avoid them.

Testing Best Practices: Test Categories, Part 1

Successfully adopting and practicing TDD in a sustainable way involves many distinctions, best practices, caveats, and so forth.  One way to make such information accessible is to put it into a categorized context.  The Design Patterns, for instance, are often categorized into behavioral, structural, and creational.[1]  Here we will do a similar thing with the executable specifications (“tests”) we write when doing TDD.

We have identified four categories of unit tests, namely: functional, constant specification, creational, and work-flow.  We’ll take them one at a time.


Functional

The first unit test a developer ever writes is often an assertion against the return of a method call.  This is because systems often operate by taking in parameters and producing some kind of useful result.  For example, we might have a class called InterestCalculator with a method called CalcInterest() that takes some parameters (a value, a rate, a term, and perhaps the month to calculate for) and then returns the proper interest to charge or pay, depending on the application context.

The primary way of creating useful behavior in software is, in fact, in writing such methods.  However, how we test them will depend on the nature of the behavior.  We can, therefore, further sub-divide the ‘Functional’ category into the following types:

1. Static behavior

This is the simplest.  If a method's behavior is simple and invariant, then we only need to pick some arbitrary parameters, call the method, and assert that the result is correct.  For example:

// pseudocode
class Calculator {
    public int add(int x, int y) {
        return x + y;
    }
}

// pseudotest
class CalculatorTest {
    public void testAddBehavior() {
        int anyX = 6;
        int anyY = 5;
        int expectedReturn = 11;

        Calculator testCalculator = new Calculator();
        int actualReturn = testCalculator.add(anyX, anyY);
        assertEqual(expectedReturn, actualReturn);
    }
}

Adding two numbers always works the same way, so all we need is a single assertion to demonstrate the behavior in order to specify it.  Note that we have named our temporary variables in the test anyX and anyY to make it clear that these particular values (6 and 5, respectively) are not in any way important, that the test is not about these values in particular.  The test is about the addition behavior, as implemented by the add() method.  We simply needed some input parameters in order to get the method to work, and so we picked arbitrary (any) values for our test. [2]

This is important, because we want it to be very easy for someone reading the test to be able to focus on the important, relevant part of the test and not on the “just had to do this” parts.  Here again, thinking of this as a specification leads us to this conclusion.

Static behavior is the same for all values of all parameters passed.  For example, f() here takes a single parameter, while g() takes two. But for all values of these parameters, the behavior is the same and so we pick "any" values to demonstrate this.

2. Singularity

If a behavior is always the same (static) except for one particular condition where it changes, we call this condition a singularity.

The classic example is divide-by-zero.  In division, the behavior is always the same unless the divisor is zero, in which case we need to report an error condition.  Here we’d need two assertions.  The first, like the one for static behavior, picks ‘any’ two numbers, with the second being non-zero, and shows the division.  The second shows the error report when the divisor is zero.
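A minimal Java sketch of these two assertions might look like the following. The Divider class, its divide() method, and the choice of IllegalArgumentException as the "error report" are all assumptions made for illustration; none of them come from the example above.

```java
// Hypothetical class, used only to illustrate the singularity.
class Divider {
    // Integer division that reports the zero divisor as an error.
    static int divide(int dividend, int divisor) {
        if (divisor == 0) {
            throw new IllegalArgumentException("divisor must not be zero");
        }
        return dividend / divisor;
    }
}

class DividerSpec {
    // Static part: "any" non-zero divisor behaves the same way.
    static void specifyDivisionForAnyNonZeroDivisor() {
        int anyDividend = 12;
        int anyNonZeroDivisor = 4;
        int expectedQuotient = 3;
        if (Divider.divide(anyDividend, anyNonZeroDivisor) != expectedQuotient) {
            throw new AssertionError("ordinary division is not as specified");
        }
    }

    // Singularity: the one divisor for which the behavior changes.
    static void specifyErrorWhenDivisorIsZero() {
        int anyDividend = 12;
        final int ZERO_DIVISOR = 0;
        try {
            Divider.divide(anyDividend, ZERO_DIVISOR);
        } catch (IllegalArgumentException expected) {
            return; // the error report is the specified behavior
        }
        throw new AssertionError("expected an error for a zero divisor");
    }

    public static void main(String[] args) {
        specifyDivisionForAnyNonZeroDivisor();
        specifyErrorWhenDivisorIsZero();
        System.out.println("division specifications hold");
    }
}
```

Note how the zero divisor gets a named constant while the other values keep their "any" prefix, so a reader can see at a glance which value is special.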

It does not, of course, have to be a mathematical issue: it could be a business rule.   Let’s say, for example, that we charge a fee of $10 for shipping unless it is the first day of the month when we ship for free.  We’re trying to encourage sales at the beginning of the month.  Thus, the first day would be the singularity, and we’d write this test [3]:

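public void testShippingIsFreeOnTheFirstDayOfTheMonth() {
    ShippingCalc shippingCalc = new ShippingCalc();
    int anyDateOtherThanTheFirst = 5;
    const int FIRST_DAY_IN_MONTH = 1;
    amount expectedStandardFee = ShippingCalc.STANDARD_FEE;
    const amount FREE = 0.00;

    // getFee() is an assumed name for the method under specification
    assertEqual(expectedStandardFee, shippingCalc.getFee(anyDateOtherThanTheFirst));
    assertEqual(FREE, shippingCalc.getFee(FIRST_DAY_IN_MONTH));
}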

Note the use of the term “any” for the date we “don’t care about, they’re all the same”, which we call anyDateOtherThanTheFirst, and the fact that FIRST_DAY_IN_MONTH is clearly special.

Another example would be choosing a specific behavior for one element in a set. For example, some function might be legal for only one type of user, and all other types should get an exception:


public void testOnlyAdminCanGetCoolStuff() {
    StuffGetter getter = new StuffGetter();
    Stuff stuff;

    int anyNonAdmin = Users.REGULAR;
    try {
        stuff = getter.getCoolStuff(anyNonAdmin);
        Assert.Fail("Cool stuff should go to ADMIN only");
    } catch (PresumptionException) {}

    // The singularity: ADMIN is the one user type for which the call succeeds
    stuff = getter.getCoolStuff(Users.ADMIN);
}

Two examples: f(), with its single parameter, provides the same behavior for all values but one, the point indicated.  With the two parameters g() takes, the singularity may involve them both, creating a point, or it may pertain to only one, creating a line.  For instance, if x is "altitude" and y is "temperature", then a point might indicate "the same behavior for all values except 3000 feet and 121 degrees".  A line might indicate "the same behavior for all values except 2000 feet at any temperature".

3. Behavior with a boundary

Sometimes the behavior of a method is not always uniform, but changes based on the specific parameters it is passed.  For example, let’s say that we have a method that applies a bonus for a salesperson, but the bonus is only granted if the sale is above a certain minimum value, otherwise it is zero.  Further, the customer tells us that pennies don’t count: the sale must be an entire dollar over the minimum sales value.

In this case there exists a special sales amount, which affects the behavior of the getBonus() function.  We need to specify this boundary -- the place where the behavior changes -- and since every boundary has two sides, we need to explicitly specify these values and relate them:

class SalesApplicationTest {
  public void testBonusOnlyAppliesAboveMinumumSalesForBonus() {
    double maxNotEligibleAmount =
      SalesApplication.BONUS_THRESHOLD + .99;
    double minEligibleAmount =
      SalesApplication.BONUS_THRESHOLD + 1.00;
    double expectedBonus = minEligibleAmount *
    SalesApplication testSalesApp = new SalesApplication();


This specifies, to the reader, that the point of change between no bonus and the bonus being applied is at the BONUS_THRESHOLD value, and also (per the customer) that the sale must be a full dollar above the minimum before the bonus will be granted.  This is called the epsilon, the atom of the change, and you’ll note that we are clearly demonstrating it as one penny, the penny that takes us from 99 cents over the minimum to 1 full dollar over it.
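For readers who want something they can run, here is one way the boundary and its epsilon might be sketched in Java. Everything concrete in it is an assumption: the threshold of $1,000, the 10% bonus rate, the BONUS_PERCENT constant, and the decision to hold money as whole cents in a long (chosen so that the one-penny step is exact) are illustrative choices, not details from the example above.

```java
// Hypothetical implementation; the threshold and rate are assumed values.
class SalesApplication {
    static final long BONUS_THRESHOLD_CENTS = 100_000; // assumed: $1,000.00
    static final long BONUS_PERCENT = 10;              // assumed bonus rate
    static final long ONE_DOLLAR_CENTS = 100;

    // The bonus applies only when the sale is a full dollar over the threshold.
    static long getBonusCents(long saleCents) {
        if (saleCents < BONUS_THRESHOLD_CENTS + ONE_DOLLAR_CENTS) {
            return 0;
        }
        return saleCents * BONUS_PERCENT / 100;
    }
}

class SalesApplicationSpec {
    static void specifyBonusOnlyAppliesAboveMinimumSales() {
        // The two sides of the boundary, exactly one penny (the epsilon) apart.
        long maxNotEligibleCents = SalesApplication.BONUS_THRESHOLD_CENTS + 99;
        long minEligibleCents = SalesApplication.BONUS_THRESHOLD_CENTS + 100;
        long expectedBonusCents =
            minEligibleCents * SalesApplication.BONUS_PERCENT / 100;

        check(SalesApplication.getBonusCents(maxNotEligibleCents) == 0,
              "no bonus at 99 cents over the threshold");
        check(SalesApplication.getBonusCents(minEligibleCents) == expectedBonusCents,
              "bonus at one full dollar over the threshold");
    }

    static void check(boolean condition, String what) {
        if (!condition) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        specifyBonusOnlyAppliesAboveMinimumSales();
        System.out.println("bonus boundary is as specified");
    }
}
```

The two named amounts differ by exactly one cent, which is what makes the epsilon visible to someone reading the specification.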

One might be tempted to assert against other values, like 200 dollars over the minimum, or 32 cents above it, or to loop through all possible values above and below the transition point, or to pick “any” value above and “any” value below.  The point is that 99 cents and 1 dollar over the minimum are significant amounts; they matter to the customer, and so we need to specify them as unique.

We also want our tests to run fast, and so looping through all possible values is not only unnecessary, it is counter-productive.

Two points define the boundary where behavior changes, and we also demonstrate the epsilon (or atom) of change.

4. Behavior within a range

There can be, of course, multiple boundaries that change behavior.  If these boundaries are independent of each other, then we call this a range.

For example, let us say that the acceptable temperature of an engine manifold must be between 32.0 and 212.0 degrees Fahrenheit (too cold and the engine freezes, too hot and it overheats).  These limits are not related to each other (we could install anti-freeze to make lower temperatures acceptable while the upper limit might not change, or vice versa using coolant), and so each would be specified with two asserts, one at the boundary and one just beyond it.

But let’s not forget the epsilon!  How much is “over” or “under”?  One degree?  One tenth of a degree?  Ten degrees?  How sensitive should this system be?  Here again, this is a problem domain specification, and thus we have to know what the customer wants before we can create the test.
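As a sketch only, the manifold range could be specified in Java along these lines. The ManifoldMonitor class and its isAcceptable() method are hypothetical, the limits are assumed to be inclusive, and the epsilon of one tenth of a degree is an arbitrary stand-in for whatever the customer actually requires.

```java
// Hypothetical class; the limits are assumed to be inclusive.
class ManifoldMonitor {
    static final double MIN_ACCEPTABLE_F = 32.0;
    static final double MAX_ACCEPTABLE_F = 212.0;

    static boolean isAcceptable(double temperatureF) {
        return temperatureF >= MIN_ACCEPTABLE_F && temperatureF <= MAX_ACCEPTABLE_F;
    }
}

class ManifoldMonitorSpec {
    // Assumed epsilon: in a real project this comes from the customer.
    static final double EPSILON_F = 0.1;

    // Lower boundary: one assert at the limit, one an epsilon below it.
    static void specifyLowerBoundary() {
        check(ManifoldMonitor.isAcceptable(ManifoldMonitor.MIN_ACCEPTABLE_F),
              "lower limit itself is acceptable");
        check(!ManifoldMonitor.isAcceptable(ManifoldMonitor.MIN_ACCEPTABLE_F - EPSILON_F),
              "an epsilon below the lower limit is too cold");
    }

    // Upper boundary: specified separately because it is independent.
    static void specifyUpperBoundary() {
        check(ManifoldMonitor.isAcceptable(ManifoldMonitor.MAX_ACCEPTABLE_F),
              "upper limit itself is acceptable");
        check(!ManifoldMonitor.isAcceptable(ManifoldMonitor.MAX_ACCEPTABLE_F + EPSILON_F),
              "an epsilon above the upper limit is too hot");
    }

    static void check(boolean condition, String what) {
        if (!condition) throw new AssertionError(what);
    }

    public static void main(String[] args) {
        specifyLowerBoundary();
        specifyUpperBoundary();
        System.out.println("temperature range is as specified");
    }
}
```

Keeping the two boundaries in separate specification methods reflects their independence: changing the lower limit should require touching only one of them.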

Also, note that whereas for integers the natural epsilon is 1, for floating point numbers that epsilon value depends on the base number. The larger it is, the larger the epsilon needs to be. Constants such as Double.Epsilon only indicate the smallest possible number, not the smallest discernible difference between values.
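The point about floating-point epsilons can be demonstrated with a short Java sketch. Double.Epsilon is the .NET name; the closest Java constant is Double.MIN_VALUE, and Math.ulp() reports the gap between a value and the next representable double. The helper names spacingAt() and isDiscernible() are made up for this illustration.

```java
class FloatingPointEpsilonDemo {
    // Gap between x and the next representable double of larger magnitude.
    static double spacingAt(double x) {
        return Math.ulp(x);
    }

    // True when adding delta to base yields a different double value.
    static boolean isDiscernible(double base, double delta) {
        return base + delta != base;
    }

    public static void main(String[] args) {
        // The gap grows with the magnitude of the base number.
        System.out.println("spacing near 1.0:   " + spacingAt(1.0));
        System.out.println("spacing near 1.0e9: " + spacingAt(1.0e9));

        // The smallest positive double is far too small to move a large value...
        System.out.println("MIN_VALUE discernible at 1.0e9? "
            + isDiscernible(1.0e9, Double.MIN_VALUE));

        // ...whereas one unit of spacing at that magnitude is.
        System.out.println("ulp discernible at 1.0e9? "
            + isDiscernible(1.0e9, spacingAt(1.0e9)));
    }
}
```

In practice this means an epsilon chosen for a floating-point boundary test has to be at least as large as the spacing at the boundary value, and usually as large as the problem domain says is meaningful.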

Two boundaries, with epsilons for each.  Note the boundaries of a simple range are not related to each other.

[1] In point of fact, we don’t actually completely agree with this method of categorizing the Design Patterns, but it does serve as a reasonable example of categorization in general.

[2] There are other ways to do this.  In another blog we will discuss the use of an “Any” class to make these “I don’t care” values even more obvious.

Continued in Test Categories, Part 2

Posted on: February 11, 2021 06:31 AM

Testing the Chain of Responsibility, Part 2

Chain Composition Behaviors

We always design services for multiple clients.  Even if a service (like the Processor service in our example) has only a single client today, we want to allow for multiple clients in the future.  In fact, we want to promote this; any effort expended to create a service will return increasing value when multiple clients end up using it.

So, one thing we definitely want to do is to limit/reduce the coupling from the clients’ point of view. The run-time view of the CoR from the client’s point of view should be extremely limited:

Note that the reality, on the right, is hidden from the client, on the left.  This means we can add more processors, remove existing ones, change the order of them, change the rules of the termination of the chain, change how any/all of the rules are implemented... and when we do, this requires no maintenance on the clients.  This is especially important if there are (or will be, or may be) clients that we don’t even control.  Maybe they live in code belonging to someone else.

The one place where reality cannot be concealed is wherever the chain objects are instantiated.  The concrete types, the fact that this is a linked list, and the current order of the list will be revealed to the entity that creates the service.   If this is done in the client objects, then they all will have this information (it will be redundant).  Also, there is no guarantee that any given client will build the service correctly; there is no enforcement of the rules of its construction.  

This obviously leads us to prefer another option.  We may, for example, decide to move all creation issues into a separate factory object.

It may initially seem that by doing so we’re just moving the problem elsewhere, essentially sweeping it under the rug. The advantage comes from the fact that factory objects, unlike clients,  do not tend to increase in number.  So, at least we’ve limited our maintenance to one place.  Also, if factories are only factories then we are not intermixing client behavior and construction behavior.  This results in simpler code in the factories, which tends to be easier to maintain.  Finally, if all clients use the factory to create the service, then we know (if the factory works properly) that the service is always built correctly.

We call this the separation of use from creation, and it turns out to be a pretty important thing to focus on.  Here, this would lead us to create a ProcessorFactory that all clients can use to obtain the service, and then use it blindly.  Initially, this might seem like a very simple thing to do:

public class ProcessorFactory {
    public Processor GetProcessor() {
        return new LargeValueProcessor(
            new SmallValueProcessor(
                new TerminalProcessor()));
    }
}


Pretty darned simple.  From the clients’ perspective, the issue to specify in a test is also very straightforward: I get the right type from the factory:

public class ProcessorFactoryTest {
    public void TestFactoryReturnsProperType() {
        Processor processor =
            new ProcessorFactory().GetProcessor();
        Assert.IsTrue(processor is Processor);
    }
}

This test represents the requirement from the point of view of any client object.  Conceptually it tells the tale, though in a strongly typed language we might not want to actually write it.  The compiler already enforces the return type, so the test could never fail if it compiles.  Your mileage may vary.

However, there is another perspective, with different requirements that must also be specified.  In TDD, we need to specify in tests:

  1. Which processors are included in the chain (how many and their types)
  2. The order that they are placed into the chain (sometimes)  [4]

Now that the rules of construction are in one place (which is good) this also means that we must specify that it works as it should, given that all clients will now depend on this correctness.

However, when we try to specify the chain composition in this way we run into a challenge:  since we have strongly encapsulated all the details, we have also hidden them from the test.  We often encounter this in TDD; encapsulation, which is good, gets in the way of specification through tests.

Here is another use for mocks.  However, in this case we are going to use them not simply to break dependencies but rather to “spy” on the internal aspects of an otherwise well-encapsulated design. Knowing how to do this yields a huge advantage: it allows us to enjoy the benefits of strong encapsulation without giving up the equally important benefits of a completely automated specification and test suite.

This can seem a little tricky at first so we’ll go slow here, step by step.  Once you get the idea, however, it’s actually quite straightforward and a great thing to know how to do.

Step 1: Create internal separation in the factory

Let’s refactor the factory just a little bit.  We’re going to pull each object creation statement (new x()) into its own helper method.  This is very simple, and in fact most modern IDEs will do it for you: highlight the code, then right-click > refactor > extract method.

public class ProcessorFactory {
    public Processor GetProcessor() {
        return MakeFirstProcessor(
            MakeSecondProcessor(
                MakeLastProcessor()));
    }

    protected virtual Processor MakeFirstProcessor(
            Processor aProcessor) {
        return new LargeValueProcessor(aProcessor);
    }

    protected virtual Processor MakeSecondProcessor(
            Processor aProcessor) {
        return new SmallValueProcessor(aProcessor);
    }

    protected virtual Processor MakeLastProcessor() {
        return new TerminalProcessor();
    }
}

Note that these helper methods would almost certainly be made private by an automated refactoring tool.  We’ll have to change them to protected virtual (or just protected in a language like Java where methods are virtual by default) for our purposes.  You’ll see why.

Step 2: Subclass the factory to return mocks from the helper methods

This is another example of the endo-testing technique we examined in our section on dependency injection:

private class TestableProcessorFactory : ProcessorFactory {
    protected override Processor MakeFirstProcessor(
            Processor aProcessor) {
        return new LoggingMockProcessor(
            typeof(LargeValueProcessor), aProcessor);
    }

    protected override Processor MakeSecondProcessor(
            Processor aProcessor) {
        return new LoggingMockProcessor(
            typeof(SmallValueProcessor), aProcessor);
    }

    protected override Processor MakeLastProcessor() {
        LoggingMockProcessor mock = new LoggingMockProcessor(
            typeof(TerminalProcessor), null);
        mock.iElect = true;
        return mock;
    }
}

This would almost certainly be a private inner class of the test.  If you look closely you’ll see three important details.  

  • Each helper method is returning an instance of the same type (which we’ll implement next),  LoggingMockProcessor, but in each case the mock is given a different type to specify in its constructor [5]
  • The presence of the aProcessor parameter  in each method specifies the chaining behavior of the factory (which is what we will observe behaviorally through the mocks)  
  • The MakeLastProcessor() conditions the mock to elect.  As you’ll see, these mocks do not elect by default (causing the entire chain to be traversed) but the last one must, to specify the end of delegation

Step 3: Create a logging mock object and a log object to track the chain from within

Here is the code for the mock:

private class LoggingMockProcessor : Processor {
    private readonly Type mytype;
    public static readonly Log log = new Log();
    public bool iElect = false;

    public LoggingMockProcessor(Type processorType,
            Processor nextProcessor) : base(nextProcessor) {
        mytype = processorType;
    }

    protected override bool ShouldProcess(int value) {
        log.Add(mytype);   // record that this link in the chain was reached
        return iElect;
    }

    protected override int ProcessThis(int value) {
        return 0;
    }
}

The key behavior here is the implementation of ShouldProcess() to add a reference of the actual type this mock represents to a logging object.  This is the critical part -- when the chain of mocks is asked to process, each mock will record that it was reached, the type it represents, and we can also capture the order in which they are reached if we care about that.

The implementation of  ProcessThis() is trivial because we are only interested in the chain’s composition, not its behavior.  We’ve already fully specified the behaviors in previous tests, and each test should be as unique as possible.  

Also note that this mock, as it is only needed here, should be a private inner class of the test.  Because the two issues, inclusion and sequence, are part of the same behavior (creation), everything will be specified in a single test.

The Log, also a private inner class of the test, looks something like this:

private class Log {
    private List<Type> myList;

    public void Reset() {
        myList = new List<Type>();
    }

    public void Add(Type t) {
        myList.Add(t);
    }

    public void AssertSize(int expectedSize) {
        Assert.AreEqual(expectedSize, myList.Count);
    }

    public void AssertAtPosition(Type expected, int position) {
        Assert.AreEqual(expected, myList[position]);
    }
}

It’s just a simple encapsulated list, but note that it contains two custom assertions.  This is preferred because it allows us to keep our test focused on the issues it is specifying, and not on the details of “how we know”.  It makes the specification more readable, and easier to change.  

(A detail: The log is “resettable” because it is held statically by the mock.  This is done to make it easy for all the mock instances to write to the same log that the test will subsequently read.  There are other ways to do this, of course, but this way involves the least infrastructure.  Since the log and the mock are private inner classes of the test, this static member represents very little danger of unintended coupling.)

Step 4: Use the “spying” capability of the mock in a specification of the chain composition

Let’s look at the test itself:

public void TestFactoryReturnsProperChainOfProcessors() {
    // Setup
    ProcessorFactory factory = new TestableProcessorFactory();
    const int correctChainLength = 3;
    List<Type> correctCollection =
        new List<Type> {
            typeof (LargeValueProcessor),
            typeof (SmallValueProcessor),
            typeof (TerminalProcessor)
        };
    Processor processorChain = factory.GetProcessor();
    Log myLog = LoggingMockProcessor.log;
    myLog.Reset();

    // Trigger
    // Process() is assumed here to be the chain's public entry point
    int anyValue = 1;
    processorChain.Process(anyValue);

    // Verification
    myLog.AssertSize(correctChainLength);
    for (int i = 0; i < correctCollection.Count; i++) {
        myLog.AssertAtPosition(correctCollection[i], i);
    }
}

If the order of the processors was not important, we would simply change the way the log reports their inclusion:

// In Log
public void AssertContains(Type expected) {
    Assert.IsTrue(myList.Contains(expected));
}

...and call this from the test instead.

// In TestFactoryReturnsProperChainOfProcessors()
for (int i = 0; i < correctCollection.Count; i++) {
    myLog.AssertContains(correctCollection[i]);
}

Some testing frameworks actually provide special Asserts for collections like this.
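For readers working in Java, where methods are virtual by default, the four steps can be condensed into one self-contained sketch. Treat it as an illustration under assumptions rather than a port of the code above: the shape of the Processor base class, the public process() entry point, the placeholder rules inside the real processors, and the use of a plain static list in place of the Log class are all guesses or simplifications.

```java
import java.util.ArrayList;
import java.util.List;

// Assumed base class: only its subclasses and protected hooks appear in the post.
abstract class Processor {
    private final Processor next;
    Processor(Processor next) { this.next = next; }

    // Assumed entry point: the first processor that elects handles the value.
    public int process(int value) {
        return shouldProcess(value) ? processThis(value) : next.process(value);
    }
    protected abstract boolean shouldProcess(int value);
    protected abstract int processThis(int value);
}

// Placeholder rules and results; the real behaviors are specified elsewhere.
class LargeValueProcessor extends Processor {
    LargeValueProcessor(Processor next) { super(next); }
    protected boolean shouldProcess(int value) { return value > 1000; }
    protected int processThis(int value) { return 2; }
}

class SmallValueProcessor extends Processor {
    SmallValueProcessor(Processor next) { super(next); }
    protected boolean shouldProcess(int value) { return value > 0; }
    protected int processThis(int value) { return 1; }
}

class TerminalProcessor extends Processor {
    TerminalProcessor() { super(null); }
    protected boolean shouldProcess(int value) { return true; }
    protected int processThis(int value) { return 0; }
}

// Step 1: each creation statement lives in an overridable helper method.
class ProcessorFactory {
    public Processor getProcessor() {
        return makeFirstProcessor(makeSecondProcessor(makeLastProcessor()));
    }
    protected Processor makeFirstProcessor(Processor next) { return new LargeValueProcessor(next); }
    protected Processor makeSecondProcessor(Processor next) { return new SmallValueProcessor(next); }
    protected Processor makeLastProcessor() { return new TerminalProcessor(); }
}

// Step 3: a mock that logs the type it stands in for whenever it is reached.
class LoggingMockProcessor extends Processor {
    static final List<Class<?>> log = new ArrayList<>();
    private final Class<?> representedType;
    boolean elects = false;

    LoggingMockProcessor(Class<?> representedType, Processor next) {
        super(next);
        this.representedType = representedType;
    }
    protected boolean shouldProcess(int value) {
        log.add(representedType);
        return elects;
    }
    protected int processThis(int value) { return 0; }
}

// Step 2: the factory subclass swaps each real processor for a logging mock.
class TestableProcessorFactory extends ProcessorFactory {
    @Override protected Processor makeFirstProcessor(Processor next) {
        return new LoggingMockProcessor(LargeValueProcessor.class, next);
    }
    @Override protected Processor makeSecondProcessor(Processor next) {
        return new LoggingMockProcessor(SmallValueProcessor.class, next);
    }
    @Override protected Processor makeLastProcessor() {
        LoggingMockProcessor mock = new LoggingMockProcessor(TerminalProcessor.class, null);
        mock.elects = true; // the last mock ends the delegation
        return mock;
    }
}

// Step 4: traverse the chain of mocks and compare the log with the expected order.
class ChainCompositionSpec {
    static List<Class<?>> traverseChainBuiltByFactory() {
        LoggingMockProcessor.log.clear();
        int anyValue = 7;
        new TestableProcessorFactory().getProcessor().process(anyValue);
        return new ArrayList<>(LoggingMockProcessor.log);
    }

    public static void main(String[] args) {
        List<Class<?>> expected = List.of(
            LargeValueProcessor.class, SmallValueProcessor.class, TerminalProcessor.class);
        if (!expected.equals(traverseChainBuiltByFactory())) {
            throw new AssertionError("chain composition differs from the specification");
        }
        System.out.println("chain composition is as specified");
    }
}
```

Comparing whole lists specifies inclusion, count, and order in one step. If order did not matter, comparing the two collections as sets would express that instead.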


OK, we know what some of you are thinking.  “Guys, this is the code you’re testing:”

public Processor GetProcessor() {
    return MakeFirstProcessor(
        MakeSecondProcessor(
            MakeLastProcessor()));
}

“...and look at all the *stuff* you’ve created to do so!  Your test is several times the size of the thing you’re testing!   Arrrrrrrrrgh!”

This is a completely understandable objection, and one we’ve felt in the past.  But to begin with remember that in our view this is not a test, it is a specification.  It’s not that unusual for specifications to be longer than the code they specify.  Sometimes it’s the other way around.  It just depends on the nature of the specification and the implementation involved.

The specification of the way the space shuttle opened the cargo bay doors was probably a book. The computer code that opened them was likely much shorter.

Also, this is a reflection of the relative value of each thing.  Recently, a friend who runs a large development team got a call in the middle of the night, warning him of a major failure in their server farm involving both development and test servers.  He knew all was well since they have offsite backups, but as he was driving into work in the wee hours he had time to ask himself “if I lost something here... would I rather lose our product code, or our tests?”

He realized he would rather lose the product code.  Re-creating the source from the tests seemed like a lot less work than the opposite (that would certainly be true here).  But what that really means is that the test/specifications actually have more irreplaceable value than the product code does.

In TDD, the tests are part of the project.  We create and maintain them just like we do the product code.  Everything we do must produce value... and that’s the point, not whether one part of the system is larger than another.  And while TDD style tests do certainly take time and effort to write, remember that they have persistent value because they can be automatically verified later.

Finally, ask yourself what you would do here if the system needed to be changed, say, to support small, medium, and large values?  We would test-drive the new MediumValueProcessor, and then change TestFactoryReturnsProperChainOfProcessors() and watch it fail.  We’d then update the factory, and watch the failing test go green. We’d also have automatic confirmation that all other tests remained green throughout.

That’s an awfully nice way to change a system.  We know exactly what to do, and we have concrete confirmation that we did exactly and only that.  Such confidence is hard to get in our business!

[4] Some CoRs require their chain elements to be in a specific order.  Some do not.  For example, we would not want the TerminalProcessor to be anywhere but at the end of the chain.  So, while we may not always care about/need to specify this issue, it’s important to know how to do it.  So we’ll assume here that, for whatever domain reason, LargeValueProcessor must be first, SmallValueProcessor must be second, and TerminalProcessor must be third.

[5] We’re using the class objects of the actual types.  You could use anything unique: strings with the classnames, an enumeration, even just constant values.  We like the class objects because we already have them.  Less work!

Posted on: February 11, 2021 04:51 AM

Testing Through API's

We recently got a question from Tomas Vykruta, a colleague of ours, and we felt it turned out to be such a good, fruitful question that we wanted to pass it, and our answers, along in this blog.

Here is Tomas' question:

Do you prefer to have unit tests written against the public API, or to test individual functions inside the API? I've seen both approaches at my company, and in many cases, a single class is unit tested with a mix of the two. I haven't seen this topic addressed in any style or testing guides, so it seems to be left as a choice to the author.

While there is likely no right or wrong answer here and each class will require some combination, I thought it would be interesting to enumerate your real world experiences (good and bad) resulting from these 2 strategies. Off the top of my head, here are some pros (+) and cons (-).

Public API unit testing:
+ If internal implementation details of the API change, the unit tests don't have to. Less maintenance.
+ Serves as documentation for public usage of API.
+ Does not require fabricating the internal API in such a way as to make every function easily testable.
+- Possibly less code to write.
- Does not serve as documentation for individual internal functions.
- Unit tests are less likely to test every single internal function thoroughly.
- Test failures can take some time to track down and identify and require understanding the internal API.

Internal API unit testing (individual functions):
+ Unit tests are very simple, short, quick to write and read.
+ Functions are very thoroughly tested, easy to verify against full range of inputs.
+ Serves as documentation for every internal function.
+ Test failures are easily identifiable even for engineers not familiar with the code base, since each test is focused on a very limited bit of code.
- When any implementation details change, the tests must change with it.
- Not useful to pure external API users who don't care about internal implementation details.

Scott's Response:

My view is this: if you consider the test suite as you would a specification of the system, then the question as to whether to test at one level or another becomes: “is it specified?” 

Systems produce behavioral effects, and these effects are what determine the value of the system.  Value, however, is always from the point of view of a “client” or “customer” and every system has several customers.  All these customers have a behavioral view of the system which can be specified.

For example, the end users have a specification: “this can accurately calculate my income tax”.  But so does the legal department: “this has a EULA that indemnifies us against tax penalties”.  And the marketing department: “the system has a tax-evaluation feature that our competitor does not”.  And the developers themselves: “this has an extensible system for taxation algorithms.”  Etc…

Anything in anyone’s spec needs a test.  Some of these will be at the API level, some will be further in.

Not all implementation details are part of the specification.  If you are able to refactor a particular implementation and still satisfy all customer specifications, then the implementation does not require a separate test.

Amir's Response:

Scott has already expanded on the difference between testing and specification. I would like to add a little to this ‘specification’ perspective.

Let me start by saying that all TDD tests must only use public interfaces. This can be interpreted to mean – you must only test through APIs, as they are the public interface of the system. This is true when you consider the external consumers of the system. They see only the public API and hence ‘feel’ the system’s behavior through it. The TDD test will specify what this behavior is (for better or worse).

And just to clarify: when we say ‘public interface,’ we do not refer only to the exposed functional interface. A public interface can also be the GUI, database schema, specific file formats, file names, URL format, a log (or trace facility), Van Eck phreaking, or a Ouija board. As long as the usage of the public interface allows an external entity to affect your system or vice versa, it is considered public.

Some of the interfaces mentioned above may be used by entities within the company, such as support or QA. For all intents and purposes they are still customers of the system and as such their needs (e.g., the types of error report generated under specific circumstances, or the ability to throttle the level of tracing done, or the ability to remotely control a client system) must be specified in the TDD tests. After all, you still want the ‘intra-organizational’ behavior to be known and invariant to other changes.

When we do TDD however, we are not concerned only about the system’s external behavior (as defined above), but also about its internal behavior. This internal behavior has two manifestations (and this is our arbitrary nomenclature, but I hope it makes sense). First is the architecture of the product, second is its design. These two may seem to be the same but there is a subtle difference between them.

The system’s architecture is the set of design decisions that were made to accomplish functional and performance goals. Once set, these become a requirement. An individual developer or team cannot decide to do things differently, but has to operate within these architectural guidelines. This is specified through a set of tests that specify how every architectural element contributes to the implementation of the desired overall behavior.

The system’s design is the set of design decisions that are made by the team and individual developers, and are considered to be ‘implementation choices’. The team can assign whichever responsibilities it deems reasonable to the different design entities in order to achieve the desired behavior. This is all well, except that there is one ‘tacit’ requirement that is solely the responsibility of the team (and probably the technical organization management). This requirement is maintainability, and it is what guides the team in their design choices. The TDD tests help us specify both what the system design is and also what the specific responsibilities assigned to the system entities are.

The point about both design and architecture is that they are internal to the system. As such, how can you test-drive them through the system’s APIs? By testing through the APIs I can see that the behavior is specified correctly. I cannot see that the architecture is adhered to or that the design promotes maintainability.

The answer to this paradox lies in the definition of the word ‘public’. Public is a relative term. If you live in a high rise condo, then the ‘public’ interface may be the building’s front door. But consider the individual apartments. The neighbors can’t come into your condo at will, can they? The condo has a public interface – its door, which is hidden from those outside the building (private) but visible and usable by the internal neighbors. Inside your condo this division continues. You have rooms, with doors (their public interfaces), and storage cabinets, with their doors, and boxes, with their lids, and bottles with their caps. What we get is a complex set of enclosures which are public to their immediate surroundings and private to anything further out.

Computer systems are the same. The APIs are the public doorways to the surrounding clients – these clients do not see the way the system is composed. But the elements of the system themselves do see this design – they can see the other elements (which they interact with) although they cannot see inside these elements. Are the interfaces that these inner elements expose private or public? Well, that depends on who you’re asking. From the perspective of the outside clients, they are private. From the perspective of the peer elements, they are public. Since they are public, they should be specified through TDD, and this is exactly how we specify the system’s architecture and design.

So, in a nutshell, the answer to the question “do we test external or internal APIs?” is yes.

We would love to hear from all of you on this question!

Posted on: February 11, 2021 04:02 AM

TDD and Asynchronous Behavior: Part 1

In TDD we write tests as specifications where each of them is focused on a single behavior of the system.  A “good” test in TDD makes a single, unique distinction about the system.  

This means that when the TDD process is complete, our spec also serves as a suite of tests against regression.  This is a hugely valuable side-effect of TDD.  We don’t write “tests” but we get tests too, and with no additional effort.  As tests we also want each of them to be unique in another sense: each has only one reason to fail.  Thus when a test fails we will know exactly why.

We say it this way:  “A given test tests everything which is in scope, but which the test does not control.”  Clearly we never want to test “everything in scope” but rather one unique thing at a time.  The implication is that everything other than that one thing must be controlled by the test.

This can include many things… framework objects, libraries, the user interface, the database, time, randomness, devices and sensors, the network, etc…  For many of these entities we can solve the problem with the endo-testing technique [1], but one aspect of development can pose a special problem: multi-threaded execution.


When a given behavior safely supports multiple threads it does so either because it is stateless/re-entrant, or because it is using some mechanism to ensure mutual exclusion (“mutex”).  If the object is stateless or re-entrant then there are no thread-related issues to deal with in a test.  But if it is ensuring mutex, then the mechanism it employs to do so is “in scope.”  When we are specifying the behavior of the object we don’t want to also specify the mutex-ensuring behavior in the same test.  Thus, we have to bring it under the control of the test.  This actually isn’t as hard as it might seem.  It’s a matter of technique.

Our first step is to separate the mutex behavior from the primary responsibility of the object.

Often objects will use thread locks in order to protect some functionality from being accessed by multiple threads simultaneously.  The problem is that these objects would be doing two things: providing the core functionality and managing the locks.  A given object should not be responsible for two things.  That’s a basic tenet of good design we call cohesion.  So the first step is to separate the two responsibilities, and one way to do this is by using a Synchronization Proxy.
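To make the problem concrete, here is a minimal sketch of a class that mixes the two responsibilities. The class name LockingTarget and its private gate field are hypothetical, invented only for illustration; the point is simply that the locking and the state change live in the same method.

```csharp
// Hypothetical "before" version: one class both stores state and manages the lock.
public class LockingTarget
{
    // Hypothetical private lock object guarding access to x.
    private readonly object gate = new object();

    public int x;

    public void SetX(int xValue)
    {
        lock (gate)        // responsibility 2: mutual exclusion
        {
            x = xValue;    // responsibility 1: the core behavior
        }
    }
}
```

Any test of SetX() here has the lock in scope whether it wants it or not, which is the cohesion problem the separation is meant to remove.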


If we use a Synchronization Proxy [2] we can separate these two responsibilities into two different objects.  This means that the primary object will now only provide its core functionality (meaning we can specify its behavior in a straightforward, single-threaded way) and the Synchronization Proxy will ensure the mutex.  We’ll explain the proxy first, and then of course we have to discuss how to specify its behavior in its own test.

Because we want to focus on techniques for specifying/testing the proxy part of this, the main object (which we are calling Target) will have an extremely simple behavior.  It has a bit of state that can be changed by calling a “Set” method.

public class Target {

    public int x;

    public void SetX(int xValue) { x = xValue; }

}

Obviously this is a trivial class, and easy to specify in a test.  Remember, we’re not concerning ourselves with this class, but rather the proxy’s behavior that is going to ensure that two threads cannot call SetX() simultaneously or in an overlapping way.  First, we create an abstraction for this Target:

public abstract class Target {

    public abstract void SetX(int xValue);

}

public class RealTarget : Target {

    public int x;

    public override void SetX(int xValue) { x = xValue; }

}

Now RealTarget is an implementation of the Target abstraction.  Any client object will see only Target.  In a single-threaded system it could use RealTarget directly, but if multiple threads are to be supported we need to do more.  In that case we create a second implementation, which is the Synchronization Proxy.

public class SimpleLockSynchronizationProxy : Target {

    private RealTarget myTarget;

    public SimpleLockSynchronizationProxy(RealTarget aTarget) { myTarget = aTarget; }

    public override void SetX(int xValue) {

        lock (this) {

            myTarget.SetX(xValue);  // delegate to the real object while holding the lock

        }

    }

}

Note that this proxy takes an instance of RealTarget in its constructor, and delegates to it for the core behavior (SetX()), which would be specified in its own (trivial) test. But the proxy takes the lock along the way, and thus guards against multiple, simultaneous, overlapping accesses. The proxy will not allow a thread to enter SetX() if another thread is currently executing it.

At run-time, we “string them together” and expose the Proxy to client objects, up-cast to Target. Note this makes no difference to any client code. The question, therefore, is how we drive/specify/test that the proxy is, in fact, preventing the multiple access problem from occurring.
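One way the “stringing together” might look is sketched below. The three classes repeat the ones from this post so the sketch stands on its own; TargetFactory and its Create() method are hypothetical names, assumed here only to show a single place that decides whether the proxy is needed.

```csharp
public abstract class Target
{
    public abstract void SetX(int xValue);
}

public class RealTarget : Target
{
    public int x;
    public override void SetX(int xValue) { x = xValue; }
}

public class SimpleLockSynchronizationProxy : Target
{
    private readonly RealTarget myTarget;

    public SimpleLockSynchronizationProxy(RealTarget aTarget) { myTarget = aTarget; }

    public override void SetX(int xValue)
    {
        lock (this) { myTarget.SetX(xValue); }
    }
}

// Hypothetical factory: the only code that knows whether threads are in play.
public static class TargetFactory
{
    public static Target Create(RealTarget real, bool multiThreaded)
    {
        // Clients receive the abstraction either way and cannot tell the difference.
        if (multiThreaded) return new SimpleLockSynchronizationProxy(real);
        return real;
    }
}
```

A client written against Target calls SetX() the same way in both configurations; only the factory changes when thread safety is required.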

In part 2 we will deal with this issue. To be continued...

[1] Endo-Testing will be covered in a future blog

[2] If you are not familiar with the Proxy Pattern, pay a visit here:


Posted on: February 11, 2021 03:52 AM

TDD and Asynchronous Behavior: Part 2

In part 1, we discussed the benefits of separating out the code that ensures mutex (in this case, using thread locks) from the code that provides core behavior using a Synchronization Proxy.  The core behavior can be tested in a straightforward, single-threaded way.  What remains in terms of TDD and asynchronous behavior is how to effectively specify/test the Synchronization Proxy.

Testing the Synchronization Proxy

You might be saying “the proxy class is so simple, I’m not sure I’d need to drive its behavior from a specification/test.  All it does is take the lock and delegate.” The level of rigor in your specifications is always a judgment call, so we’ll set aside whether a given proxy behavior needs a test. We’re going to focus on how to write such a test in the case where you wish to. In other words: if you decide not to include it in the specification we want it to be because you decided not to, not because you didn’t know how.[3]

The given-when-then layout of the specification would be something along these lines:


Given:
    Threads A and B are running
    Thread A is running code T

When:
    Thread B attempts to run code T

Then:
    Thread B will wait until Thread A is done: the accesses will be serial, not parallel.

The key here is the word “until”. What the test needs to drive/specify/ensure is that the timing is right, that Thread B writes *after* Thread A even if Thread A takes a long time. Let’s look at an implementation sequence diagram.

Client A and Client B are inner classes of the test, created just to exercise the proxy, each in its own thread.  If the proxy did not add the synchronization behavior, the writes to Target would be 2, and then 1, because we tell the Target to wait 10 seconds before writing the state for Client A, but only 1 second for Client B.  If the proxy prevents this (proper behavior) then the writes will be 1, and then 2, because Client B couldn’t get the access until Client A was finished.

This is a partial solution, but it raises a few questions.

  1. How does the test get RealTarget to wait these different amounts of time?
  2. How does the test assert the sequence of these writes is 1, 2?
  3. If the RealTarget “waits 10 seconds” won’t the test execution be horribly slow?

The first two questions are answered by replacing RealTarget with a Mock Object[4]. Remember, we are not specifying RealTarget here, we are specifying the proxy’s behavior, therefore RealTarget must be controlled by the test. A mock allows this.

What about the time issue? Well, time is in scope and we certainly are not testing that time works. So we have to control it in the test as well.

Here’s the implementation sequence diagram with the mock object in place of RealTarget, and another object that replaces time.

Time is a simulator, which can be told by the test to “be” at any time we want. MockTarget basically calls on Time and says “let me know when we’ve reached or passed second x”. We use the Observer Pattern [5] to implement this. The first time the mock is called it will ask Time to notify it when 10 seconds have passed. The second time, it will ask for a 1 second notification. We do this with a simple conditional.
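A minimal sketch of such a time simulator follows. It is only one possible shape, not necessarily the one used in the accompanying code: the names SimulatedTime, ITimeObserver, NotifyAt(), AdvanceTo(), and RecordingObserver are all assumptions made for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical observer interface: called once simulated time reaches the requested second.
public interface ITimeObserver
{
    void TimeReached(int second);
}

// Hypothetical simulated clock that the test moves forward explicitly.
public class SimulatedTime
{
    private readonly object gate = new object();
    private readonly List<KeyValuePair<int, ITimeObserver>> pending =
        new List<KeyValuePair<int, ITimeObserver>>();
    private int now;

    // "Let me know when we've reached or passed second x."
    public void NotifyAt(int second, ITimeObserver observer)
    {
        lock (gate)
        {
            if (second > now)
            {
                pending.Add(new KeyValuePair<int, ITimeObserver>(second, observer));
                return;
            }
        }
        observer.TimeReached(second);   // already reached: notify immediately
    }

    // The test tells the clock what time it "is"; due observers are notified in registration order.
    public void AdvanceTo(int second)
    {
        List<KeyValuePair<int, ITimeObserver>> due;
        lock (gate)
        {
            if (second > now) now = second;
            due = pending.FindAll(entry => entry.Key <= now);
            pending.RemoveAll(entry => entry.Key <= now);
        }
        foreach (var entry in due) entry.Value.TimeReached(entry.Key);
    }
}

// Hypothetical observer used only to show the notifications arriving.
public class RecordingObserver : ITimeObserver
{
    public readonly List<int> Seen = new List<int>();
    public void TimeReached(int second) { Seen.Add(second); }
}
```

Because the clock only moves when AdvanceTo() is called, no real waiting happens: a “10 second” wait completes as soon as the test says ten seconds have passed.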

Furthermore, MockTarget maintains a log of all calls made to it, in order, which the test can ask for to determine the sequence of SetX() calls and assert that it is 1, 2 rather than 2, 1.
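The logging side of the mock might be sketched as follows. To keep the sketch self-contained, the time simulator is represented by a plain delegate (waitUntil), which is an assumption of this example rather than the design described above; the member names GetLog(), LongWait, and ShortWait are likewise hypothetical.

```csharp
using System;
using System.Collections.Generic;

public abstract class Target
{
    public abstract void SetX(int xValue);
}

public class MockTarget : Target
{
    public const int LongWait = 10;   // only the ordering of the two waits matters
    public const int ShortWait = 1;

    private readonly Action<int> waitUntil;   // stand-in for asking Time to notify us
    private readonly object gate = new object();
    private readonly List<int> log = new List<int>();
    private int calls;

    public MockTarget(Action<int> waitUntil) { this.waitUntil = waitUntil; }

    public override void SetX(int xValue)
    {
        int wait;
        // The "simple conditional": the first caller waits long, later callers wait short.
        lock (gate) { wait = (calls++ == 0) ? LongWait : ShortWait; }

        waitUntil(wait);                   // returns once simulated time has arrived

        lock (gate) { log.Add(xValue); }   // record the write in the order it happened
    }

    // The test asks for a copy of the log and asserts on the sequence.
    public List<int> GetLog()
    {
        lock (gate) { return new List<int>(log); }
    }
}
```

The log is written after the wait, not on entry, because what the specification cares about is the order in which the writes land.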

10 and 1 are not significant numbers, so as you’ll see in the code we made constants “longWait” and “shortWait” to be used in the test. It’s only important that the first thread waits longer than the second, and since time itself is being simulated anyway the “actual” lengths of time are unimportant. We can pretend they are one year and one hundred years if you want. It’s nice to control time. :)

MockTarget, Time, and the ClientA and ClientB objects are all part of the test, and so a good practice is to make them private inner classes of the test. Also the Observer interface and all constants used in this test are similarly part of the test itself. Remember, a test tests everything which is in scope but which the test does not control. The only thing not controlled by the test is the Synchronization Proxy.
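Putting the pieces together, the whole scenario could be sketched roughly as below. This is an illustration under several assumptions, not the authors’ code: the proxy’s constructor is widened to accept the Target abstraction so the mock can be injected, the simulated clock blocks callers directly (via Monitor) instead of using the Observer Pattern, the two clients are plain threads rather than named inner classes, and names such as ProxySpecification, RunScenario(), and Entered are hypothetical.

```csharp
using System.Collections.Generic;
using System.Threading;

public abstract class Target
{
    public abstract void SetX(int xValue);
}

// Assumption: the proxy wraps the Target abstraction so a mock can stand in for RealTarget.
public class SimpleLockSynchronizationProxy : Target
{
    private readonly Target myTarget;
    public SimpleLockSynchronizationProxy(Target aTarget) { myTarget = aTarget; }
    public override void SetX(int xValue)
    {
        lock (this) { myTarget.SetX(xValue); }
    }
}

public static class ProxySpecification
{
    private const int LongWait = 10;
    private const int ShortWait = 1;

    // Simulated clock: callers block until the test moves time forward.
    private class Time
    {
        private int now;
        public void WaitUntil(int second)
        {
            lock (this) { while (now < second) Monitor.Wait(this); }
        }
        public void AdvanceTo(int second)
        {
            lock (this) { now = second; Monitor.PulseAll(this); }
        }
    }

    private class MockTarget : Target
    {
        public readonly ManualResetEventSlim Entered = new ManualResetEventSlim(false);
        public readonly List<int> Log = new List<int>();
        private readonly Time time;
        private int calls;

        public MockTarget(Time time) { this.time = time; }

        public override void SetX(int xValue)
        {
            int wait;
            lock (Log) { wait = (calls++ == 0) ? LongWait : ShortWait; }
            Entered.Set();               // tell the test that a caller is inside
            time.WaitUntil(wait);
            lock (Log) { Log.Add(xValue); }
        }
    }

    // Runs the given-when-then scenario and returns the order of the writes.
    public static List<int> RunScenario()
    {
        var time = new Time();
        var mock = new MockTarget(time);
        Target proxy = new SimpleLockSynchronizationProxy(mock);

        var clientA = new Thread(() => proxy.SetX(1));
        var clientB = new Thread(() => proxy.SetX(2));

        clientA.Start();
        mock.Entered.Wait();          // Given: Thread A is running code T
        clientB.Start();              // When: Thread B attempts to run code T

        time.AdvanceTo(ShortWait);    // would release B if it had slipped past the proxy
        time.AdvanceTo(LongWait);     // releases A; B follows once the lock is free

        clientA.Join();
        clientB.Join();
        return mock.Log;              // Then: expected to be 1, 2
    }
}
```

With a correct proxy this sketch always yields 1, 2 and uses no real delays. One limitation: if the proxy did not lock, the 2, 1 ordering would depend on Thread B reaching the mock before time is advanced, so this version reliably confirms correct behavior but does not reliably catch a broken proxy.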

[3] Perhaps later we’ll make our argument about whether you should or not. :)
[4] If you don't know about the Mock Object Pattern, visit this link:

[5] For more details on the Observer Pattern, visit this link:



Posted on: February 11, 2021 03:51 AM
