Sunday, June 21, 2009

TDD - The Complicated Bits

For the last couple of months I’ve been working on a new parser for Tycho. The new parser has a couple of interesting new features, it fixes a few embarrassing inefficiencies, and it’s pure C# (the old one was written in C++).

I wrote the new parser completely test-driven. This means I didn’t add a new feature until I’d written a test that required it – standard TDD practice. This is, to date, my largest fully TDD project, so I’ve learned a lot!

First of all, I love test-driven development. This is not an uncommon experience: most of the developers I know also love TDD, so I’m not that surprised. My code has far fewer bugs, I spend next to no time in the debugger, and my code is in much better shape: much smaller classes, smaller methods and generally easier to understand. (Honestly, I’ve just deleted several paragraphs waxing wicked on how cool TDD is; I can’t emphasise this stuff enough!)

But for now I’m going to talk about the more complicated aspects of TDD.

Things got tricky especially around adding big new functionality. If I didn’t need to make huge architectural changes to the system, it was easy; in fact, TDD was completely supportive. I would write a test or two that required the new functionality and update the code to make them pass. But for features that required system-wide changes, the tests quickly got in the way. This happened on a number of levels. First, simple interface changes, like adding a parameter to a commonly used interface, would mean going over all the tests that used that interface. Not difficult, but one of those robotic tasks that you just need to get through before the compiler says “all good” again.

Deep architectural changes would require a lot more work. This would involve anything from simply rewiring test logic all the way to deleting and rewriting whole swathes of tests. These architectural changes would come about because you’d forgotten something important that affects lots of stuff, or you’d have one of those DDD-style “domain insights” where you end up completely changing the meaning of something, or even lots of things. Both would require large-scale changes to your tests, and I’d find myself hacking away on things without having recompiled or retested for way too long. In those situations you really need to take a step back and ask yourself how you’re going to implement this bit by bit instead of going AWOL until things work again.

Refactorings were also frustrating. I’d notice a nice refactoring and almost jump in to start work on it, before realising that I would need to rewrite tests before I could continue. One of the main mantras of TDD is “red, green, refactor”. Write a test that fails (red), hack the code until it passes (green), then go back and clean up the mess (refactor). In this context, you can refactor safely because you know you’ve got tests to back you up. But what if a refactoring is so large it breaks all your tests? Your tests are too dependent on the implementation! Of course! You need to delete your tests and start again, but which tests? Large refactorings like this are easy to think about when you don’t have tests; they become second nature. But I still haven’t fully got my head around coordinating these large-scale (test-breaking) refactorings with a test-driven approach.
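To make the “red, green, refactor” loop concrete, here’s a minimal sketch in Python. The function `parse_sum` and its tests are hypothetical, made up purely for illustration (nothing here comes from the Tycho parser): the tests were “written first” and the implementation is just the simplest thing that makes them pass.

```python
# Hypothetical example: parse_sum and its tests are invented for illustration.

def parse_sum(text: str) -> int:
    """Add up the integers in a string like '1 + 2 + 3'."""
    # Green: the simplest code that makes the tests below pass.
    # Refactor: this body can be reshaped freely while the tests stay green.
    return sum(int(term) for term in text.split("+"))


# Red: imagine these tests were written before parse_sum existed,
# so the first run failed.
def test_single_number():
    assert parse_sum("7") == 7


def test_adds_terms():
    assert parse_sum("1 + 2 + 3") == 6


test_single_number()
test_adds_terms()
```

Note that the tests only talk about inputs and outputs, so any refactoring that keeps `parse_sum`’s signature leaves them alone. The trouble described above starts when the refactoring changes the thing the tests are written against.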

This brings me to an interesting trade-off with unit testing in general. By testing all your code at the lowest level, your tests can be small and simple. Your objects become small and simple too as a result. But these low-level tests are bound to these low-level implementation details. As soon as you want to change them, you know, when you have one of those lovely “domain insights”, they’ll tie you down until you’re wretched.

The other option is to try, as much as possible, to write tests at a higher level. You’ll test dozens of components at the same time, all integrated, all working together. These sorts of tests can be very effective because you’re testing the components as they’re meant to work together, much more like a real environment than mere unit tests. These tests can be pretty crazy though, and depending on what you’re writing, they’ll need to cover lots of corner cases. Managing the variety of input data, the sheer combinatorial explosion of corner cases and loads of moving parts will eventually wear you down: you’ll miss test cases and you’ll take shortcuts. Integration tests like these are rarely elegant. So which do you choose?

I remind myself of two “rules” or “principles” when thinking about this. Rule number one is: choose the smallest high-level interface to test. The “smallest” interface is the one that exposes the fewest moving parts, has the fewest corner cases and is usually the easiest to understand. A facade object is often a good candidate. If you’re writing a console app, you may want to test the actual input/output of the console if it’s simple enough. The term I like to use is “surface area”: components have a surface area, which is the breadth of the complexity they expose to the outside world. Generally, good object design will try to reduce this surface area; the corollary for testing is to test objects with the smallest surface area. You want to test at a high level too: the higher level your tests are, the less they’ll depend on the implementation details, and the easier it will be to change stuff in future.
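Here’s a rough sketch of what testing through a small facade might look like, again in Python. Everything in it (`Calculator`, `_tokenize`, `_evaluate`) is a hypothetical toy, not Tycho code. The point is that the tests only touch `Calculator.evaluate`, so the two internal helpers can be merged, split or rewritten without any test changing.

```python
import re

# Hypothetical toy: these names are invented for illustration.

def _tokenize(text):
    # Internal detail: split the input into numbers and operators.
    return re.findall(r"\d+|[+*]", text)


def _evaluate(tokens):
    # Internal detail: a sum of products, so '*' binds tighter than '+'.
    total = 0
    product = 1
    for token in tokens:
        if token == "+":
            total += product
            product = 1
        elif token != "*":
            product *= int(token)
    return total + product


class Calculator:
    """Facade: the only surface the tests are allowed to touch."""

    def evaluate(self, text: str) -> int:
        return _evaluate(_tokenize(text))


def test_single_number():
    assert Calculator().evaluate("5") == 5


def test_multiplication_binds_tighter_than_addition():
    assert Calculator().evaluate("2 + 3 * 4") == 14


test_single_number()
test_multiplication_binds_tighter_than_addition()
```

A test written directly against `_tokenize` would be smaller, but it would break the moment the tokenizer’s output format changed, even if `evaluate` still gave the right answers.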

And if you’re still confused, the second rule is to simply test everything. Even if you do find a nice small high-level interface, you’re probably going to want to use TDD to implement the hideous complexity inside it. And if you’re being a good TDD dev, there won’t be too much hideous complexity when you’re done. The worst that can happen is that you delete all those implementation-bound unit tests and start again; at least you’ve still got a nice integration test to fall back on.

I started out without knowing rule number 2 so I found myself very focussed on the small. This had its advantages and its disadvantages. I could concentrate intently on getting just one object to work correctly within its test harness. When I’d finished that component I would get green lights all over the place and I’d naturally think I was done, finished. I could go home, but, wait, there’s more stuff to do? I wasn’t thinking about the big picture, about how this whole thing would work together as one big system. Without TDD I would design top-down, trying to build something that kind of works and expand on it (kind of “tracer bullet” style). With TDD it was the opposite: I would figure out the smallest component I needed and begin there. Both approaches are helpful, so in retrospect I should have written a few high-level tests that tested the intended functionality of the whole system, end to end. These tests would cover a small amount of functionality, but I’d be making real high-level progress from well-tested low-level components. I think this is the philosophy behind BDD.
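For what it’s worth, a handful of end-to-end tests like that can be as simple as a table of “source in, output out” pairs. The sketch below is Python and entirely hypothetical: `run_script` is a stand-in for “the whole system”, with a made-up two-command mini-language, and has nothing to do with Tycho’s real syntax.

```python
import io

# Hypothetical stand-in for the whole system: runs a tiny made-up
# script language and returns everything it printed.
def run_script(source: str) -> str:
    out = io.StringIO()
    variables = {}
    for line in source.strip().splitlines():
        command, _, argument = line.strip().partition(" ")
        if command == "set":
            # "set x = 42" stores the text "42" under the name "x".
            name, _, value = argument.partition("=")
            variables[name.strip()] = value.strip()
        elif command == "print":
            # Print a variable's value, or the argument itself if unknown.
            print(variables.get(argument, argument), file=out)
    return out.getvalue()


# Each case is (script source, expected output). New scenarios are one
# more row in the table, with no knowledge of the internals.
END_TO_END_CASES = [
    ("print hello", "hello\n"),
    ("set x = 42\nprint x", "42\n"),
]


def test_end_to_end():
    for source, expected in END_TO_END_CASES:
        assert run_script(source) == expected, source


test_end_to_end()
```

Tests like these say nothing about how the system is put together inside, which is exactly why they survive big refactorings.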

In the end I was very impressed with TDD. Having loads of tests is an indispensable asset. I can make changes to the code base and have confidence that I haven’t broken anything. The system I built was reasonably complicated (it’s a scanner-less PEG packrat parser, with full direct and indirect left-recursion support, and(!) it’s dynamic so new syntax can be loaded from the source code it’s parsing.) At times I thought my progress was slow, but I had to remind myself that the whole thing is complicated so it’s going to be slow however I do it, and that without TDD I’d be spending whole days just living inside the debugger. And not for the first time. :)

2 comments:

Tim said...

Hi Tim, I too found that having many tests covering small units of code caused friction when refactoring, especially as the number of tests started to increase. Over the past several months, I have been using a BDD approach. Instead of writing fine-grained tests for every unit of code, I tend to write more coarse-grained tests that cover performing an action within a certain context. BDD is an outside-in approach, so you only implement behaviour that a higher-level function requires. This means that when you refactor, the tests should not fail unless the behaviour has changed. I have found tests written using BDD are easier to refactor, better focused and more maintainable.

Tim Macfarlane said...

Yeah, I suspect this is the right approach. With Tycho I've covered most (if not all) functionality with very high-level script tests: given some tycho code, test that its output is correct. Fortunately Tycho's syntax is more or less stable now so this is a good place to put tests. I've been able to make quite large refactorings without requiring too much test movement.

On other projects I've not been so lucky. High level tests become very complicated very quickly and trying to cover the sheer breadth of the functionality has been hard work. I'll find myself building a very specialized framework that I can use to express all the scenarios in, which seems to help out a lot. Refactoring unit tests is a must here.

I especially like the "outside-in" approach of BDD; it should keep you focussed on the ultimate goal of the software. I'll be doing a lot more of this in future, that's for sure.

Cheers Tim (must catch up for a beer sometime!)