Justin du Coeur (jducoeur) wrote,
Justin du Coeur

The First Law of Programming, Part 1: Duplicate Code is Evil

So my rant yesterday about good and bad programmers did leave me musing about an important corollary question: how do you make good programmers? The answer is obviously complex, but here's a starting point: teach them The First Law of Programming, which is:
Duplication Is Evil
Really, that's it -- one nice simple sentence, with huge ramifications.

The odd part is how ill-taught this rule is. Most programming courses teach it as an afterthought, if at all, which is strange because it motivates so much of the structure of programming. I mean, the evolution of computer languages has mostly been about finding higher and higher-level ways to eliminate duplication in code, and many language features are all about ways to remove duplication. For example:
  • If you find the same expression being used in multiple places in the program -- even if it is just one complex line -- it most often makes sense to lift that out into its own parameterized function or method.

  • If you have the same basic functional pattern being used repeatedly -- that is, when you can comfortably say, "This is just doing the same thing as that except for X" -- then you probably want to lift out a higher-order function, encapsulating X as a functional parameter or in a closure.

  • If you have multiple classes that are doing essentially the same things, except *to* different types -- for instance, a List of integers vs. a List of strings vs. a List of Customers -- then you almost certainly want a Generic class.

  • If you have multiple classes that are trying to do the same tidbit of functionality, then you probably want a trait or a mixin. (Or if you are trapped in single-inheritance land, at least change the way you're aggregating those functions.)
And so on. While not every programming-language feature is about removing duplication, many are, and for good reason.

Mind, I am not advocating removing duplication for the usual squishy reasons like "reuse". (Itself a source of many sins, because it misses the fact that *sometimes*, it really is much cheaper, easier and more reliable to reinvent the wheel.) The real reason is much simpler: Duplication Causes Bugs. Period. And I don't mean occasionally: in my experience, *most* serious programming bugs trace back to duplication in one way or another. Sometimes it is because duplicated code makes the code bulkier and harder to reason about. Frequently, it is because you copied this code into four places, tweaked it in one of them, and forgot to tweak it in the others. Most often, the duplication is simply a symptom of the fact that you don't really understand the abstractions in your code.

So if you are learning programming, I commend to you this rule. Whenver you notice *any* kind of duplication, ask why, and really dig into whether those duplicates should be combined. While it is technically possible to carry it too far, it's really pretty difficult to do so -- the exceptions are at least a bit unusual. And continually saying to yourself, "Surely there must be *some* way to remove this duplication" will force you to think in ways that will teach you a huge amount about why modern programming languages work the way they do, and why you want to use those fancy constructs.

To give you a leg up, I'll point you specifically to the little-known bible of programming: Refactoring: Improving the Design of Existing Code, by Martin Fowler. Fowler can be a bit of a loon (albeit glorious fun to listen to), but he's a brilliant loon and one of the more insightful thinkers about the art of programming. This book, in particular, is the one I hand to *every* intermediate-level engineer. It starts with a fairly modest section on how to think about the structure of code, and then spends the rest of the book on an encyclopedia of "code smells" and how to fix them. Tom Leonard insisted that I read it back when I was working for him, a dozen or so years ago, and of all the things I learned from Tom, this was probably the single most valuable. It isn't quite perfect -- it is very Java-centric, so misses lots of functional-programming options available in more modern languages, and it is very focused on fixing existing code. But it'll teach you a lot about how to *think* about code properly.

As the title says, this is just part 1. When I have some time (possibly later today, but we'll see), I'll get into Part 2: Duplicate Data is Evil...
Tags: programming

  • Post a new comment


    Anonymous comments are disabled in this journal

    default userpic

    Your reply will be screened

    Your IP address will be recorded