The First Law of Programming, Part 1: Duplicate Code is Evil
jducoeur
So my rant yesterday about good and bad programmers did leave me musing about an important corollary question: how do you make good programmers? The answer is obviously complex, but here's a starting point: teach them The First Law of Programming, which is:
Duplication Is Evil
Really, that's it -- one nice simple sentence, with huge ramifications.

The odd part is how ill-taught this rule is. Most programming courses teach it as an afterthought, if at all, which is strange because it motivates so much of the structure of programming. I mean, the evolution of computer languages has mostly been about finding higher and higher-level ways to eliminate duplication in code, and many language features are all about ways to remove duplication. For example:
  • If you find the same expression being used in multiple places in the program -- even if it is just one complex line -- it most often makes sense to lift that out into its own parameterized function or method.

  • If you have the same basic functional pattern being used repeatedly -- that is, when you can comfortably say, "This is just doing the same thing as that except for X" -- then you probably want to lift out a higher-order function, encapsulating X as a functional parameter or in a closure.

  • If you have multiple classes that are doing essentially the same things, except *to* different types -- for instance, a List of integers vs. a List of strings vs. a List of Customers -- then you almost certainly want a Generic class.

  • If you have multiple classes that are trying to do the same tidbit of functionality, then you probably want a trait or a mixin. (Or if you are trapped in single-inheritance land, at least change the way you're aggregating those functions.)
And so on. While not every programming-language feature is about removing duplication, many are, and for good reason.
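The first three bullets can be sketched in a few lines of C++ (all names here are hypothetical, invented purely for illustration):

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// 1. A repeated expression lifted into a parameterized function:
//    instead of writing price * (1.0 + rate) at every call site,
//    give the expression a name.
double price_with_tax(double price, double rate) {
    return price * (1.0 + rate);
}

// 2. "The same thing as that, except for X": lift out a higher-order
//    function, passing the varying part X as a functional parameter.
std::vector<int> map_ints(const std::vector<int>& xs,
                          const std::function<int(int)>& f) {
    std::vector<int> out;
    for (int x : xs) out.push_back(f(x));
    return out;
}

// 3. The same class over different element types: one generic
//    (template) class instead of one near-identical class per type.
template <typename T>
class Box {
public:
    explicit Box(T value) : value_(value) {}
    T get() const { return value_; }
private:
    T value_;
};
```

Each of these is the same move at a different scale: find the part that varies, make it a parameter, and write the invariant part exactly once.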

Mind, I am not advocating removing duplication for the usual squishy reasons like "reuse". (Itself a source of many sins, because it misses the fact that *sometimes*, it really is much cheaper, easier and more reliable to reinvent the wheel.) The real reason is much simpler: Duplication Causes Bugs. Period. And I don't mean occasionally: in my experience, *most* serious programming bugs trace back to duplication in one way or another. Sometimes it is because duplicated code makes the code bulkier and harder to reason about. Frequently, it is because you copied this code into four places, tweaked it in one of them, and forgot to tweak it in the others. Most often, the duplication is simply a symptom of the fact that you don't really understand the abstractions in your code.
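The "tweaked it in one place, forgot the others" failure mode is worth seeing concretely. Here is a hypothetical sketch (the functions and the bug are invented for illustration): the same check was copy-pasted into two call sites, and a later fix landed in only one copy.

```cpp
#include <cassert>
#include <string>

// Copy A -- patched to reject empty names as well as overlong ones.
bool valid_name_a(const std::string& name) {
    return !name.empty() && name.size() <= 64;
}

// Copy B -- never got the fix, so the two "identical" checks
// now silently disagree about the empty string.
bool valid_name_b(const std::string& name) {
    return name.size() <= 64;   // still accepts ""
}
```

Had the check lived in one function, the fix would have applied everywhere automatically; because it was duplicated, the program now behaves differently depending on which copy a given code path happens to call.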

So if you are learning programming, I commend to you this rule. Whenever you notice *any* kind of duplication, ask why, and really dig into whether those duplicates should be combined. While it is technically possible to carry it too far, it's really pretty difficult to do so -- the exceptions are at least a bit unusual. And continually saying to yourself, "Surely there must be *some* way to remove this duplication" will force you to think in ways that will teach you a huge amount about why modern programming languages work the way they do, and why you want to use those fancy constructs.

To give you a leg up, I'll point you specifically to the little-known bible of programming: Refactoring: Improving the Design of Existing Code, by Martin Fowler. Fowler can be a bit of a loon (albeit glorious fun to listen to), but he's a brilliant loon and one of the more insightful thinkers about the art of programming. This book, in particular, is the one I hand to *every* intermediate-level engineer. It starts with a fairly modest section on how to think about the structure of code, and then spends the rest of the book on an encyclopedia of "code smells" and how to fix them. Tom Leonard insisted that I read it back when I was working for him, a dozen or so years ago, and of all the things I learned from Tom, this was probably the single most valuable. It isn't quite perfect -- it is very Java-centric, so misses lots of functional-programming options available in more modern languages, and it is very focused on fixing existing code. But it'll teach you a lot about how to *think* about code properly.

As the title says, this is just part 1. When I have some time (possibly later today, but we'll see), I'll get into Part 2: Duplicate Data is Evil...

And again, you have hit on something so fundamental that it actually spans the genres from programming to business (i.e., non-programming).

Duplication is evil.

This is typically described in "the office" world that I am so accustomed to as "Don't do double work." Overlapping. Two people doing the same task to achieve the same goal... or not... two people simply doing the same task and not knowing that it's being done. It slows down productivity on a whole bunch of levels. Then it breeds apathy. 'Why should I do this if someone else is anyway?'

This is something I talk about in interviews. If a problem is recurring (duplicating) then a process should be put in place to correct the problem at the core. At which point, other duplications... and issues (hey, 'Terry' did that same solution and now these three people are doing it, but not the rest of the team) can come to light. Yay! Problems. (Not being sarcastic.) Now that problems have been identified, processes can be put into place, so that duplications can cease and productivity can increase.

If you ever burn out on programming, might I suggest a 2nd career in HR?

Re: Programming & Business

Actually, while you're right that the same effect occurs in business, the heart of the problem is a bit different.

The problem with duplication in code is *not* wasted effort. People often think this (in particular, managers often think this), but that's actually a minor problem. The *big* problem is that, when you have three bits of code trying to say the same thing, they often get out of synch with each other. *That* can cause all sorts of havoc, and often does.

So the better business cognate to what I'm saying here is when you have the process distributed in such a way that three different people essentially have the authority to make the same decision. If they all consistently come to the same conclusion, that works okay -- but as soon as they disagree (and they will), chaos ensues.

In other words, while it's true that "Don't do double work" overlaps a bit with what I'm saying, "Don't mix up the decision-making authority" is closer to the heart of it...

Re: Programming & Business

And this - the reading the same thing and getting two different, yet slightly overlapping ideas out of it - is what happens when managerial vs secretarial brains look at it. In a good company, both would work together to make the team stronger. But, that's a different conversation entirely.

Very nice - I look forward to the continuation!

Duplication Is Evil

Really, that's it -- one nice simple sentence, with huge ramifications.


Ha. That was strongly in mind just last night, while signing 120+ pages of paperwork for a refinance. (*There's* something that desperately needs refactoring.)

While it is technically possible to carry it too far, it's really pretty difficult to do so -- the exceptions are at least a bit unusual.

"Too far" may be rare, but "badly" is also a pitfall of the overzealous. A perhaps-canonical example: the method which does one of seventeen different things depending on which flags get passed in, those things being related only by the fact that pieces of their internal logic overlap. All done in the name of avoiding duplication, much like the Inquisition was done in the name of promoting faith and virtue.
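A hypothetical miniature of that anti-pattern, and the usual fix (both functions invented for illustration): the "deduplicated" version jams unrelated behaviors into one function behind flags, while the fix names each behavior separately and shares only what is genuinely common.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// The anti-pattern: one function doing different things depending on
// which flags get passed in, "sharing" only a little incidental logic.
std::string format_record_bad(const std::string& s, bool asJson, bool upper) {
    std::string out = s;
    if (upper)
        for (auto& c : out) c = static_cast<char>(std::toupper(
                static_cast<unsigned char>(c)));
    if (asJson) return "{\"value\":\"" + out + "\"}";
    return out;
}

// The fix: each behavior gets its own name, and callers compose them.
std::string to_upper(std::string s) {
    for (auto& c : s)
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    return s;
}
std::string to_json(const std::string& s) {
    return "{\"value\":\"" + s + "\"}";
}
```

The flag version has four behaviors tangled into one control flow; the composed version lets each caller say exactly which combination it wants, with no flags to misread.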

...and it is very focused on fixing existing code.

I'd call that a point in its favor - maintaining and modifying an existing codebase is far more prevalent in industry than in your average comp sci degree program; formally passed-down knowledge on this skill is a good thing to have around.

That was strongly in mind just last night, while signing 120+ pages of paperwork for a refinance.

Things for me to look forward to. (My closing is scheduled for Tuesday. Pain in the ass, but the savings will be very nice.)

I will admit to being slightly surprised that you're refinancing this soon after the purchase, though -- significant interest rate drop since you bought?

"Too far" may be rare, but "badly" is also a pitfall of the overzealous.

True. Fortunately, Fowler's book also lists this sort of thing among its code smells, and recommends ways to fix it. (I'm not really getting into cohesion as a principle here, but it's implicitly crucial in the Refactoring book.)

I'd call that a point in its favor

Oh, sure. I suppose my point is better expressed that, while this is focused on fixing existing code, much of it is highly applicable to writing new stuff as well...


That was strongly in mind just last night, while signing 120+ pages of paperwork for a refinance.

Things for me to look forward to. (My closing is scheduled for Tuesday. Pain in the ass, but the savings will be very nice.)


My closing was scheduled for Oct. 17, but one business day earlier they discovered a problem I had told them about two months earlier, so they rescheduled it for Oct. 27. On Oct. 27, two hours before closing, I was told that they hadn't found a solution to the aforementioned problem, and closing was therefore canceled.

But to get back to your point... the previous time I tried to refinance, the deal fell through precisely because of Evil Duplication. Two mortgage company employees each had half of my dossier of paperwork, and each was waiting for me to send in the other half before they could proceed. If the responsibility had been in a single point of control, this would have been discovered and fixed much sooner.

I will admit to being slightly surprised that you're refinancing this soon after the purchase, though -- significant interest rate drop since you bought?

Not large, but not trivial: 3/8 of a percent lower, paying very little cash to do so. Naive payback time (ignoring mortgage interest deduction, future value of money, etc) is about 2-3 years, and we're planning on being here at least 10, if not 15-20, so the math makes sense.

(It could have been even better to pay points and get an even more ridiculously good rate, but we're hurting for liquidity after the renovations, and - strangely - at the time we refinanced, points wouldn't have lowered the rate a huge amount, making the marginal benefit slim.)

Most programming courses teach it as an afterthought, if at all, which is strange because it motivates so much of the structure of programming. I mean, the evolution of computer languages has mostly been about finding higher and higher-level ways to eliminate duplication in code, and many language features are all about ways to remove duplication.

I certainly try to make this point in class. "When you find yourself writing the same thing over and over, you're doing something wrong." I say this in introducing variables, again in introducing functions, again in introducing higher-order functions, again in introducing inheritance....

<looks innocent> And again when introducing macros?

Seriously - sometimes yes. When I was doing C++, macros were a crucial tool for doing it well...
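For the curious: a classic example of the kind of duplication that, in C++, only the preprocessor can remove is the "X-macro" idiom (this sketch is illustrative, not from the commenter's actual codebase). The list of values is written exactly once, and both the enum and its name table are generated from it, so they can never drift apart.

```cpp
#include <cassert>
#include <string>

// The single source of truth: each entry is written once.
#define COLOR_LIST(X) \
    X(Red)            \
    X(Green)          \
    X(Blue)

// Generate the enum from the list...
enum class Color {
#define AS_ENUM(name) name,
    COLOR_LIST(AS_ENUM)
#undef AS_ENUM
};

// ...and generate the matching name table from the same list.
inline const char* color_name(Color c) {
    switch (c) {
#define AS_CASE(name) case Color::name: return #name;
        COLOR_LIST(AS_CASE)
#undef AS_CASE
    }
    return "?";
}
```

Adding a new color means editing COLOR_LIST in one place; without the macro, the enum and the switch would be two hand-maintained copies of the same list, with all the usual drift bugs.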

Sure, I would, if I ever got to teach macros. I teach Scheme to non-majors who are lucky if they get to HOFs in a semester. I teach Scheme to CS majors, but only for a third of a semester, so at best I get to mention the existence of macros. I teach C++ for more than half a semester, but there are so many difficult and necessary things to cover in C++ that I don't get to parameterized macros.



Edited at 2011-11-05 10:03 pm (UTC)

In the C++ case, I wound up using macros from the get go. Both companies were using more or less the same library, a highly refined homebrew version of COM, and a large body of macros were needed to make that work well...

Itself a source of many sins, because it misses the fact that *sometimes*, it really is much cheaper, easier and more reliable to reinvent the wheel.

Reminds me of one of my best quips from a past job where we were designing a large class library: "It's better to reinvent the wheel than to subclass a square wheel and try to make it round." :-)

Yeah, one of my mantras in recent years has become, "Systems integration is harder than you think." At every job, I have to spend at least some time convincing my boss that build is sometimes easier than buy...

I couldn't agree more. I have recently encountered a number of head-scratching cases of seemingly bright people inventing systems where every new feature is forced to rewrite 90% of the same functionality, presumably because they've learned lots of lessons along the way but not this one.

Today, for instance, we were presented with the flexible new system that is supposed to replace some ad hoc perl reports - now instead of writing a new 50 line script for each query, we only need to create a new DB table, DAO, service class, model, view, controller, and command-line application...
