Along the way, I'm chatting with lots of folks, and a remarkably large fraction lead off with, "Well, I've always been doing X, but I want to learn to code". (Last night's was a fellow who does financial compliance work for one of the large funds.) These folks are usually self-taught, and tend to be very self-deprecating about the fact that they didn't go to school so they don't *really* understand programming. A couple of the programmers I was with and I got chatting about that, and the fact that, yes, the best way to learn to program is by doing. A degree in CS is helpful, but mostly in that it teaches you some of the underlying theory for programming *well*; the nuts and bolts change so often that the details you learn in school will only be useful for a limited time anyway. Somewhere in there, I asserted that you could probably list all of the most-useful bits of theory and practice in one brief talk anyway.
So, here's a challenge: help me figure out what those are. What are the key engineering principles that *every* programmer should know, that probably aren't obvious to a newbie and which aren't necessarily going to be taught in an online "How to Java" class?
I'll start out with a few offhand:
Refactoring: great code doesn't usually come from a Beautiful Crystalline Vision that some programmer dreams up -- it comes from writing some code, getting it working, and then rearranging it to make the code *better* while it's still working. That's "refactoring": the art of making the code cleaner without changing what it's doing. It's a good habit to get into, especially because it takes practice. (Granted, listing all the major refactoring techniques is a good-sized talk itself; I highly recommend Fowler's book on the subject.)
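To make that concrete, here's a minimal sketch of one of the most common refactorings, "Extract Function" (the names and data here are made up for illustration): the inline formatting logic gets pulled out into a named helper, and the behavior stays exactly the same.

```python
# Before: the formatting logic is buried inline in the loop.
def report_before(orders):
    lines = []
    for o in orders:
        lines.append(f"{o['name']}: ${o['price']:.2f}")
    return "\n".join(lines)

# After: the formatting has a name and can be reused elsewhere.
# The key point: the output is identical -- only the shape changed.
def format_order(order):
    return f"{order['name']}: ${order['price']:.2f}"

def report_after(orders):
    return "\n".join(format_order(o) for o in orders)

orders = [{"name": "widget", "price": 9.5}, {"name": "gadget", "price": 3.25}]
assert report_before(orders) == report_after(orders)
```

The discipline is in that last line: after each small rearrangement, you check that the code still does what it did before, then keep going.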
The DRY (Don't Repeat Yourself) Principle: which I usually describe as "Duplication is the source of all evil". Any time you duplicate code, you make it much more likely that you'll introduce bugs when things change. Much of refactoring is about merging things to eliminate duplication. Similarly, duplicate data is prone to getting out of sync and causing problems, so when it's practical, keep a single authoritative copy and point everything at it.
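A toy illustration of the failure mode (the tax-rate scenario is invented, but the pattern is everywhere): a value repeated in two places will drift out of sync the day someone updates only one of them.

```python
# Risky: the tax rate is written down twice. When it changes,
# whoever edits it has to remember *both* places.
def invoice_total_dup(subtotal):
    return subtotal * 1.08

def refund_amount_dup(subtotal):
    return subtotal * 1.08  # easy to miss when the rate changes

# DRY: one definition, referenced everywhere. A rate change is
# now a one-line edit that can't leave the two functions disagreeing.
TAX_RATE = 1.08

def invoice_total(subtotal):
    return subtotal * TAX_RATE

def refund_amount(subtotal):
    return subtotal * TAX_RATE
```

The same reasoning applies to duplicated logic, not just constants: merge it, name it, and reference it.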
Efficiency is good, but algorithmic complexity is what matters: this is what's often called "Big-O" notation in computer science. How fast things run *does* matter, but mostly in the grand scheme of things. Whether this approach takes twice as long as that one probably doesn't matter unless you're doing it a bazillion times per second. What *does* tend to matter, given a list of size n, is whether you're going through it just once -- O(n) in the notation -- or whether each time through you're going through the whole list again -- O(n^2) in the notation, that is, "n-squared". (You'd be surprised how easy it is to wind up with algorithms that are n^2 or even n^3 -- those can actually get slow.) Or, if you have two lists of sizes m and n, does your approach take O(n+m) time, or O(n*m)? It's worth practicing these order-of-magnitude evaluations and building an intuition for them. That said...
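Here's a small sketch of how an innocent-looking loop ends up n-squared: checking membership in a Python list rescans the whole list every time, while a set does it in (amortized) constant time, keeping the whole pass O(n).

```python
# O(n^2): the `in` check on a list is itself a linear scan,
# so each pass through the loop re-walks everything seen so far.
def has_duplicates_quadratic(items):
    seen = []
    for x in items:
        if x in seen:          # linear scan inside a linear loop
            return True
        seen.append(x)
    return False

# O(n): a set answers `in` in roughly constant time on average,
# so the loop does a fixed amount of work per element.
def has_duplicates_linear(items):
    seen = set()
    for x in items:
        if x in seen:          # hash lookup, O(1) on average
            return True
        seen.add(x)
    return False
```

Both functions give the same answers; for ten items you'd never notice the difference, but at a million items the first one is doing on the order of a trillion comparisons while the second does a million lookups.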
Big stuff swamps small stuff: in one community the other day, I pointed out an approach to solving a problem that involved creating an extra object for each HTTP call. One of the folks in the discussion asked whether that inefficiency would matter, and I had to point out that you're already handling an HTTP call -- the overhead of that handler is easily a thousand times the cost of the extra object creation, quite likely ten thousand times, so this is a drop in the bucket. So keep scale in mind, and don't sweat the small stuff. If you know your list is never going to have more than ten entries, even O(n^3) probably doesn't matter much.