My nephew just introduced me to geek&poke, a web-comic that only true geeks could love. Needless to say, I’m lovin’ it. 😉
While scanning through the recent strips, I ran across this one, which gave me a chuckle, and also got me thinking.
I’ve heard a lot of opinions on comments in programs over the last thirty years of developing software. They seem to be divided into three camps. The first consists primarily of relatively new, self-taught programmers, who think that comments (like documentation) are boring, unnecessary, and completely optional: after all, they understand their code, so anyone wanting to use it should be able to figure it out at a glance too! This type generally learns the folly of his ways the third or fourth time he has to go back and modify code he wrote more than six months ago, and realizes to his dismay that, without the reasons he did things that way fresh in his mind, he can’t understand it either. Unfortunately, the vast majority of open-source developers are in this group.
The second consists of developers educated solely in college, the ones who’d never even heard of a compiler until it was taught to them in a class. This type goes to the other extreme: every two-line function has to have its own twenty- or thirty-line comment header, describing when it was written, why it was needed, who wrote it, who approved it, who code-reviewed it, what each and every parameter is and is used for, what return values you can expect from it, and anything and everything else. This type either learns that all those comments are superfluous (and eventually graduates into the third camp), or has no real skill or interest in coding and just wants to get promoted to management as quickly as possible, where he forces everyone under him to adhere to that standard (because he has to read their work, and reading comments is soooo much easier than actually reading code, which is, I suspect, also the reason college professors require it).
The third camp is where developers end up after lots of experience, once they realize that code should be self-documenting. They choose variable and function names that don’t need documentation: anyone reading them can easily see what they’re used for. For these people, well-chosen names are the what, well-designed code is the how, and comments are reserved for the why (the reasons for things that can’t be determined from the code). As a very simple example, here’s an actual function copied directly from one of my programs:
```cpp
bool isValidGregorianDate(int year, int month, int day) {
    if (year == 0) return false; // 1 BCE is followed by 1 CE.
    if (month < 1 || month > 12) return false;
    if (day < 1 || day > highestValidDay(year, month)) return false;
    // Due to Pope Gregory's reform in 1582, ten days are missing from the
    // Gregorian calendar, as compared to the Julian calendar in use before
    // that. There's no way to properly calculate which days unless you also
    // know the location, because different places adopted the change in
    // different years. All you can say for certain is that somewhere
    // between ten and thirteen days get skipped somewhere between 1582 and
    // 1923. This function assumes that those changes happened in 1582,
    // so that Thursday, 4 October 1582 on the Julian calendar was followed
    // by Friday, 15 October 1582 on the Gregorian.
    if (year == 1582 && month == 10 && day > 4 && day < 15) return false;
    return true;
}
```
First, notice the function name, `isValidGregorianDate`. If you were reading through the code and saw a function with that name, it would be pretty bloody obvious what its purpose is, and any comment to that effect would be valuable time wasted. Ditto for the parameters `year`, `month`, and `day`: between the names themselves and the context of the function name, their purpose is self-evident. If you want to know the valid ranges, they’re pretty obvious from the code itself, just in case you aren’t familiar with the Gregorian calendar.
Next, notice the first line within the function. If it were uncommented, you might wonder why the function rejects year zero, but a tiny comment directly after it explains: there is no year zero in the Gregorian calendar, because 1 BCE is followed directly by 1 CE. Also notice the second and third lines, which have no comments because they’re self-explanatory, thanks in great part to a call to another self-descriptive function, `highestValidDay` (which in turn calls another self-descriptive function, `isLeapYear`). An inexperienced developer would see that these functions are never called from anywhere else and put their code right into `isValidGregorianDate`, with or without comments explaining what it’s doing and why, but that would distract from the purpose of `isValidGregorianDate`. Separating them out and giving them very descriptive names makes all of that unnecessary.
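The post names `highestValidDay` and `isLeapYear` but doesn’t show them, so the function above won’t compile on its own. Here is a minimal sketch of what those two helpers might look like. It is a hypothetical reconstruction, not the author’s code, and it assumes that negative years mean BCE with no year zero (so `-1` is 1 BCE), which matches the `year == 0` check.

```cpp
#include <cassert>

// Hypothetical helper: the post names it but does not show it.
// Assumes negative years are BCE with no year zero (-1 means 1 BCE).
bool isLeapYear(int year) {
    // Map BCE years onto astronomical numbering (1 BCE -> 0, 5 BCE -> -4)
    // so the proleptic Gregorian rule makes 1 BCE, 5 BCE, ... leap years.
    int y = (year < 0) ? year + 1 : year;
    // Every fourth year, except century years not divisible by 400.
    return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

// Hypothetical helper: last valid day of the given month in the given year.
int highestValidDay(int year, int month) {
    assert(month >= 1 && month <= 12);  // the caller validates the month first
    static const int daysInMonth[12] = {31, 28, 31, 30, 31, 30,
                                        31, 31, 30, 31, 30, 31};
    if (month == 2 && isLeapYear(year)) return 29;
    return daysInMonth[month - 1];
}
```

With these two definitions placed ahead of `isValidGregorianDate`, the example compiles; `isValidGregorianDate(1900, 2, 29)` then returns false, because 1900 is a century year not divisible by 400.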
Finally, there’s the huge nine-line comment, which describes the reason for the one line of code immediately following it. Lots of care was lavished on the description for that one line, because a reader who isn’t intimately familiar with the history of the Gregorian calendar wouldn’t know that stuff — and that reader just might be the developer who wrote the code, some time later when he’s forgotten all of the details.
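The weekday claim in that comment can be checked mechanically. The sketch below uses Tomohiko Sakamoto’s day-of-week method for the (proleptic) Gregorian calendar. The Julian variant is an adaptation made for this illustration: it drops the century terms and adds a constant 5 (mod 7) to realign with the Gregorian count. Both function names are hypothetical, and both assume positive years.

```cpp
#include <cassert>

// Per-month correction table used by Sakamoto's method.
static const int kMonthOffset[12] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};

// Day of week on the proleptic Gregorian calendar: 0 = Sunday ... 6 = Saturday.
int dayOfWeekGregorian(int year, int month, int day) {
    assert(year > 0 && month >= 1 && month <= 12);
    if (month < 3) year -= 1;  // count January and February with the previous year
    return (year + year / 4 - year / 100 + year / 400
            + kMonthOffset[month - 1] + day) % 7;
}

// Same numbering on the Julian calendar: no century exceptions, so the
// century terms are dropped and +5 (mod 7) realigns the weekday count.
int dayOfWeekJulian(int year, int month, int day) {
    assert(year > 0 && month >= 1 && month <= 12);
    if (month < 3) year -= 1;
    return (year + year / 4 + kMonthOffset[month - 1] + day + 5) % 7;
}
```

`dayOfWeekJulian(1582, 10, 4)` returns 4 (Thursday) and `dayOfWeekGregorian(1582, 10, 15)` returns 5 (Friday), consistent with the comment: the weekdays continue without a break even though ten calendar dates are skipped.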
Out of curiosity, I scanned a dozen source-code files from my current project, counting the number of lines and the number of true comments (skipping commented-out code lines and TODO comments, which are meant to catch my attention when I go searching for things that still need work). Out of 3,645 lines in those twelve files, there are 87 comments, mostly little one-liners tacked onto the end of the line they’re commenting (like “so it will call Update” or “otherwise it’s constant”), but occasionally detailed four-, six-, or ten-line comments like the one shown in the example above. I don’t know how that measures up to anyone else’s code, but it’s fairly average for my own.
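87 comments in 3,645 lines works out to roughly one comment per 42 lines (about 2.4%). A tally like that could be automated along the lines of the sketch below. The rules are assumptions made for illustration, not the author’s actual procedure: only `//` comments count, `//` never appears inside a string literal, a run of consecutive full-line comments counts as one comment, a comment starting with `TODO` is skipped, and a comment whose text ends in `;`, `{` or `}` is treated as commented-out code. `CommentStats` and `countComments` are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct CommentStats {
    int lines = 0;     // every source line read
    int comments = 0;  // "true" comments under the assumed rules
};

// Strip leading and trailing blanks, tabs and carriage returns.
static std::string trim(const std::string& s) {
    const auto b = s.find_first_not_of(" \t\r");
    if (b == std::string::npos) return "";
    const auto e = s.find_last_not_of(" \t\r");
    return s.substr(b, e - b + 1);
}

// Heuristic: text ending like a statement or brace is probably commented-out code.
static bool looksLikeCode(const std::string& text) {
    if (text.empty()) return false;
    const char last = text.back();
    return last == ';' || last == '{' || last == '}';
}

CommentStats countComments(std::istream& in) {
    CommentStats stats;
    bool inBlock = false;  // was the previous line a counted full-line comment?
    std::string line;
    while (std::getline(in, line)) {
        ++stats.lines;
        const auto pos = line.find("//");
        if (pos == std::string::npos) { inBlock = false; continue; }
        const bool fullLine = trim(line.substr(0, pos)).empty();
        const std::string text = trim(line.substr(pos + 2));
        if (text.rfind("TODO", 0) == 0 || looksLikeCode(text)) {
            inBlock = false;  // skipped: a TODO marker or commented-out code
            continue;
        }
        if (!fullLine) {
            ++stats.comments;  // one-liner tacked onto a code line
            inBlock = false;
        } else if (!inBlock) {
            ++stats.comments;  // first line of a multi-line comment
            inBlock = true;
        }
    }
    assert(stats.comments <= stats.lines);  // at most one comment per line
    return stats;
}

// Convenience overload for source text already held in a string.
CommentStats countComments(const std::string& source) {
    std::istringstream in(source);
    return countComments(in);
}
```

Under these rules the nine-line block in `isValidGregorianDate` counts as a single comment, which seems to match how the tally above treats its longer comments.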
If you’re a developer, you’re probably wondering just how I could possibly know which lines will need comments when someone else looks at the function. The answer is easy enough: as I’m developing the code, I also read it as if I’m seeing it for the first time, and if any questions occur to that side of my brain, I add a comment to answer them. It’s something you pick up with enough experience, and I don’t know of any shortcut to it except knowing what you’re aiming for.
The basic thing to remember is that you’re not writing code for the compiler; you’re writing it for the human readers who come after you. The compiler can understand literally millions of variations on a single function, and practically all of them would be gibberish to a human trying to debug or modify it. If you need any more encouragement, keep in mind that the someone who comes later might be you.
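As a small illustration of that point, here are two hypothetical versions of the October 1582 gap check from `isValidGregorianDate`. The compiler treats them identically, but only one of them tells a human what it is for. Both function names are invented for this example.

```cpp
#include <cassert>

// Readable: names the condition and states it the way the calendar does.
bool isOutsideGregorianGap(int year, int month, int day) {
    const bool inGap = year == 1582 && month == 10 && day > 4 && day < 15;
    return !inGap;
}

// Same logic, cryptic: nonzero differences act as "true", and the day
// bounds are rewritten as their opposite comparisons (De Morgan's law).
bool f(int y, int m, int d) { return y - 1582 || m - 10 || d < 5 || d > 14; }
```

Both return false for 10 October 1582 and true for every date outside the gap, yet only the first one explains itself to whoever maintains it next.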