Comments in Code

My nephew just introduced me to geek&poke, a web-comic that only true geeks could love. Needless to say, I’m lovin’ it. 😉

While scanning through the recent strips, I ran across this one, which gave me a chuckle, and also got me thinking.

I’ve heard a lot of opinions on comments in programs over the last thirty years of developing software. They seem to be divided into three camps. The first consists primarily of relatively new self-taught programmers, who think that comments (like documentation) are boring, unnecessary, and completely optional. After all, they understand their code, so anyone wanting to use it should be able to figure it out at a glance too! This type generally learns the folly of his ways the third or fourth time he has to go back and modify code that he wrote more than six months ago, and realizes to his dismay that without the reasons why he did things that way fresh in his mind, he can’t understand it either. Unfortunately, the vast majority of open-source developers are in this group.

The second is solely-college-educated developers, the ones who’d never even heard of a compiler until it was taught to them in a class. That type goes to the other extreme: every two-line function has to have its own twenty- or thirty-line comment header, describing when it was written, why it was needed, who wrote it, who approved it, who code-reviewed it, what each and every parameter is and is used for, what return values you can expect from it, and anything and everything else. This type either learns that all those comments are superfluous (and eventually graduates into the third camp), or has no real skill or interest in coding and just wants to get promoted to management as quickly as possible, where he forces everyone under him to adhere to that standard (because he has to read their work, and reading comments is soooo much easier than actually reading code — which is, I suspect, also the reason that college professors require that).

The third camp is where developers end up after lots of experience, once they realize that the code should be self-documenting. They pick their variable and function names in such a way that they don’t need documentation; anyone reading them can easily see what they’re used for. For these people, well-chosen names are the what, well-designed code is the how, and comments are reserved for the why: the reasons for things that can’t be determined from the code. As a very simple example, here’s an actual function copied directly from one of my programs:

bool isValidGregorianDate(int year, int month, int day) {
    if (year == 0) return false; // 1 BCE is followed by 1 CE.
    if (month < 1 || month > 12) return false;
    if (day < 1 || day > highestValidDay(year, month)) return false;

    // Due to Pope Gregory's reform in 1582, ten days are missing from the
    // Gregorian calendar, as compared to the Julian calendar in use before
    // that. There's no way to properly calculate which days unless you also
    // know the location, because different places adopted the change in
    // different years. All you can say for certain is that somewhere
    // between ten and thirteen days get skipped somewhere between 1582 and
    // 1923. This function assumes that those changes happened in 1582,
    // so that Thursday, 4 October 1582 on the Julian calendar was followed
    // by Friday, 15 October 1582 on the Gregorian.
    if (year == 1582 && month == 10 && day > 4 && day < 15) return false;
    return true;
}

First, notice the function name, isValidGregorianDate. If you were reading through the code and saw a function with that name, it would be pretty bloody obvious what the purpose of it is, and any comments to that effect would represent valuable time wasted. Ditto for the parameters year, month, and day — between the names themselves and the context of the function name, their purpose is self-evident. If you want to know the valid ranges, they’re pretty obvious from the code itself, just in case you weren’t familiar with the Gregorian calendar.

Next, notice the first line within the function. If it were uncommented you might wonder why year zero isn’t accepted by the function, but a tiny little comment directly after it explains it — there is no year zero in the Gregorian calendar, and here’s why. Also notice the second and third lines, which have no comments because they’re self-explanatory, thanks in great part to a call to another self-descriptive function, highestValidDay (which calls another self-descriptive function, isLeapYear). An inexperienced developer would see that these functions would never be called from anywhere else and put the code for them right into isValidGregorianDate, with or without comments to explain what it’s doing and why, but that would be a distraction from the purpose of isValidGregorianDate. By separating them and giving them very descriptive names, none of that is necessary.
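For readers curious what those two helpers might look like, here is a minimal sketch. Only the names highestValidDay and isLeapYear come from the function above; the bodies are a hypothetical reconstruction, not the original code. It assumes the standard Gregorian leap-year rule applies to every year passed in, and that the month has already been range-checked by the caller, as isValidGregorianDate does.

```cpp
// Hypothetical reconstruction of the helpers named in the post.
// Assumption: the Gregorian leap-year rule is applied to every year,
// including years before the 1582 reform.
bool isLeapYear(int year) {
    // Every fourth year is a leap year, except century years,
    // unless the century year is divisible by 400.
    return (year % 4 == 0 && year % 100 != 0) || (year % 400 == 0);
}

// Assumption: month is already known to be in the range 1..12.
int highestValidDay(int year, int month) {
    static const int daysInMonth[] = {31, 28, 31, 30, 31, 30,
                                      31, 31, 30, 31, 30, 31};
    if (month == 2 && isLeapYear(year)) return 29;
    return daysInMonth[month - 1];
}
```

Note that the only comments in the sketch are about the why (the leap-year exceptions and the assumptions made), which is the same division of labor the main function uses.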

Finally, there’s the huge nine-line comment, which describes the reason for the one line of code immediately following it. Lots of care was lavished on the description for that one line, because a reader who isn’t intimately familiar with the history of the Gregorian calendar wouldn’t know that stuff — and that reader just might be the developer who wrote the code, some time later when he’s forgotten all of the details.

Out of curiosity, I scanned a dozen source-code files from my current project, counting the number of lines and the number of true comments (skipping commented-out code lines and TODO comments which are meant to catch my attention when I go searching for things that still need work). Out of 3,645 lines in those twelve files, there are 87 comments, mostly little one-liners tacked onto the end of the line they’re commenting (like “so it will call Update” or “otherwise it’s constant”), but occasionally detailed four- or six- or ten-line comments like the one shown in the example above. I don’t know how that measures up to anyone else’s code, but it’s fairly average for my own.

If you’re a developer, you’re probably wondering just how I could possibly know what lines will need comments when someone else looks at the function. The answer is easy enough: as I’m developing the code, I also read it as if I’m seeing it for the first time, and if any questions occur to that side of my brain, I add a comment to answer them. It’s something you pick up with enough experience, and I don’t know of any shortcut to it except knowing what you’re aiming for.

The basic thing to remember is that you’re not writing code for the compiler; you’re writing it for the human readers who come after you. The compiler can understand literally millions of variations on a single function. Practically all of those would be gibberish to a human trying to debug or modify the function. If you need any more encouragement, keep in mind that the someone who comes later might be you.
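To make that concrete, here is a sketch of one such variation: the date check from earlier with the names and comments squeezed out of it. The identifiers f and h are made up for illustration, and h is a hypothetical compact stand-in for highestValidDay that assumes the standard Gregorian leap-year rule.

```cpp
// Hypothetical stand-in for highestValidDay; assumes m is in 1..12.
int h(int y, int m) {
    static const int d[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool l = (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
    return (m == 2 && l) ? 29 : d[m - 1];
}

// Returns the same results as isValidGregorianDate under the assumption
// above, but nothing in it tells a reader what it is for or why 1582 matters.
bool f(int a, int b, int c) {
    return a && b > 0 && b < 13 && c > 0 && c <= h(a, b)
        && !(a == 1582 && b == 10 && c > 4 && c < 15);
}
```

The compiler is equally happy with either version. Only one of them answers the questions a human will ask six months from now.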

8 Comments

  1. I notice from your code that the Gregorian calendar has no number zero, does that mean the xtian G-d wrote the universe in Paschal? Maybe that’s true, I wager. 😉

    The Jewish calendar avoids the problem of the “missing year zero” in a very simple way: it is dated from when Adam the First was created, according to the calculations of the ancient, detailed, and systematic (and amazingly consistent, as Rashi shows when he uses the same system a few places in his commentary) chronology Seder Olam (http://en.wikipedia.org/wiki/Seder_Olam_Rabbah) and other sources in Jewish tradition, and there’s no time measured in the calendar before that. There’s still no year zero, but there’s no BCE either, which still leads one to guess that G-d created the universe in a Wirth-y language. 😉

    • The Gregorian calendar has no year zero because the guys who came up with it were based in Rome, and everyone knows that there’s no zero in Roman numerals. 😉

      • The Julian was made in Rome, though it originally had year 1 as Caesar’s appointment as emperor or something rather than the birth of the xtian deity. So was Gregorian, as the Pope was in Rome unless you count the ones in France during the schism (was that going on then? At any rate, Pope Gregory was in Rome), but I think by 1582 Europe knew of Hindu-Arabic numbers via the Crusades and the recently destroyed Muslim Spain, and used them for mathematics and science which was just beginning to flourish again at that time in Europe after the dark ages.

        However, like many software systems, the lack of a 0 in the Julian calendar persisted in version 2.0 of that calendar, due to cruft. 😉

        There’s no 0 in the Hebrew system of numbers either really, though this is not a problem for the Jewish calendar as the calendar begins with year one without any of this before and after business, though some events such as the creation of the universe, etc happened before the years are numbered.

        To add to the confusion, the years of the calendar begin with Rosh Hashanah, in Tishrei (a lunar month falling in September or October), while the months are reckoned beginning with Nisan, which must always fall in the spring. (And, to get back to the topic of calendars, it is reset to springtime periodically via a leap month.) This sounds pretty bad, since we’re used to a mere leap day in the Julian and Gregorian calendars, but there’s actually less drift in this system than in the Julian calendar. It hasn’t appreciably drifted season-wise during its entire history, because 19 solar years divide quite nicely into lunar months in a nearly exact way, which determines when a leap month is added in the calendar calculations set forth by Rabbi Hillel the 2nd during the late Roman empire era in Israel.

        (Prior to that, the leap month additions were determined by direct observation of the moon and stars, which was surprisingly rigorous since at least the Babylonians, near the beginning of the history of the Jewish people.)

        As long as you didn’t need a telescope to see it, pretty much everything in the sky needed to do an accurate luni-solar calendar was well known to scholars. The main reason for moving to a pre-calculated luni-solar calendar was that messengers for the Sanhedrin’s decision on the calendar year (it had to be centralized like that, as the consequences for celebrating a holiday at the wrong date would be religiously problematic, i.e. violation of resting on the festival) were being interfered with or even killed by various heretics (especially the Sadducees) who had an interest in holidays coming out on days that accorded with their interpretations rather than in their actual dates in the calendar.

        (The Sadducees held that the counting of the omer begins on Sunday rather than on the day after Passover. The Torah refers to the latter as the “morrow of the day of rest”, and the Sadducees, being literalists unlike the Rabbinic Jews, decided that meant the day after Shabbos, i.e. Sunday. Because they were nice and pig-headed, they wanted everyone else to follow their reckoning whether they liked it or not, and the best way to achieve that was to mess with the calendar by interfering with messengers of the court, rather than merely counting the omer differently than everyone else, which would have been kind of nice if interfaith dialogue had been invented 1,800 years ago or so.) Because of those disruptions, it became impractical or even dangerous (highways didn’t have state troopers at the time 😉 ) to maintain the traditional system.

        I hope I’m not boring you; one of the few benefits of having a 3,000-year history of a religion with a lot of persecution and other exciting stuff happening to it is that there’s a lot of history to talk about. 😉

        • Yes, I’m afraid I have little interest in Jewish history, except where it directly affects the living — and before you argue that all of Jewish history affects the living, I say very firmly “not in my book.” 🙂

          • Well, this in particular affects me because we use the Jewish calendar to determine holidays, and it’s kind of nice to know the “why” behind what you do. Plus, the Jewish culture does continue to contribute to the world, and is half of the mix of Greek and Jewish cultures that produced western civilization.

            Not our luni-solar calendar in particular, though I find some of the astronomy that’s used for determining the calendar interesting as a science geek. (It’s a lot more complex than just the leap month: there are various “tweaks” of longer and shorter months that make it even more accurate over time, and regulations, which have to be accounted for, so that holidays don’t fall on inconvenient days of the week.)
