Lisp Practice Revisited

Last weekend I ported some MD5 calculation code to Lisp, for practice. But once it was done, I wasn’t happy with it, so I’ve been improving it since.

Whenever you go to refactor code, the first change should of course be to add an automated testing function — that way, you can easily confirm that it works after every change, catching potential bugs pretty much as soon as they appear. The test-md5 function, at the bottom of the file, takes care of this. (It’s not exported, but it doesn’t really need to be, you can call (oc-utils:md5 :test) if you need to access it externally.) I probably should have written the test function first, before even starting on the md5 code itself, but I wasn’t aware of that until this week.

Most of the changes in the new code (which you can find here) are the result of studying three Lisp style guides (here, here, and here) I found on the ‘net. (That third one is a PostScript file, which Windows users will probably need to install GhostScript and GSView to see. Linux systems generally have a PostScript viewer already installed. You can also try Google’s “View As Text” option for it instead, there isn’t really any non-textual information there.)

Most of the bit-manipulation macros have been changed to functions instead, and marked inline for efficiency, as all three of the style guides strongly recommended. Unfortunately, this caused a problem itself: Steel Bank Common Lisp assumes that if there are more than 200 inline function expansions in a single function, that you must be trying to inline-expand a recursive function, and stops the inline expansion. Although the code still worked without it, I wasn’t happy with that.

The obvious answer was to reduce the number of functions that were expanded inline. But those functions were all tiny and important, and all of them really needed to be inline for efficiency, so turning them into standard (non-inlined) functions wasn’t an acceptable option. And turning them back into macros, the other way of doing this, would have been a step backward.

The second possible solution was to increase the inline function limit. A simple Google search on the text of the warning message turned up a way to easily do that (as well as another Lisp MD5 implementation that had run into the same problem), but also revealed that it was implementation-specific: CMUCL and SBCL were apparently the only Lisp environments that had this problem, or at least the only ones that let you know about it. Implementation-specific code is, in my humble opinion, ugly. It’s a necessary evil at times, but it’s the kind of thing you want to hide away in files that provide a general interface that works across all implementations, when you can’t avoid it. There was no good way to do that here. As I didn’t want to write ugly code, that was an unpalatable option as well.

That left me stumped, so I played with the code for a bit, and noticed that changing things changed the location that the warning came up at. It originally came up in the macro for the fourth-round code; changing that macro into a function caused it to come up in the first-round macro code instead. A little thought led me to the idea that SBCL must be counting the number of inline expansions on a per-function basis, and changing the rounds-macros into true (non-inline) functions eliminated the warnings completely, proving that this was the case.

It isn’t a perfect solution: it’s slightly less efficient, because as functions, I had to pass the numbers back and forth (whereas when they were macros, each one had full access to the local variables and could change them as needed). But the efficiency difference is slight enough that it’s overwhelmed by my desire to write beautiful code, and maximum speed isn’t necessary here (if it was, I’d have written it in C and used FFI to access it from Lisp), so that’s the path I followed.

That was the only real difficulty. There were a number of other internal changes, mostly style-related, such as giving things more accurate names, adding descriptions, getting rid of the handful of evals that I’d put into one of the macros (it’s bad style, and as it turns out they weren’t needed anyway), and changing my bit-manipulations to be closer to the accepted Lisp way of doing things. A few things (such as the formatting function) were abstracted into more general functions, so that if I ever want to port another hashing algorithm to Lisp, I can re-use them. There were also a few functional changes: it can now accept either a string or an array of unsigned-byte 8s (the earlier code could only accept a string), and all of the functionality is now accessible via the single md5 function with keyword parameters instead of having a separate set of functions for calculating partial hashes.

I also simplified it by getting rid of a few things. The general-purpose bit-manipulation macros are gone, replaced by the aforementioned 32-bit-specific inline functions — if I need 16- or 64-bit functions, I’ll write new ones that are specific to those sizes. And after much internal debate, I got rid of the as-binary option as well… I couldn’t think of a compelling reason to have it there, and it would complicate matters if/when I decide to add other hashing algorithms.

All in all, I think the new code is a lot better. I may continue to play with it though. 🙂

I finish this entry with three Lisp-related quotes. I particularly agree with the second one; the third is simply very geek-amusing:

God wrote in Lisp code When he filled the leaves with green. The fractal flowers and recursive roots: The most lovely hack I’ve seen. — Bob Kanefsky, “Eternal Flame”
No [programming] language feels more natural than Lisp. There’s a real sense that while Python was invented by a brilliant programmer, Lisp is built into of the structure of the Universe. — Unknown
These are your father’s parentheses. Elegant weapons, for a more… civilized age. XKCD