There was no entry yesterday because I was deep in a programming project. Very deep… I was up and at it at 8am, and other than a few very short breaks, kept going until 3am this morning. And that’s on top of about six hours on it the day before.
Around 1995, I started writing what’s known as a “multiple-precision integer arithmetic” package. In plain English, that’s a library of functions that can handle very, very large numbers — on the order of thousands of decimal digits, far more than the number types built into general-purpose computers can handle. There are at least half a dozen others available, but I wanted to write my own (not only to sidestep licensing issues, but also because it’s fun 🙂 ). The first versions were crude, and the interface is still fairly ugly, but I’ve been improving it sporadically ever since.
One of the things I’ve used it for is to build a public-key encryption system for one of my programs. It worked, though not quickly. But now I needed the ability to generate large prime numbers as quickly as possible, and the current version just wasn’t cutting it.
There were a few general speed improvements I could use that hadn’t been available when I wrote the earlier versions. Some of the algorithms I used in it, such as the one for division, were quick hacks that I threw together and just never replaced. Others I just hadn’t had the time to implement yet. And there were some speed-improving ones that I’d heard about but had never mastered.
One of those last was Montgomery reduction, an algorithm that greatly speeds up certain math operations by eliminating the division steps. It’s an extremely elegant piece of work, but I’d never been able to figure it out, because most of the descriptions on the ‘net seem to lack a few important details. For example, every one of them describes the actual “reduction” function in great detail, so I wrote it, thinking it would be an important part of the algorithm; it turns out that function is never needed on its own. And testing the functions required a preparatory conversion step beforehand and a “deconversion” step afterwards to get the real result, but the details of those steps are almost impossible to find; I had to figure them out by trial and error once I understood what the algorithm was doing. (For the curious: to convert a number to Montgomery form, multiply it by your R value, then take the modulus; to convert back, multiply by the modular inverse of R and take the modulus.) I learned a lot, and it was fun, but it was also a vigorous workout for my patience.
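For anyone who wants the missing details spelled out, here’s a minimal sketch of those conversion steps in Python, leaning on its built-in big integers. (The function and parameter names are my own, not the library’s.) The `mont_mul` function shows the payoff: reducing after a multiply takes only shifts and masks, never a division by the modulus.

```python
# Sketch of Montgomery-form arithmetic. n must be odd, and R = 2**k
# must be a power of two larger than n, so "mod R" is just a mask.

def montgomery_setup(n, k):
    """Precompute R and n' = -n^-1 mod R for an odd modulus n."""
    R = 1 << k
    assert n % 2 == 1 and R > n
    n_prime = pow(-n, -1, R)
    return R, n_prime

def to_mont(a, n, R):
    """Convert in: multiply by R, then take the modulus."""
    return (a * R) % n

def from_mont(a_bar, n, R):
    """Convert back: multiply by R^-1 mod n, then take the modulus."""
    return (a_bar * pow(R, -1, n)) % n

def mont_mul(a_bar, b_bar, n, k, R, n_prime):
    """Multiply two Montgomery-form numbers; no division by n needed."""
    T = a_bar * b_bar
    m = (T * n_prime) & (R - 1)   # T * n' mod R, via masking
    t = (T + m * n) >> k          # exact division by R, via shifting
    return t - n if t >= n else t
```

The conversions cost a real modular multiply each, which is why the technique only pays off when you do many multiplications between converting in and converting out, as in modular exponentiation.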
When I wrote the original version, the largest single-precision integer (the kind built into the computer, and the building block of every multi-precision integer package) was only 32 bits wide. Modern computers can handle much larger integers, up to 128 bits wide in some cases. Using larger integers means needing fewer steps for the same result, improving the speed of all multi-precision operations, so I wanted to take full advantage of them. I’d written the original code so that it could, in theory, handle any such type changes by simply altering a single line. But when I did that, just about everything broke. It took several hours, and more patience, to carefully step through the operations, figure out which ones weren’t working and why, and fix them all.
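To illustrate what that single line controls: a multi-precision package stores each big number as an array of fixed-width “limbs,” and every operation loops over those limbs. Here’s a toy Python sketch (names and structure are mine, not the library’s) where changing one constant changes the base, and with it the number of steps every operation takes.

```python
# Toy limb-based big-number representation. Doubling WORD_BITS halves
# the limb count, and so roughly halves the loop iterations per add.
WORD_BITS = 32        # the "single line": try 64 or 128 instead
BASE = 1 << WORD_BITS

def to_limbs(n):
    """Split a nonnegative integer into little-endian WORD_BITS limbs."""
    limbs = []
    while n:
        limbs.append(n & (BASE - 1))
        n >>= WORD_BITS
    return limbs or [0]

def from_limbs(limbs):
    """Reassemble the integer from its limbs."""
    return sum(limb << (i * WORD_BITS) for i, limb in enumerate(limbs))

def add_limbs(a, b):
    """Schoolbook addition: one pass over the limbs, propagating carry."""
    out, carry = [], 0
    for i in range(max(len(a), len(b))):
        s = (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) + carry
        out.append(s & (BASE - 1))
        carry = s >> WORD_BITS
    if carry:
        out.append(carry)
    return out
```

The breakage I hit came from exactly the places where real C code, unlike this sketch, has to worry about the double-width intermediate products and carries overflowing the machine word.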
The last step was increasing the number of trial divisions. As a preliminary step to identifying prime numbers, I’d been trial-dividing candidate numbers by all the prime numbers below 100, which I’d figured out and hard-coded into the program. But according to Bruce Schneier’s Applied Cryptography, it was most efficient to trial-divide by all the prime numbers below 2,000, so I implemented the sieve of Eratosthenes to identify them, caching them so that the sieve only needed to be used once in any particular run.
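That pre-filter is simple enough to sketch in a few lines of Python (again with names of my own choosing): sieve the primes below the bound once, cache the result for the rest of the run, and reject any candidate that one of them divides.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def small_primes(limit=2000):
    """Sieve of Eratosthenes; cached so it runs at most once per run."""
    sieve = bytearray([1]) * limit
    sieve[0:2] = b"\x00\x00"
    for p in range(2, int(limit ** 0.5) + 1):
        if sieve[p]:
            # Cross off every multiple of p starting at p*p.
            sieve[p * p::p] = bytearray(len(sieve[p * p::p]))
    return [p for p in range(limit) if sieve[p]]

def passes_trial_division(candidate, primes):
    """Cheap pre-filter: survivors go on to an expensive primality test."""
    return all(candidate % p != 0 or candidate == p for p in primes)
```

The point of the larger bound is balance: each trial division is far cheaper than a full primality test, so weeding out more composites up front saves time overall, until the cost of the extra divisions outweighs the tests they avoid.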
The result of all that work: locating a particular prime number took 275 seconds (four minutes and thirty-five seconds) with the old code. The new code required only ten seconds! That’s nearly thirty times faster, and well within the time that I needed.
There’s still plenty of room to improve the library, like putting in the proper division algorithm, but it’ll do for now. 🙂