There was no entry yesterday because I was deep in a programming project. Very deep… I was up and at it at 8am, and other than a few very short breaks, kept going until 3am this morning. And that’s on top of about six hours on it the day before.
Around 1995, I started writing what’s known as a “multiple-precision integer arithmetic” package. In plain English, that’s a library of functions that can handle very, very large numbers — on the order of thousands of decimal digits, far more than the number types built into general-purpose computers can handle. There are at least half a dozen others available, but I wanted to write my own (not only to sidestep licensing issues, but also because it’s fun 🙂 ). The first versions were crude, and the interface is still fairly ugly, but I’ve been improving it sporadically ever since.
One of the things I’ve used it for is to build a public-key encryption system for one of my programs. It worked, though not quickly. But now I needed the ability to generate large prime numbers as quickly as possible, and the current version just wasn’t cutting it.
There were a few general speed improvements I could use that hadn’t been available when I wrote the earlier versions. Some of the algorithms I used in it, such as the one for division, were quick hacks that I threw together and just never replaced. Others I just hadn’t had the time to implement yet. And there were some speed-improving ones that I’d heard about but had never mastered.
One of those last is the Montgomery reduction, an algorithm that greatly speeds up certain math operations by eliminating the division steps. It’s an extremely elegant piece of work, but I’d never been able to figure it out because most of the descriptions on the ‘net seem to lack a few important details. For example, every one of them describes the actual “reduction” function in great detail, so I wrote it, thinking it would be an important part of the algorithm — turns out that that function is never needed. And testing the functions required doing a preparatory conversion step before it and a “deconversion” step afterwards to get the real result — but the details of those steps are almost impossible to find, I figured them out by trial and error once I understood what the algorithm was doing. (For the curious: to convert to Montgomery form, multiply the number by your R value, then take the modulus; to convert back, multiply the number by the modular inverse of R and take the modulus.) I learned a lot, and it was fun, but it was also a vigorous workout for my patience.
When I wrote the original version, the largest single-precision integer (the kind built into the computer, and the building block of every multi-precision integer package) was only 32 bits wide. Modern computers can handle much larger integers, up to 128 bits wide in some cases. Using larger integers means needing fewer steps for the same result, improving the speed of all multi-precision operations, so I wanted to take full advantage of them. I’d written the original code so that it could, in theory, handle any such type changes by simply altering a single line. But when I did that, just about everything broke. It took several hours, and more patience, to carefully step through the operations, figure out which ones weren’t working and why, and fix them all.
The last step was increasing the number of trial divisions. As a preliminary step to identifying prime numbers, I’d been trial-dividing candidate numbers by all the prime numbers below 100, which I’d figured out and hard-coded into the program. But according to Bruce Schneier’s Applied Cryptography, it was most efficient to trial-divide by all the prime numbers below 2,000, so I implemented the sieve of Eratosthenes to identify them, caching them so that the sieve only needed to be used once in any particular run.
The result of all that work: locating a particular prime number took 275 seconds (four minutes and thirty-five seconds) with the old code. The new code required only ten seconds! That’s nearly thirty times faster, and well within the time that I needed.
There’s still plenty of room to improve the library, like putting in the proper division algorithm, but it’ll do for now. 🙂
Since I wrote that entry yesterday, I’ve been redesigning, refining, and rewriting the library code. I’ve replaced about 90% of it now, enough that I feel justified in calling it a completely new program. It’s far more streamlined, even faster, and is well on its way to being beautiful as well.
As part of it, I finally finished a proper division algorithm. It was one hell of a pain in the ass… the Handbook of Applied Cryptography had a detailed description of the algorithm, but after spending many hours on it, I still wasn’t able to get it working at all, or figure out where it was going wrong.
I vaguely recalled seeing it in another book I had, and a search turned up Number Theory: A Programmer’s Guide, which also described it, and which had sample code for a very limited version. But after using it as a template to write my own version, it wouldn’t work either — or rather, it would work in some of my tests, but not all of them. It took many more hours of work to figure out the two different places where his algorithm was lacking, and to work out solutions for them.
The result was worth the effort, though. Division operations are now many times faster, I can’t tell by exactly how much. The prime-number generation test has sped up further too, it’s now down to 7.5 seconds, thirty-seven times faster than it was when I started. 😀
I’ve had about five hours of sleep in the last forty-eight, and I wouldn’t have eaten at all if GoddessJ hadn’t been here to remind me to do so. Oddly enough, I’m not really tired even now… this project fascinates me like few others can, I could happily work on it for weeks if I could spare the time. I’m sure I’ll have to pay the piper at some point, but this fascination, almost euphoric, is well worth it.
Neat! For some reason I recall that the “Sieve” isn’t the fastest way to get primes, but I don’t remember well enough to know for sure. Sounds like a lot of fun, I went through a period where I was teaching myself a lot of advanced math….
That sieve is one of the fastest ways to find small (or rather, “relatively” small 🙂 ) primes. It’s just not efficient enough for larger ones.
Well, I seem to have hit the wall. The test is down to 5.5 seconds (fifty times faster than the original!), and nothing I’ve thought of has reduced it by any measurable amount for several days. Oh well, that’ll have to suffice. 🙂