Multithreaded Programs on Multiprocessor Systems

My company is still on contract to help out with a program we developed, which another company bought a few years ago. I’ve spent part of the last two weeks working with that company’s lead programmer, trying to track down a minor but particularly pernicious intermittent bug involving TCP/IP, multithreading, and what seemed to be a timing-related deadlock. What’s worse, because of its nature this program can’t be run under a standard debugger, so we’re reduced to printf statements and imagination. And the problem disappears whenever one particular debugging printf statement is enabled.

We still hadn’t found it when he left on Friday, so I tackled it again Sunday morning and finally discovered the source of the problem: a deadlock between two threads, each of which had locked one part of a resource and was waiting to lock the part the other thread held. It didn’t happen with the debugging statement enabled because that statement delayed its thread just enough that the two threads never ended up holding their halves at the same moment.
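The program’s actual code isn’t shown here, so what follows is only a minimal Python sketch of the general pattern, assuming a resource split into two parts that are each guarded by their own lock. The names `part_a`, `part_b`, `unordered_demo`, `ordered_demo`, and `lock_both` are hypothetical. The first function uses barriers to force the unlucky interleaving every time (the thing a well-placed printf can accidentally hide), with a timeout so it reports the deadlock instead of hanging. The second shows one standard remedy, acquiring the locks in a single global order, which is not necessarily how the real program was fixed.

```python
import threading

# Hypothetical stand-ins for the two parts of the shared resource,
# each protected by its own lock.
part_a = threading.Lock()
part_b = threading.Lock()


def unordered_demo(timeout=0.2):
    """Each thread locks a different part first, then wants the other part.

    Barriers force both threads to hold their first lock at the same time;
    the timeout on the second acquire exists only so the demo can't hang.
    Returns, per thread, whether it managed to obtain both locks.
    """
    results = [None, None]
    both_hold_first = threading.Barrier(2)
    both_tried_second = threading.Barrier(2)

    def worker(idx, first, second):
        with first:
            both_hold_first.wait()        # now each thread holds one part
            got = second.acquire(timeout=timeout)
            if got:
                second.release()
            results[idx] = got
            both_tried_second.wait()      # keep holding `first` until both have tried

    threads = [
        threading.Thread(target=worker, args=(0, part_a, part_b)),
        threading.Thread(target=worker, args=(1, part_b, part_a)),  # opposite order
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def lock_both(x, y):
    """Return the two locks in one fixed global order (here, by id()),
    so every thread acquires them in the same sequence."""
    return sorted((x, y), key=id)


def ordered_demo(iterations=10_000):
    """Same two threads and opposite 'natural' orders, but lock_both()
    imposes one order, so neither thread can wait on the other in a cycle."""
    counter = [0]

    def worker(first, second):
        for _ in range(iterations):
            lo, hi = lock_both(first, second)
            with lo, hi:                  # acquires lo, then hi
                counter[0] += 1

    threads = [
        threading.Thread(target=worker, args=(part_a, part_b)),
        threading.Thread(target=worker, args=(part_b, part_a)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join(timeout=5)
    finished = not any(t.is_alive() for t in threads)
    return finished, counter[0]


if __name__ == "__main__":
    print(unordered_demo())   # [False, False]: each thread waited on the other
    print(ordered_demo())     # (True, 20000): both threads always complete
```

Ordering by `id()` is just a convenient way to get a consistent order within one process; any fixed ranking of the locks works, as long as every code path that needs both follows it.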

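This kind of problem can’t occur in a single-threaded program, and it rarely showed up even in multithreaded ones before we had multiprocessor systems. On a single processor, threads only take turns, so the narrow window between one thread’s first lock and its second was seldom hit; with several processors, both threads really do run at the same instant.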

Software development tools, and software developers, have gotten a lot more sophisticated in the time that I’ve been writing code. But in some areas, the hardware technology is still way ahead of our ability to use it well.