As mentioned previously, my company sold the rights to one of our older Windows software products to a much larger company a few years ago. That company keeps my company on retainer to advise them on the product’s continued development.
The product (let’s call it Project Badger) was originally written more than ten years ago now, with me as the sole original developer. At the time, I treated C++ as simply an improved version of C. Microsoft Visual C++ 6.0, my compiler of (reluctant) choice at the time, barely supported the newly standardized C++ Standard Template Library (STL); the Boost library didn’t exist yet; and I hated the non-standard and non-portable Microsoft Foundation Classes with a passion. That didn’t leave much choice: I had to hand-code all the algorithms and data structures myself, and use the painfully primitive C string functions and the raw Win32 API.
(Yes, and walk barefoot in the snow to get to school every day. Uphill. Both ways. 😉 But that’s another story.)
Project Badger has grown a great deal since then, but it was still using the hand-coded data structures and algorithms that I originally came up with, along with the C string functions and the raw Win32 API. Although they still worked, it was a very creaky and increasingly ugly infrastructure… grafting Unicode support onto it (as our customers demanded) was a nightmare that required a lot of hand-crafted code, and even then it only worked in certain places. And we couldn’t move to Unicode completely, because we still had to support Win9x (Windows 95, 98, and Millennium), for reasons I won’t go into, and those systems don’t implement the Unicode versions of most API functions. (Using the UNICOWS DLL, Microsoft’s way of retrofitting Unicode support onto Win9x systems, was deemed unacceptable in this case, again for reasons I won’t go into.)
This all came to a head a few weeks ago, when a customer reported a problem that should have been easy to fix, but that our bastardized infrastructure made nearly impossible. For a few of its features, Project Badger has to open a copy of its own EXE file and read some data from a table tacked onto the end; this customer reported that those features wouldn’t work if the EXE file were placed in a path with Unicode characters. The reason was easy to track down: we were using the GetModuleFileNameA function to find the filename of the EXE, and CreateFileA to open it. (The ‘A’ on the end means that they’re the ANSI versions of those functions, which work in the system’s code page and which I loosely call ASCII in this post, rather than the Unicode versions, which would be denoted by a ‘W’ there.) But fixing it in a way that wouldn’t break Win9x compatibility… that was more interesting.
After one of our developers spent a frustrating couple of days trying, without success, to work around the problem using “short filenames” (a legacy of DOS and the FAT12/FAT16 file systems), I proposed overhauling the entire program, replacing many of its hand-coded parts with components from the STL and Boost libraries. Since I was the only one on the team who had much experience with both, I volunteered to do the initial conversion.
It was a large undertaking, even larger than I’d anticipated. It took me ten straight days, working twelve to sixteen hours a day, to finish it (I’d estimated seven days, for a broader scope of changes). But I think the result was worth the effort.
A key component of the overhaul was a way to support Unicode strings and functions similar to the way the UNICOWS library does: on platforms where it’s available, dynamically load the Unicode version of the API function and pass the Unicode parameters directly to it. On Win9x platforms (where the Unicode functions aren’t available), fall back on the ASCII versions of the functions, translate all strings to ASCII before passing them in, and translate any results to Unicode when passing them back. To support this, I put together a specialized String class using the Boost.Variant library. Here’s a simplified version of its declaration (the full version has conversion support for a few program-specific types as well):
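```cpp
#include <string>
#include <boost/variant.hpp>

namespace os {

class String {
public:
    // Standard-library strings are stored as-is.
    String(const std::string& init): mData(init) { }
    String(const std::wstring& init): mData(init) { }
    // Defined in the source file. A raw pointer may be NULL or a
    // MAKEINTRESOURCE value rather than real text.
    String(const char* init);
    String(const wchar_t* init);

    // Get the contents in the requested form, converting as needed.
    std::string to_string() const;
    std::wstring to_wstring() const;
    const char* to_ptr() const;
    const wchar_t* to_wptr() const;

    bool isNull() const;
    bool isNativelyUnicode() const;
    bool isValidString() const;

    // ASCII <-> Unicode conversion helpers.
    static std::string _toAscii(const std::wstring& str);
    static std::wstring _toUnicode(const std::string& str);

private:
    typedef boost::variant<const void*, std::string, std::wstring> Data;
    Data mData;
};

} // namespace os
```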
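To make the variant idea concrete, here is a minimal, self-contained sketch of how the conversion accessors might work. It is not the real os::String. It uses the hypothetical class name SketchString, swaps in C++17’s std::variant for boost::variant, and replaces real code-page conversion (which on Windows would go through MultiByteToWideChar and WideCharToMultiByte) with a naive placeholder that only handles 7-bit ASCII and turns everything else into '?'.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical sketch, not the real os::String: std::variant stands in for
// boost::variant, and the conversions only handle 7-bit ASCII.
class SketchString {
public:
    // Whatever form the caller supplies is stored unchanged.
    SketchString(const std::string& s) : mData(s) {}
    SketchString(const std::wstring& s) : mData(s) {}

    bool isNativelyUnicode() const {
        return std::holds_alternative<std::wstring>(mData);
    }

    std::wstring to_wstring() const {
        if (const auto* w = std::get_if<std::wstring>(&mData)) return *w;
        if (const auto* s = std::get_if<std::string>(&mData)) return widen(*s);
        // Only reachable for the raw-pointer alternative, which has no text.
        assert(false && "raw pointer has no string form");
        return std::wstring();
    }

    std::string to_string() const {
        if (const auto* s = std::get_if<std::string>(&mData)) return *s;
        if (const auto* w = std::get_if<std::wstring>(&mData)) return narrow(*w);
        assert(false && "raw pointer has no string form");
        return std::string();
    }

private:
    // Placeholder for MultiByteToWideChar: copy ASCII, map the rest to '?'.
    static std::wstring widen(const std::string& s) {
        std::wstring out;
        for (unsigned char c : s)
            out.push_back(c < 0x80 ? static_cast<wchar_t>(c) : L'?');
        return out;
    }

    // Placeholder for WideCharToMultiByte: same ASCII-only rule.
    static std::string narrow(const std::wstring& w) {
        std::string out;
        for (wchar_t c : w) {
            unsigned long code = static_cast<unsigned long>(c);
            out.push_back(code < 0x80 ? static_cast<char>(code) : '?');
        }
        return out;
    }

    // The const void* alternative mirrors the post's Data typedef; this
    // sketch has no constructor that fills it.
    std::variant<const void*, std::string, std::wstring> mData;
};
```

The design point the sketch illustrates is that the caller’s original form is kept untouched, so a conversion only happens when a function actually asks for the other form.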
Note that the boost::variant typedef Data can hold a std::string (a standard ASCII string), a std::wstring (a Unicode string), or a raw ASCII or Unicode string pointer. The pointer option was necessary because some Win32 API functions let you pass in a value that isn’t a valid string pointer at all, encoding some specialized information in it instead (usually an integer resource ID, via the MAKEINTRESOURCE macro).
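For readers unfamiliar with the MAKEINTRESOURCE trick: the Win32 headers turn a 16-bit integer resource ID into a fake string pointer whose value is the ID itself, and the IS_INTRESOURCE macro recognizes such a value by checking that nothing is set above the low 16 bits. The sketch below reproduces that convention in portable C++ so it compiles without <windows.h>. The names makeIntResource, isIntResource, and intResourceId are hypothetical. It shows why such a value has to travel as a raw pointer: it can’t be dereferenced, so it can’t be copied into a std::string.

```cpp
#include <cassert>
#include <cstdint>

// Encode a 16-bit resource ID as a fake string pointer, the way
// MAKEINTRESOURCEA does: the pointer's value is the ID itself.
inline const char* makeIntResource(std::uint16_t id) {
    return reinterpret_cast<const char*>(static_cast<std::uintptr_t>(id));
}

// Same test as IS_INTRESOURCE: nothing set above the low 16 bits.
// (A null pointer also passes, just as it does with IS_INTRESOURCE.)
inline bool isIntResource(const void* p) {
    return (reinterpret_cast<std::uintptr_t>(p) >> 16) == 0;
}

// Recover the ID. Such a "pointer" must never be dereferenced.
inline std::uint16_t intResourceId(const void* p) {
    assert(isIntResource(p) && "not an integer resource ID");
    return static_cast<std::uint16_t>(reinterpret_cast<std::uintptr_t>(p));
}
```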
With that in place, and using a set of simple template classes I wrote to handle the dynamic loading, I could write functions that would take either ASCII or Unicode strings and pointers (or even a combination of different types) and do any necessary conversion on the fly:
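```cpp
#include <windows.h>

namespace os {

HANDLE CreateFile(const String& filename, DWORD acc,
                  DWORD share, LPSECURITY_ATTRIBUTES sec,
                  DWORD create, DWORD flags, HANDLE tpl)
{
    // Resolve CreateFileW once; fn tests false where the Unicode version
    // isn't usable (Win9x). StdFn7 is one of the dynamic-loading helper
    // templates; cKernel32 presumably identifies kernel32.dll.
    static StdFn7<HANDLE, LPCWSTR, DWORD, DWORD,
                  LPSECURITY_ATTRIBUTES, DWORD, DWORD, HANDLE>
        fn("CreateFileW", cKernel32);

    if (fn && filename.isNativelyUnicode()) {
        // Unicode API available and the name is already Unicode.
        return fn(filename.to_wptr(), acc, share, sec, create,
                  flags, tpl);
    } else {
        // Otherwise use the ANSI API; to_ptr() converts if needed.
        return ::CreateFileA(filename.to_ptr(), acc, share, sec,
                             create, flags, tpl);
    }
}

} // namespace os
```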
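The StdFn7 template itself isn’t shown in the post. What follows is a guess at the general shape of such a helper, not the author’s code: a hypothetical LazyFn template that resolves a function by name, tests false if the function isn’t available, and otherwise calls through. To keep it portable and testable, the lookup is injected as a resolver function; on Windows that role would be played by something like GetProcAddress on kernel32.dll, which returns a FARPROC or NULL. The fake openFile dispatcher and the two resolvers are likewise hypothetical and mirror the shape of os::CreateFile. (Win9x’s kernel32 reportedly exports many W functions as stubs that simply fail, so a real helper may need an OS-version check as well as the lookup.)

```cpp
#include <cassert>
#include <cstring>

// Generic function-pointer type, playing the role of Win32's FARPROC.
using RawProc = void (*)();
using Resolver = RawProc (*)(const char* name);

template <typename R, typename... Args>
class LazyFn {
public:
    using Fn = R (*)(Args...);

    LazyFn(const char* name, Resolver resolve) : mFn(nullptr) {
        if (RawProc p = resolve(name)) {
            // Converting between function-pointer types is allowed as long
            // as the pointer is converted back to its real type before use.
            mFn = reinterpret_cast<Fn>(p);
        }
    }

    // True only if the function was found.
    explicit operator bool() const { return mFn != nullptr; }

    R operator()(Args... args) const {
        assert(mFn && "function was not resolved");
        return mFn(args...);
    }

private:
    Fn mFn;
};

// Fake stand-ins for CreateFileW / CreateFileA.
static int fakeOpenW(const wchar_t*) { return 2; }
static int fakeOpenA(const char*) { return 1; }

// Pretend NT system: the wide version exists.
static RawProc ntResolver(const char* name) {
    return std::strcmp(name, "OpenW") == 0
               ? reinterpret_cast<RawProc>(&fakeOpenW)
               : nullptr;
}

// Pretend Win9x system: no wide version.
static RawProc win9xResolver(const char*) { return nullptr; }

// Same dispatch as os::CreateFile: prefer W when available, else use A.
int openFile(Resolver resolve, const wchar_t* wideName, const char* narrowName) {
    LazyFn<int, const wchar_t*> fn("OpenW", resolve);
    if (fn) {
        return fn(wideName);
    }
    return fakeOpenA(narrowName);
}
```

The real os::CreateFile keeps the resolved function in a function-local static, so the lookup happens only once; the sketch resolves on every call only because it takes the resolver as a parameter.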
It’s a good solution, though not a perfect one. For one thing, it’s not as fast as the raw calls. That doesn’t matter in our case, because most such calls are in user-interface code (where even the slowest machine is fast enough that far less efficient code wouldn’t be noticeable), and the rest are in one-time operations where speed isn’t critical.
Another limitation is that you can’t pass in a zero or NULL for one of the os::String parameters, even where the underlying API allows it (as with the “title” parameter of the MessageBox function, which falls back to the localized “Error” string if passed NULL). I got around that by defining

```cpp
const os::String NULLSTRING(static_cast<const char*>(NULL));
```

in the source file, and then putting

```cpp
extern const os::String NULLSTRING;
```

in the header. (The source file has to include that header, or the definition must be marked extern too, because a const object at namespace scope otherwise has internal linkage in C++.) Then, whenever I needed to pass in a NULL, I just changed it to NULLSTRING, and everything worked as normal. I could probably have also defined an os::String constructor that accepted an int, but that seemed like overkill, and would have worked against type-safety.
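With the constructors declared for os::String, a bare NULL can’t work: a literal 0 converts equally well to const char* and to const wchar_t*, so the compiler reports the conversion as ambiguous. A named sentinel avoids that by fixing the pointer type up front. The cut-down sketch below illustrates the idea under stated assumptions. NullableString and captionFor are hypothetical names, std::variant replaces boost::variant, and only the narrow-pointer path is modeled.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical, cut-down stand-in for os::String, just enough to show how
// a null sentinel flows through to an API-style function.
class NullableString {
public:
    // Raw pointers, including NULL, are stored untouched.
    NullableString(const char* p) : mData(static_cast<const void*>(p)) {}
    NullableString(const std::string& s) : mData(s) {}

    bool isNull() const {
        const auto* p = std::get_if<const void*>(&mData);
        return p != nullptr && *p == nullptr;
    }

    // Returns the stored pointer as-is, so a null sentinel yields a real NULL.
    const char* to_ptr() const {
        if (const auto* s = std::get_if<std::string>(&mData)) return s->c_str();
        return static_cast<const char*>(std::get<const void*>(mData));
    }

private:
    std::variant<const void*, std::string> mData;
};

// The sentinel. In a multi-file program the header would declare it extern.
const NullableString NULLSTRING(static_cast<const char*>(nullptr));

// Stand-in for an API such as MessageBox whose caption may be NULL,
// meaning "use the default title".
std::string captionFor(const NullableString& title) {
    const char* p = title.to_ptr();
    return p ? std::string(p) : std::string("Error");
}
```

Because the sentinel carries a real null pointer rather than an empty string, the callee can still tell “no title” apart from “empty title”. On a C++11 or later compiler, a constructor taking std::nullptr_t would be another type-safe option, since it accepts nullptr but not an arbitrary int.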
The end result: minimal changes to the existing code, while adding maximum flexibility. I’m very happy with it. 🙂