Secure Remote Backups, Part II

As mentioned a couple weeks ago, I recently picked up a new piece of hardware, a 1TB MyBook World Edition II. It’s a network-attached external hard drive that turned out to be a full-fledged headless Linux box. I’ve been wanting to change my backup setup to go to it instead of the Cygwin ssh server on my office desktop machine. That wouldn’t be too difficult since I’ve already got it running an ssh server, except for one thing: I need to ensure that the backups are properly encrypted, so that if someone breaks into the office and makes off with the hardware, they won’t also have access to all my personal and business files.

The backup software I’ve used under Linux doesn’t have a built-in way to encrypt backups. That’s part and parcel of the Linux philosophy: every program does one thing and does it well, and relies on other programs to do other things. I’ve gotten around it on my office desktop machine by creating a TrueCrypt-encrypted backup drive, but while the MyBook may also be running Linux, it’s not on x86-compatible hardware, so getting TrueCrypt to compile and run on it is a headache I didn’t want to deal with. Also, any time the office machine is rebooted, someone has to be at the console to enter the TrueCrypt password for the backup drive… not the best solution, but the only secure one I could think of when I set it up. That isn’t an option for the MyBook, because there’s no keyboard or monitor connected to it (and none can be).

The solution that I finally came up with was to create an encrypted loop-mounted device in a file on a MyBook Samba share, storing the passwords to it in a shell script on the laptop that mounts the drive just before each backup operation, and unmounts it afterwards. The laptop’s files have more defenses on them than Fort Knox, so it’s safe enough to store the passwords there. And of course, I’ve got the passwords well-memorized also… that’ll be rather important when I need to restore the files after a crash on the laptop. 🙂

The shell script took a while to create. This was mostly because I was only somewhat familiar with the bash shell, but it didn’t help that one of the web pages I’d found as a guide, while otherwise excellent, did the encryption with the -e option of the losetup program. That option is apparently deprecated and no longer available in Ubuntu Linux. I substituted the cryptsetup program for it, relying on my earlier experiences with the device mapper and dm-crypt, and after a few hours of research on the ‘net and in the help files, I managed it.
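To make the order of operations concrete, here is a minimal sketch of the sequence such a script has to go through, written in Python rather than bash. It is not the actual script: the image path, loop device, mapping name and mount point are all hypothetical, it assumes the container file was formatted as a LUKS volume (the post only says that cryptsetup was used), and it leaves out both mounting the network share and supplying the passphrase.

```python
import subprocess


def open_commands(image, loop_dev, name, mountpoint):
    """Commands that attach, unlock and mount the encrypted container file."""
    return [
        ["losetup", loop_dev, image],                   # attach the file to a loop device
        ["cryptsetup", "luksOpen", loop_dev, name],     # unlock it (assumes a LUKS volume)
        ["mount", "/dev/mapper/" + name, mountpoint],   # mount the decrypted mapping
    ]


def close_commands(loop_dev, name, mountpoint):
    """The same steps undone, in reverse order."""
    return [
        ["umount", mountpoint],
        ["cryptsetup", "luksClose", name],
        ["losetup", "-d", loop_dev],
    ]


def run_command(cmd):
    """Run one command and raise if it fails (needs root for these commands)."""
    subprocess.run(cmd, check=True)


def with_encrypted_volume(image, loop_dev, name, mountpoint, backup, run=run_command):
    """Open the volume, call backup(), and always close the volume afterwards."""
    for cmd in open_commands(image, loop_dev, name, mountpoint):
        run(cmd)
    try:
        backup()
    finally:
        # Runs even if the backup raises, so the volume is not left mounted.
        for cmd in close_commands(loop_dev, name, mountpoint):
            run(cmd)
```

Passing a different `run` function (one that only prints each command, say) gives a dry run without touching any devices. The sketch also assumes the chosen loop device is free.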

For the actual backup, I decided to use rsync instead of the Simple Backup program I’ve been using up until now. The bad part of this is that the backed-up data isn’t compressed that way, but with half a terabyte of storage space (only half because the MyBook device is currently configured as mirrored RAID drives, for maximum redundancy), that’s a very minor consideration. The good parts are the obvious advantages of having the files easily browsable (and restorable) via a simple copy, and the ability to use hard-links to keep a full backup every time, at almost no cost in time or storage space. 🙂 The page I used as my primary guide for that part is here.
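The hard-link trick relies on rsync’s --link-dest option: any file that hasn’t changed since the previous backup is hard-linked into the new backup directory instead of being copied again. As a rough sketch (again in Python, with hypothetical paths and a directory-naming format that is only my assumption), the command for one run could be built like this:

```python
from datetime import datetime
from pathlib import PurePosixPath


def build_rsync_command(source, backup_root, started, previous=None):
    """Build the rsync command for one backup run.

    source      -- directory to back up
    backup_root -- absolute path of the directory holding all backups
    started     -- datetime the backup started; used to name the new directory
    previous    -- name of the previous backup directory, if there is one
    """
    dest = PurePosixPath(backup_root) / started.strftime("%Y-%m-%d_%H%M%S")
    cmd = ["rsync", "-a"]
    if previous is not None:
        # Unchanged files become hard links to the copies in the previous backup.
        cmd.append("--link-dest=" + str(PurePosixPath(backup_root) / previous))
    # Trailing slash on the source: copy its contents, not the directory itself.
    cmd += [str(source).rstrip("/") + "/", str(dest) + "/"]
    return cmd
```

The first run has no previous backup, so it is an ordinary full copy; every later run passes the name of the most recent directory as `previous`.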

The last problem is that, even using hard-links, a backup drive will eventually fill up if you keep every backup you’ve ever made. You have to delete old backups at some point. Although I’m a big fan of the Tower of Hanoi backup schedule for removable media, in this kind of backup system, it makes more sense (and is a lot simpler) to simply name the directories by the date and time the backup was started, then keep only a certain subset of them. The “logarithmic” option in Simple Backup sounds pretty good: keep all backups from yesterday, one backup per day from the last week, one backup per week from the last month, one backup per month from the last year, and one backup per year for anything older; everything else is erased. I haven’t implemented this part yet, but it shouldn’t be difficult to do.
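Since that part isn’t written yet, the following is only a sketch of how the selection could work, in Python. The cut-offs (1, 7, 30 and 365 days) and the choice to keep the newest backup in each period are my own assumptions, not necessarily what Simple Backup does.

```python
from datetime import datetime, timedelta


def select_backups_to_keep(backups, now):
    """Return the set of backup timestamps to keep under a logarithmic policy.

    backups -- iterable of datetime objects, one per backup directory
    now     -- the current time as a datetime
    """
    keep = set()
    seen_buckets = set()
    # Walk newest-first so the first backup seen in a bucket is its newest one.
    for stamp in sorted(backups, reverse=True):
        age = now - stamp
        if age < timedelta(days=1):
            keep.add(stamp)  # keep everything from the last day
            continue
        if age < timedelta(days=7):
            bucket = ("day", stamp.date())
        elif age < timedelta(days=30):
            iso = stamp.isocalendar()
            bucket = ("week", iso[0], iso[1])  # ISO year and week number
        elif age < timedelta(days=365):
            bucket = ("month", stamp.year, stamp.month)
        else:
            bucket = ("year", stamp.year)
        if bucket not in seen_buckets:
            seen_buckets.add(bucket)
            keep.add(stamp)
    return keep
```

Any directory whose timestamp is not in the returned set can be deleted. Because the backups are hard-linked, removing one directory doesn’t affect the files in the others; the data only disappears once the last link to it is gone.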

That’s all there is to it!

Well… not quite.

Something was still wrong. Four times out of five, the initial backup would get quite a way into the process, and then some glitch would cause the connection to be lost. That also damaged the loop-mounted drive file, often corrupting or losing existing information on it. I tried a USB network adapter instead of the internal one on this laptop, to no effect, so it didn’t seem to be the hardware on the computer. I had no problem with the MyBook drive otherwise, so it didn’t seem to be the network hardware or the drive itself either. That only left software, and after much consideration, I figured the problem had to be in Samba.

Fortunately, there’s an alternative to using Samba in this case. These two pages gave me enough information to set up the MyBook drive as an NFS server, so I got that working (after a great deal of trial and error), modified my script to use NFS instead of Samba, and gave it another try.

It worked! NFS seemed noticeably faster than Samba too (though that’s just an impression; I couldn’t accurately time the Samba runs). So now I have a pretty darned good backup system that goes to the new MyBook network-attached storage system. 🙂

There are still potential problems, of course. Backups should always be read-only, to prevent a virus or user error from damaging them. Data on hard drives is subject to bit-rot if it’s not re-written occasionally, and most common file systems not only can’t fix it, they can’t even detect it. If the network goes down while a backup is being written, the backup volume will be damaged, maybe irretrievably. And of course, if the hardware is stolen or destroyed by a disaster at the office, all the data is lost.

There’s no easy solution to that last one; only off-site backups will suffice for it. But I believe there is a solution to the first three, one that would also compress the files, and I’m exploring it for part III of this series.