We've been exposed to revelations that (1) RNC emails were not archived at all, and (2) White House emails between 2001 and 2003 were sort of archived, apparently via an extremely simplistic one-level backup onto magtape where the magtapes were routinely overwritten by subsequent backups.
This stings my geekish heart, and it leads me to make a geekish point, and to pose rhetorically a geekish question for all of the current presidential candidates.
First, the point I wanted to make. There is a difference between an archive and a backup. Yes, both are copies made of current files so they can be accessed later--no question that there is an overlap and considerable similarity of function. However, the two serve very different purposes and therefore should be implemented in very different ways.
A backup is something that should preserve as much as possible of current files so that they can be restored in the event of a catastrophe. For backups, the emphasis should be on ease of creation, automaticity, and thoroughness; ease of access to a particular piece of information matters much less, since catastrophes by definition are relatively rare. Backups can be as simple as just copying the current bits on the disks onto an external medium, with no structure or analysis (most backup systems do a little more than that, of course). A very common method is to keep a small number of sets of backup media (e.g., 2, 4, 7, whatever) and to cycle through them on a daily basis. The fact that there are several sets means that if your system fails during a backup, the previous backup will still be intact. If you used only one set, then a system failure during a backup could cause a very large loss of data. Having more than two sets increases reliability, because in general, you would not like to discover that your only backup has a medium error when you try to read it. However, in all of these cases, you are overwriting an older backup with a newer one.
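For the geekishly inclined, the rotation described above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up, days are simply counted from zero, and exactly one set is overwritten per day. It is not a description of any particular backup product, and certainly not of what any actual IT shop ran.

```python
def media_set_for_day(day_index: int, num_sets: int) -> int:
    """Return the (0-based) media set that gets overwritten on a given day.

    Hypothetical helper: assumes one backup per day, cycling through the sets.
    """
    if num_sets < 1:
        raise ValueError("need at least one set of backup media")
    return day_index % num_sets


def intact_sets_during_backup(day_index: int, num_sets: int) -> list:
    """Sets that are NOT being written on this day.

    These are the backups that survive if the system dies mid-backup.
    With a single set, the list is empty.
    """
    writing = media_set_for_day(day_index, num_sets)
    return [s for s in range(num_sets) if s != writing]


def oldest_recoverable_day(day_index: int, num_sets: int) -> int:
    """Oldest day still on some medium once the backup for day_index is done.

    Everything older has been overwritten by the rotation.
    """
    return max(0, day_index - num_sets + 1)
```

With four sets, for example, the backup on day 9 goes to set 1, sets 0, 2, and 3 are untouched while it runs, and nothing older than day 6 can be restored afterward. That last number is the whole point: rotation alone never reaches back further than the number of sets.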
Now, a sane system administrator who is using the above system would also routinely snag a complete set of backups and put it in a safe place, replacing it with a new set of tapes. This could happen, say, once a month or so. The reason for this is to make a snapshot of what is on the system at an earlier point in time. These snapshots would not be overwritten. In fact, these snapshots are often moved to another location, to protect against destruction of the main facility.
Parenthetically, I would ask the White House IT staff if there are any of these snapshots lying around somewhere from the critical years from 2001 to 2003--they could be very useful in resolving some of the mystery.
On the other hand, an archive is a permanent record of files from the past, generally organized in a way that promotes ease of access, and generally implemented with a hierarchy and with intelligent choices about what is to be archived and how it is to be organized. For example, emails could be organized by date, sender, recipient, or even by contents. You could skip temporary and intermediate files. You wouldn't include applications, and there could be certain kinds of privileged or irrelevant information that you would explicitly exclude from the archive. In general, archives require more set-up and organization, and they may not be complete, and they may not even be automatic (although they certainly can be).
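To make the contrast with a backup concrete, here is a minimal Python sketch of an archive that indexes emails by date, sender, and recipient and deliberately leaves some messages out. Everything in it is hypothetical: the Email fields, the EmailArchive class, and the "privileged" flag are names I invented for illustration, and a real archive would need persistent storage and a far more careful exclusion policy.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Email:
    # Hypothetical record layout, for illustration only.
    sent: date
    sender: str
    recipients: tuple
    subject: str
    privileged: bool = False  # assumed flag marking material to exclude


class EmailArchive:
    """In-memory archive indexed for lookup, not a bit-for-bit copy."""

    def __init__(self):
        self._by_date = defaultdict(list)
        self._by_sender = defaultdict(list)
        self._by_recipient = defaultdict(list)

    def add(self, msg: Email) -> bool:
        """Archive a message; return False if it was deliberately excluded."""
        if msg.privileged:
            return False
        self._by_date[msg.sent].append(msg)
        self._by_sender[msg.sender].append(msg)
        for recipient in msg.recipients:
            self._by_recipient[recipient].append(msg)
        return True

    def on(self, day: date) -> list:
        """All archived messages sent on a given day."""
        return list(self._by_date.get(day, []))

    def sent_by(self, sender: str) -> list:
        """All archived messages from a given sender."""
        return list(self._by_sender.get(sender, []))

    def received_by(self, recipient: str) -> list:
        """All archived messages addressed to a given recipient."""
        return list(self._by_recipient.get(recipient, []))
```

Notice that answering "what did this person send on that day" is a direct lookup here, whereas with a raw backup you would first have to restore the whole system and then go digging.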
Note that such an archive is actually a database just like other databases in use on the computer system, and that a good backup system will routinely back up archives along with everything else.
What the law apparently demands of the White House is that it keep an archive of its documents, including emails, text messages, and so on. Common sense also demands that all computer systems receive regular backups, and that the backups include the legally mandated archives. However, these are considerations at two very different levels.
The geekish question I would pose to all of the current candidates is how they will archive their administration's documents and communications. I've never heard this asked or answered, and I think it could be very illuminating to hear the response.
Greg Shenaut