Future-proof your files
Make sure you don’t lose access to vital docs
Part of the wonder of ancient documents such as the Dead Sea Scrolls is that they’ve survived for so many centuries. Today’s digital files, by contrast, are in danger of being lost to us within a matter of decades – or even years. It seems counter-intuitive, doesn’t it? How can we talk of a fragile, physical object such as a papyrus scroll being robust, while modern virtual documents, which can be endlessly duplicated, are threatened with extinction?
The problem is twofold. The first and more obvious issue is that the media we store these files on might become physically unreadable. But the second problem is even more pernicious: the software that created those files might no longer exist or, more likely, might not run on later hardware. So even if you’ve managed to save your files on a device or medium you can still access in 20, 30 or 50 years’ time, you might not be able to open them. Technology moves fast, and shows no sign of slowing down. Of course, this is potentially a more significant problem than just not being able to open a letter to your bank manager a decade after you wrote it. Indeed, Jerome McDonough of the University of Illinois has warned that society as a whole might lose huge swathes of its culture in a ‘digital dark age’, since we’re generating so much of it in digital formats that future generations might not be able to access.
But we’re here to help. We’re going to explain the whole topic of digital preservation, giving you the information you need to help future-proof your files. We’ll also share the general principles that let you make sensible, pragmatic decisions even when the specifics of how to deal with particular files and formats change. We’re going to focus on making sure you can still access your files in your own lifetime (so, decades rather than centuries), but along the way you’ll see how the same knowledge and techniques can in fact future-proof your files for much longer.
Don’t trust the media
Let’s start, then, with media, and with the most important lesson you can learn here: don’t trust it. That’s not to say that media such as hard disks, USB sticks and CD-Rs are bound to fail within a few decades; rather, it’s about making sure you don’t get complacent. ‘Not to worry,’ you might think, ‘those files are burned onto a DVD, so they’re safe.’ There’s a lot wrong with that statement, though.
First, media can degrade, and optical discs are especially prone to this. Pre-recorded CDs and DVDs that you buy in shops are usually more robust, since the data is encoded in physical pits and lands; you can think of it a bit like Morse code carved into a surface. With discs you burn yourself, though, the drive’s laser heats a light-sensitive dye layer so that some spots change colour, imitating that pattern of pits and lands. Over time, and especially if the discs are stored badly (in direct sunlight, for instance), the pattern of light and dark areas can degrade (although we reviewed a new system called M-DISC in MF265, which physically burns away the pits and claims to produce discs that last for up to a thousand years). What’s more, the disc itself can be damaged by surface scratches, or warp through temperature fluctuations.
You might know this already, but fewer people realise that this kind of ‘bit rot’ can affect all sorts of media, just in different ways. Hard disks might seem secure, for example (we use them all the time, after all), but it’s actually when we don’t use them that things can go wrong: the magnetically encoded data can degrade if a disk is left on a shelf, and there’s always the possibility that physical components such as the spindle or the read/write heads will fail.
So when we say not to trust media, we simply mean: check up on it once in a while. Don’t just write an archive to a DVD and stick it on a shelf somewhere; instead, check occasionally that the data on it is still accessible. As well as checking up on your media, you should also be thinking about migrating your files to newer media. Apple has essentially abandoned built-in optical drives on its Macs, so how long before it becomes very hard to find a drive that can read files carefully archived onto CDs and DVDs? Even if you can find an external optical drive in a few decades’ time, what are the chances it will have an interface that connects to the computer you’re using then, or drivers that let it mount the media? So, periodically, take stock: how are your files stored, and does it look likely that they’re locked away on media that doesn’t have a long-term future?
If you think that’s the case, it’s time to migrate them to a newer, more future-proof medium: copy old CDs or Zip disks (remember those?) onto something more modern, such as a hard disk, while you still can. But don’t slip back into complacent ways. Just because you’ve copied the CDs onto a hard disk doesn’t mean they’re safe; keep verifying the data on the hard disk, make sure modern systems can still read its file system, and keep an eye on the technology market so that you notice, for example, if the connection the hard disk uses is being phased out, which would stop you (easily) accessing the data on it.
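Checking that an archive is still readable can be partly automated. One common approach (our own illustration, not a feature of any particular product) is to record a checksum of every file when you create the archive, then recompute them later: any file that has changed, vanished or can no longer be read stands out. This minimal sketch assumes Python 3 is installed and uses only its standard library; the function names and the JSON manifest file are our own hypothetical choices.

```python
import hashlib
import json
import os


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path, relative to root, to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            manifest[os.path.relpath(full_path, root)] = sha256_of(full_path)
    return manifest


def verify_manifest(root, manifest):
    """Recheck every file listed in the manifest.

    Returns three sorted lists of relative paths: files that are missing,
    files whose contents have changed, and files that raised a read error.
    """
    missing, changed, unreadable = [], [], []
    for rel_path, expected in manifest.items():
        full_path = os.path.join(root, rel_path)
        if not os.path.isfile(full_path):
            missing.append(rel_path)
            continue
        try:
            if sha256_of(full_path) != expected:
                changed.append(rel_path)
        except OSError:
            # A read error on old media is exactly the warning we want.
            unreadable.append(rel_path)
    return sorted(missing), sorted(changed), sorted(unreadable)


def save_manifest(manifest, path):
    """Write the manifest as JSON; keep it somewhere other than the archive."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2, sort_keys=True)


def load_manifest(path):
    """Read a manifest previously written by save_manifest."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

You might build and save a manifest on the day you burn a disc, keeping the JSON file on a different device, then load it and run `verify_manifest` against the disc every year or so; three empty lists suggest nothing has changed.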
Talking of hard disks, one thing you can do to help preserve information archived on inactive hard disks is to ‘refresh’ it: make sure every bit is read, so that the degradation of the magnetic information is drastically slowed. Sure, opening every file on a hard disk by hand sounds tedious, but happily your computer can do it for you automatically with a single line typed into Terminal. There’s a good explanation of the command and its effect at larryjordan.biz/technique-refreshinghard-disk-storage/.
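The command on that page isn’t reproduced here. As a rough alternative that follows the same idea, this sketch (assuming Python 3; the function name is our own) reads every byte of every regular file under a folder or mounted drive and reports any file that throws a read error, which is itself a useful early warning. Note that it reads file contents only, not free space or file-system metadata, and it skips symbolic links.

```python
import os


def read_every_file(root, chunk_size=1024 * 1024):
    """Read every byte of every regular file under root.

    Returns (bytes_read, errors), where errors is a list of
    (path, message) pairs for files that could not be read.
    """
    total = 0
    errors = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and special files such as pipes, which could block.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            try:
                with open(path, "rb") as f:
                    while True:
                        chunk = f.read(chunk_size)
                        if not chunk:
                            break
                        total += len(chunk)
            except OSError as exc:
                errors.append((path, str(exc)))
    return total, errors
```

Pointed at a mounted drive (a hypothetical `/Volumes/OldArchive`, say), `read_every_file("/Volumes/OldArchive")` returns the number of bytes read plus a list of any files that failed.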
Store it up
You need to think about where you store your media, as well. Most media fare best in cool, dark, dry, stable conditions, but if you’re storing archives, you also need to guard against losing them to theft or a disaster such as a fire. Be aware, though, that even fire- and waterproof safes are only rated to resist those hazards for a given amount of time; spending, say, 50 quid on a fireproof safe doesn’t guarantee you’ll still be able to read the media stored inside it after a fire. If the safe is rated to protect its contents for half an hour but the fire rages for longer, you’ll be in for a shock when you crack it open.
A lot of the lessons we’ve learned about best practice for backup apply here too: save in multiple places (even if that’s just one copy with you and another with a friend or family member). Ideally, mix your media, too. Mirroring copies onto different devices or types of media gives you a better chance of accessing files in future, since even if one format has become inaccessible, you’ve got another option.
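If you do keep mirrored copies, it’s worth confirming now and then that they really are identical. Here’s a minimal sketch, assuming Python 3 and that both copies are mounted as ordinary folders (the function name is our own). It walks the primary copy and uses the standard library’s `filecmp.cmp` with `shallow=False` to compare each file byte for byte against the same relative path on the mirror.

```python
import filecmp
import os


def compare_mirrors(primary, mirror):
    """Check that every file under primary exists, byte for byte, under mirror.

    Returns (missing, different): sorted relative paths that are absent
    from the mirror, and those whose contents differ.
    """
    missing, different = [], []
    for dirpath, _dirnames, filenames in os.walk(primary):
        for name in filenames:
            src = os.path.join(dirpath, name)
            rel = os.path.relpath(src, primary)
            dst = os.path.join(mirror, rel)
            if not os.path.isfile(dst):
                missing.append(rel)
            # shallow=False forces a full content comparison rather than
            # trusting matching sizes and modification times.
            elif not filecmp.cmp(src, dst, shallow=False):
                different.append(rel)
    return sorted(missing), sorted(different)
```

The check only runs in one direction, so files that exist solely on the mirror aren’t reported; running it a second time with the arguments swapped catches those.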
What about online archiving? It’s worth considering, but unless it’s through a service you have some control over yourself, be wary. Storing your files with Dropbox, BT Cloud or the like might seem bulletproof, but if these services close down, you might lose access to your files. Check the terms and conditions to see how easily and quickly you can get your files out again if the service is threatened.
Finally, don’t forget about hard copies. It may sound old-fashioned, but by printing out important documents and emails, and sending digital photos to services such as Photobox (which prints them on photographic paper like traditional photos, so they should last longer than inkjet prints), you give yourself a better chance of viewing them for decades to come, not unlike the Dead Sea Scrolls! Sure, paper is susceptible to fire and decay, but printing is still worth doing for things that really matter: they’re then in a form that will remain accessible no matter how technology changes, and you can help slow their degradation by using acid-free paper and storing the prints somewhere dry and dark, in a container that resists water and fire.
Files of the future
File formats can be incredibly complex things, and companies often closely guard their specifications, for sound commercial reasons. That makes it by no means certain you’ll be able to open the files on your hard disk in even a few years’ time.
There are, then, two main reasons why we might call a file format ‘future-proof’.
The first is the technical reason. Formats whose specifications (the details of how the files are built) are officially known only to the company that makes the software that reads and writes them are, at least in theory, the most ‘dangerous’. That’s because if the company closes down, or decides not to release the spec for whatever reason, the information others need to open and interpret that format properly might not be available. With sufficient time, resources and smart people, it’s possible to ‘reverse-engineer’ a format, but without a published spec, much of that work comes down to educated guessing, and you might guess wrong. What’s more, if you’re using a niche format, nobody might care enough to put in the time to reverse-engineer it.
Ideally, you want file formats whose specification is a published, open standard. That way, even if a format’s biggest proponents or the body that governs it go under, anyone can read the spec and understand how the file is structured, so, in theory, someone could build a new app to open that file on future platforms. There’s a list of these open formats on Wikipedia at en.wikipedia.org/wiki/Open_format.
But that doesn’t tell the whole story, and the second reason is a practical one: some file formats will be robust enough to be opened, and possibly even edited, decades into the future. That could be down to commercial pressure: the format is widely adopted in business, giving its inventor an incentive to keep developing software for it even as platforms change. Or it could simply be that the world uses the format. JPEG, for example, technically isn’t an open format (although JPEG 2000 is), but we’d still recommend it as a pretty robust format for the future, because vast swathes of today’s culture are encoded as JPEG.
Closed, proprietary formats can, in fact, be more robust than open ones. They may not be openly documented, but if they represent a big and important enough slice of culture and commerce, there will be commercial pressure to keep them alive. By contrast, only a small group of die-hards might care about some open formats. No matter how well-documented their spec is, if nobody is actively developing apps to open those files on future platforms, files in that format will be as inaccessible as any saved in a proprietary format.
So, how best to proceed?
This might all sound terribly complex, but the practical takeaway is actually very simple: hedge your bets. Here’s an example: say you use Pages as your word processor. Pages’ file format isn’t open, so it’s possible that, like ClarisWorks documents before them, Pages documents might become inaccessible to the computers of the future. We’re not going to suggest you should definitely abandon Pages in favour of the OpenDocument-native LibreOffice (see more in our tutorial on page 46), because you might just prefer Pages, and that’s fine. But you should consider saving files in alternative, more future-proof formats as well. If it’s impractical to do this for all your files (although it could be scripted), consider exporting the most important ones: to plain text, for example, if a file is mostly a straightforward word-processing document, or to PDF for more complex design layouts. And, as before, don’t forget the future-proofing inherent in simply printing your documents and pictures and filing them somewhere safe. Sure, you can’t then open them up and edit them as you might want to, but at least the information is all preserved: in the future you can always retype documents or scan them and run them through an optical character recognition (OCR) program, and scan photos back in.
Nobody can predict with absolute certainty how technology of the future is going to develop, but we do know we’re going to be frustrated if we can’t open, edit and share files that we created in the past. By being sensible, though – pragmatically using or exporting to file formats that we believe will remain usable for at least decades, and carefully considering how accessible and stable different media could be in the future – you can give yourself the best chance of not losing everything, be it crucial work documents or irreplaceable photos of friends and family.
It’s mostly about putting in a little effort now to save yourself from expending vast amounts of effort, or money, in the future. The Dead Sea Scrolls, for example, couldn’t simply be ‘opened and read’: they had degraded and had to be painstakingly reconstructed, and that effort was only worthwhile because they’re such an important artefact. Similar reconstruction work might be carried out 50 years from now on files locked away in proprietary formats on warped and degraded DVD-Rs. But most of us aren’t producing files important enough to justify that kind of effort, so take control of your files’ future today!