Future-proof your files

Make sure you don't lose access to vital docs

Part of the won­der of an­cient documents such as the Dead Sea Scrolls is that they’ve sur­vived for so many cen­turies. To­day’s dig­i­tal files, by con­trast, are in dan­ger of be­ing lost to us within a mat­ter of decades – or even years. It seems counter-in­tu­itive, doesn’t it? How can we talk of a frag­ile, phys­i­cal ob­ject such as a pa­pyrus scroll be­ing ro­bust, while mod­ern vir­tual documents, which can be end­lessly du­pli­cated, are threat­ened with extinction?

The problem is twofold. The first and more obvious issue is that the media we store these files on might become physically unreadable. But the second problem is even more pernicious: the software that created those files might no longer exist or (more likely) might not run on later hardware. So even if you’ve managed to save your files on a device or medium you can still access in 20, 30 or 50 years’ time, you might not be able to open them. Technology moves fast, and shows no sign of slowing down. Of course, this is potentially a more significant problem than just not being able to open a letter to your bank manager a decade after you wrote it. Indeed, Jerome McDonough of the University of Illinois has warned that society as a whole might lose huge swathes of its culture in a ‘digital dark age’, since we’re generating so much of it in digital formats that future generations might not be able to access.

But we’re here to help. We’re going to explain the whole topic of digital preservation, giving you the information you need to help future-proof your files. We’ll also share the general principles that let you take sensible, pragmatic decisions even when the specifics of how to deal with particular files and formats change. We’re going to focus on making sure you can still access your files in your own lifetime – so, decades rather than centuries – but along the way you’ll see how the knowledge and techniques can in fact future-proof your files for much longer.

Don’t trust the me­dia

Let’s start then with me­dia, and with the most im­por­tant les­son you can learn here: don’t trust it. That’s not to say that dif­fer­ent me­dia – hard disks, USB sticks, CD-Rs and the like – are al­ways go­ing to fail in a few decades’ time, but rather it’s about mak­ing sure you don’t get com­pla­cent. ‘Not to worry,’ you might think; ‘Those files are burned onto a DVD so they’re safe’. There’s a lot wrong with that state­ment, though.

First, media can degrade – and actually, optical discs are especially bad for this. Pre-recorded CDs and DVDs that you buy from shops are usually more robust, since the data is encoded in physical pits and flats – you can think of it a bit like seeing Morse code carved into a surface. With discs you burn yourself, though, those pits and flats are written by blasting a light-sensitive layer so that parts of it turn a different colour. Over time – and especially if the disc is stored badly (such as in direct sun) – the pattern of light and dark areas can degrade. (We did review a new system called M-DISC in MF265, which physically burns away the pits and claims to produce discs that last for up to a thousand years.) What’s more, the physical disc itself can get damaged, such as with surface scratches or warping through temperature fluctuations.

You might know this already, but fewer people realise this kind of ‘bit rot’ can affect all sorts of media – just in different ways. Hard disks might seem secure, for example – we use them all the time, after all – but actually it’s when we don’t use them that things can go wrong: the magnetically encoded data can degrade if left on a shelf, and there’s always the possibility the physical components – spindles, read/write heads and the like – will fail.

So when we say not to trust me­dia, we just mean: check up on it once in a while. Don’t just write an ar­chive to a DVD and stick it on a shelf some­where; in­stead, oc­ca­sion­ally check the data on it is still ac­ces­si­ble. And, as well as check­ing up on your me­dia, you should also be think­ing about mi­grat­ing it to new for­mats. Ap­ple has es­sen­tially aban­doned op­ti­cal drives on its Macs now, so how long be­fore it be­comes very hard to find an op­ti­cal drive to read files care­fully archived onto CDs and DVDs? Even if you can find an ex­ter­nal op­ti­cal drive in a few decades’ time, what are the chances it will have an in­ter­face that can con­nect to the com­puter you’re us­ing then, or have driv­ers avail­able to let it mount the me­dia? So, pe­ri­od­i­cally, take stock: how are your files stored, and does it look likely that they’re locked away in a for­mat that doesn’t have a long-term fu­ture?

If you think that’s the case, it’s time to migrate them to a new, more future-proof medium; copy old CDs or Zip disks (remember those?) onto something more modern, such as a hard disk, while you still can. Remember not to fall back into old, complacent ways. Just because you’ve copied the CDs onto a hard disk doesn’t mean the files are safe; continue to verify the data on the hard disk to ensure modern systems can access its file system, and keep an eye on the technology market to make sure, for example, the connection the hard disk uses isn’t being phased out, which would stop you (easily) accessing the data on it.
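One way to make that periodic check less of a chore is to record a checksum for every file when you create the archive, then recompute and compare them later. What follows is only a minimal sketch in Python, not a finished tool: the function names are our own invention for illustration, and it assumes the archive is mounted as an ordinary folder you can read.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing, unreadable or changed."""
    problems = []
    for name, expected in manifest.items():
        try:
            if sha256_of(root / name) != expected:
                problems.append(name)
        except OSError:
            # Covers files that have vanished or can no longer be read.
            problems.append(name)
    return problems
```

You would call `build_manifest()` once, keep the result somewhere other than the archive itself (saved as JSON, say), and run `verify()` against it every so often. An empty list means every file still reads back exactly as it was written; anything else is your cue to restore from another copy while you still have one.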

Talk­ing of hard disks, in­ci­den­tally, one thing you can do to help pre­serve in­for­ma­tion archived on in­ac­tive hard disks is to ‘re­fresh’ it – make sure you read ev­ery bit so that the degra­da­tion of the mag­netic in­for­ma­tion is dras­ti­cally slowed. Sure, the idea of open­ing ev­ery file on a hard disk sounds te­dious, but hap­pily your com­puter can do it for you au­to­mat­i­cally with a lit­tle line of Ter­mi­nal code. There’s a good ex­pla­na­tion of the code and its ef­fect at lar­ryjor­dan.biz/tech­nique-re­fresh­ing­hard-disk-stor­age/.
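To give a flavour of what reading everything on a disk involves, here is a rough Python sketch. It is not the Terminal command from the article linked above, and the function name is hypothetical; it simply opens every file under a folder, reads it from start to finish and notes any that fail. It only touches files, not free space or the file system’s own bookkeeping, and files you wrote very recently may be served from memory rather than the disk.

```python
from pathlib import Path


def read_through(root: Path, chunk_size: int = 1 << 20) -> tuple[int, list[str]]:
    """Read every file under root; return (bytes read, unreadable files)."""
    total = 0
    failures = []
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("rb") as handle:
                # Read in chunks so large files don't fill up memory.
                while chunk := handle.read(chunk_size):
                    total += len(chunk)
        except OSError:
            failures.append(str(path))
    return total, failures
```

Pointing it at the mounted archive disk, for example `read_through(Path("/Volumes/Archive"))` (a made-up volume name), gives you a byte count and a list of any files that could not be read, which doubles as an early warning that the disk is on its way out.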

Store it up

You need to think about where you store your media, as well. Most media fares best in cool, dark, dry, stable conditions, but if you’re specifically storing archives, you also need to guard against the possibility you could lose them through theft or a disaster, such as a fire. Be aware, though, that even fire- and waterproof safes are only rated to resist fire and water for a given amount of time; just because you spend, say, 50 quid on a fireproof safe, that doesn’t guarantee you’ll still be able to read data stored on media inside it after a fire. If it’s rated to preserve the contents for half an hour but the fire rages for longer, you’ll be in for a shock when you crack open the safe.

A lot of the lessons we’ve learned about best prac­tice for backup are rel­e­vant here too: save in mul­ti­ple places (even if that’s just a copy with you and one with a friend or fam­ily mem­ber). And for pref­er­ence, mix me­dia, too. Mir­ror­ing a copy onto dif­fer­ent de­vices or types of me­dia gives you a bet­ter chance of ac­cess­ing files in fu­ture, since even if one for­mat has be­come in­ac­ces­si­ble you’ve got an­other op­tion.

What about on­line ar­chiv­ing? It’s worth con­sid­er­ing, but un­less it’s through a ser­vice you have some con­trol over yourself, be wary. It might seem like a bul­let­proof idea to store your files us­ing Drop­box, BT Cloud or the like, but if these ser­vices close down, you might lose ac­cess to them. Check the terms and con­di­tions to see how easy and fast it is to get your files out again if the ser­vice is threat­ened.

Fi­nally, don’t for­get about hard copies. It may sound old-fash­ioned, but by print­ing out im­por­tant documents and emails, and send­ing dig­i­tal pho­tos to ser­vices such as Pho­to­box (which ex­poses them like tra­di­tional pho­tos, so they last longer than print­ing on inkjets), you’ve given yourself a bet­ter chance of view­ing them for decades to come – not un­like the Dead Sea Scrolls! Sure, paper is sus­cep­ti­ble to fire and de­grad­ing, but it’s still worth do­ing for things that re­ally mat­ter; they’re then in a for­mat that will re­main ac­ces­si­ble no mat­ter how tech­nol­ogy changes, and you can help pre­vent degra­da­tion by us­ing acid-free paper and stor­ing the prints some­where dry, dark and in a container that’s re­sis­tant to wa­ter and fire.

Files of the fu­ture

File for­mats are ac­tu­ally in­cred­i­bly com­plex things, and usu­ally com­pa­nies – for sound commercial rea­sons – closely guard the spec­i­fi­ca­tions for those files. This makes it by no means cer­tain you’ll be able to open the files you have on your hard disk in even a few years’ time.

There are, then, two main rea­sons why we might call a file for­mat ‘fu­ture-proof’.

The first is the tech­ni­cal rea­son. Those file for­mats whose spec­i­fi­ca­tions – the way in which they’re built – are of­fi­cially only known to the com­pany that makes the soft­ware that reads and writes to them are, at least in the­ory, the most ‘dan­ger­ous’. That’s be­cause if the com­pany closes down or de­cides not to re­lease the spec for what­ever rea­son, the in­for­ma­tion oth­ers need to prop­erly open and un­der­stand that for­mat might not be avail­able. Sure, it’s pos­si­ble, with suf­fi­cient time, re­sources and smart people, that you can ‘re­verse-en­gi­neer’ a for­mat, but with­out a pub­lished spec for the for­mat, it would just be a case of guess­ing – and you might guess wrong. What’s more, if you’re us­ing a niche for­mat, no­body might even care about putting in the time to re­verse-en­gi­neer it.

Ideally, what you want are file formats whose specification is a published, open standard. That way, even if a format’s biggest proponents or the body that governs it go under, anyone will be able to read the spec and understand how the file is structured. So they could, in theory, build a new app for future platforms to open that file. There’s a list of these open formats on Wikipedia at en.wikipedia.org/wiki/Open_format.

But that doesn’t tell the whole story. Some file formats will be robust enough to be opened and possibly even edited decades into the future. That could be down to commercial pressure – the format is widely adopted in business, giving the inventor an incentive to keep developing software to use it even as platforms change. On the other hand, it could be because it’s a format the world just uses. JPEG, for example, technically isn’t an open format (although JPEG 2000 is), but we’d still recommend it as a pretty robust format for the future because vast sections of today’s culture are encoded as JPEG.

Closed, proprietary formats actually have the potential to be more robust than open formats. They may not be openly documented, but if they represent a big and important-enough slice of culture and commerce, there will be commercial pressures to keep the format alive. In contrast, only a small group of die-hards might care about some open formats. No matter how well-documented their spec is, if nobody’s actively developing apps to access open-format files on new platforms in the future, files in that format will be as inaccessible as any saved with a proprietary format.

So, how best to pro­ceed?

This all might sound terribly complex, but the practical take-away from it is actually very simple: hedge your bets. Here’s an example: it’s likely you use Pages as your word processor. However, Pages’ file format isn’t open, so it’s possible that, like ClarisWorks documents before them, Pages files might become inaccessible to computers of the future. We’re not going to suggest you should definitely abandon Pages in favour of the OpenDocument-native LibreOffice (see more in our tutorial on page 46), for example, because you might just prefer Pages and that’s fine. But you should consider saving files in alternative, more future-proof formats as well. If it’s impractical for you to do this for all your files (although it could be scripted), you should consider exporting the most important files into, for example, plain text if it’s mostly just a straight word processing document, or PDF for more complex design layouts. And, as before, don’t forget about the future-proofing inherent in just printing your documents and pictures out, and filing them somewhere safe; sure, you can’t then open them up and edit them as you might want to do, but the information at least is all preserved. You can always retype documents or run them through an optical character recognition (OCR) program, or scan in photos in the future.
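As for scripting those exports, here is one hedged illustration in Python. It assumes you have LibreOffice installed and that its `soffice` command is on your PATH (on a Mac it usually lives inside the application bundle, so you may need to add it), and it uses LibreOffice’s headless `--convert-to` option to make PDF copies. The function names and the list of extensions are our own choices for the sketch. Pages documents aren’t covered here; the safer route for those is to export from Pages itself.

```python
import shutil
import subprocess
from pathlib import Path

# Extensions to convert; an illustrative choice, adjust to taste.
SOURCE_EXTENSIONS = {".doc", ".docx", ".odt", ".rtf"}


def find_documents(root: Path) -> list[Path]:
    """List files under root whose extension is in SOURCE_EXTENSIONS."""
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in SOURCE_EXTENSIONS
    )


def export_command(document: Path, out_dir: Path, fmt: str = "pdf") -> list[str]:
    """Build the LibreOffice command that converts one document."""
    return [
        "soffice", "--headless",
        "--convert-to", fmt,
        "--outdir", str(out_dir),
        str(document),
    ]


def export_all(root: Path, out_dir: Path, fmt: str = "pdf") -> list[Path]:
    """Run the conversion for each document; return those that exited cleanly."""
    if shutil.which("soffice") is None:
        raise RuntimeError("LibreOffice's soffice command was not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    converted = []
    for document in find_documents(root):
        result = subprocess.run(export_command(document, out_dir, fmt), check=False)
        if result.returncode == 0:
            converted.append(document)
    return converted
```

Note that every export lands in the same output folder, so two documents with the same name in different subfolders would overwrite each other; it’s also worth opening a few of the results to check they look right rather than trusting the exit code alone.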

No­body can pre­dict with ab­so­lute cer­tainty how tech­nol­ogy of the fu­ture is go­ing to de­velop, but we do know we’re go­ing to be frus­trated if we can’t open, edit and share files that we cre­ated in the past. By be­ing sen­si­ble, though – prag­mat­i­cally us­ing or ex­port­ing to file for­mats that we be­lieve will re­main us­able for at least decades, and care­fully con­sid­er­ing how ac­ces­si­ble and sta­ble dif­fer­ent me­dia could be in the fu­ture – you can give yourself the best chance of not los­ing ev­ery­thing, be it cru­cial work documents or ir­re­place­able pho­tos of friends and fam­ily.

It’s mostly about putting in a lit­tle bit of ef­fort now to save yourself from ex­pend­ing vast amounts of ef­fort – or money – in the fu­ture. The Dead Sea Scrolls, for ex­am­ple, couldn’t be eas­ily ‘opened and read’. They had de­graded and had to be painstak­ingly re­con­structed, and that ef­fort was only worth it be­cause they’re such an im­por­tant arte­fact. Sim­i­larly, this ba­sic idea might be ap­plied 50 years from now to files locked away in pro­pri­etary for­mats stored on warped and de­graded DVD-Rs. But most of us aren’t pro­duc­ing files im­por­tant enough to have that kind of ef­fort ex­pended on them, so take con­trol of your files’ fu­ture to­day!

CDs and DVDs you write yourself can quickly de­grade – al­though this M-DISC sys­tem prom­ises to last longer – up to 1,000 years, in fact.

It’s a good idea to save a copy of files to a more fu­ture-proof for­mat, un­less you want to suf­fer the above alert box.

Don’t let yourself lose ac­cess to trea­sured mem­o­ries and im­por­tant documents; con­sider ex­port­ing im­por­tant files to plain text.

