PC Pro

A short, chilly stroll from the Global Seed Vault in Svalbard is a very different kind of archive…

Beneath the Arctic permafrost lies a unique effort to preserve open-source software for a thousand years, writes Dave Stevenson

Longyearbyen, located on the west coast of the Norwegian island of Spitsbergen in Svalbard, is cold. It always is: the average temperature there is -5°C. In fact, the average is -13°C for three months from January to March, climbing to a tropical 7°C in July. It has a population of under 3,000 people, a single-screen cinema that plays films on Wednesdays and Sundays, and a school whose 270 students show up for their first day in August in mittens and hats. When school trips head into the mountains, teachers carry rifles to account for the risk of polar bears.

It’s also a place that, in a thousand years, might provide archaeologists with the most accurate clues as to what 21st century society was like. Inside an abandoned mine with an underground vault sits a unique, long-term backup effort that so far includes artefacts as diverse as the United Nations Convention on the Rights of the Child, sample data acquired by the European Space Agency’s first Earth remote sensing satellite, a digitised copy of Edvard Munch’s The Scream and more than 20TB of data from open-source repository GitHub. As humankind grows curious about its ancestors, or, as seems more likely in 2020, emerges blinking from its own preservative bunker to restart civilisation, the Arctic World Archive may be the best place to begin building an authoritative history of humans – or the perfect place to figure out how to start again.

Caught on film

On a Zoom call to Oslo, Katrine Loen Thomsen holds her business card up to her webcam. Thomsen is the deputy managing director of Norwegian company Piql (you get the joke), whose technology is behind the Arctic World Archive. Her business card is made of the same stuff that could one day offer insights into our ancient civilisation: piqlFilm is the same size as the 35mm film used in motion picture cameras and comes in reels almost a kilometre long.

Thomsen’s piqlFilm business card has all the usual information you’d expect, but it’s the light-grey frame at the bottom that’s the company’s prize. “This frame is a very high density QR code which holds a fair amount of information,” says Patricia Alfheim, Piql’s communication manager. “It’s called nanodensity.”

Don’t scoff: that single 35mm frame of film packs in 8.8 million data points, translating to 2MB of data per frame. “To have that much data stored, [the film] has to have a very, very clear base,” says Alfheim. That means using an ultra-clear emulsion, as noise accidentally recorded on the film could damage the data it holds. Each reel of film is 950m long and holds 120GB of data. Once a film is written, it’s packed safely into canisters and transported to Longyearbyen, where it’s stored in Mine Number 3, an abandoned coal mine repurposed for ultra-long-term data archival.
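As a rough sanity check on those figures, the sketch below works out how many 2MB frames a 120GB reel implies and how much film each frame would occupy. It assumes decimal units (1GB = 1,000MB) and that the whole 950m carries data; the function names are illustrative and nothing here comes from Piql.

```python
# Figures quoted in the article (decimal units assumed: 1 GB = 1,000 MB).
REEL_LENGTH_M = 950
REEL_CAPACITY_GB = 120
FRAME_CAPACITY_MB = 2


def frames_per_reel(reel_gb: float = REEL_CAPACITY_GB,
                    frame_mb: float = FRAME_CAPACITY_MB) -> float:
    """Number of data frames needed to fill one reel."""
    return reel_gb * 1000 / frame_mb


def frame_pitch_mm(reel_m: float = REEL_LENGTH_M) -> float:
    """Film length per frame, assuming the entire reel holds data frames."""
    return reel_m * 1000 / frames_per_reel()


print(f"{frames_per_reel():,.0f} frames per reel")
print(f"{frame_pitch_mm():.1f} mm of film per frame")
```

Under those assumptions a reel holds 60,000 frames at just under 16mm of film apiece, which is a plausible spacing for 35mm stock and suggests the quoted numbers hang together.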

Once the door of the mine slams shut, the question becomes one of longevity. “Our film will last 750 years at least,” says Alfheim. She’s quoting tests done by Norwegian research firm Norner, whose accelerated testing of piqlFilm found a longevity of 750 years at 21°C and humidity of 50%, which means a reel of film stored today would be reaching its theoretical expiry date in autumn 2770.

However, the chemical reactions responsible for the decay of film stock slow as the temperature drops. “So, as it gets colder, that life expectancy increases dramatically,” says Alfheim. “The mine up in the Arctic World Archive is ideal for the film. It’s always cold.”

Or -4°C year-round, to be precise, allowing Piql to make the eye-catching projection that a reel of film will last over a thousand years. This means future explorers could crack the vault open in 3020, although Piql is being cautious for now. “We say 750 years because it takes a really long time to test,” says Alfheim, “so 750 years is the mark we’ve reached. We’re still going.”
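How strong does the cold effect need to be for 750 years at 21°C to become a thousand years at -4°C? The sketch below uses a simple rule of thumb in which lifetime multiplies by a fixed factor (often called Q10) for every 10°C of cooling. That rule is an assumption made purely for illustration, not the model used by Piql or Norner, and it ignores humidity altogether.

```python
# Figures from the article.
TESTED_LIFE_YEARS = 750      # accelerated-test result at 21°C, 50% humidity
TESTED_TEMP_C = 21.0
VAULT_TEMP_C = -4.0
TARGET_LIFE_YEARS = 1000


def projected_life(q10: float,
                   base_years: float = TESTED_LIFE_YEARS,
                   base_temp_c: float = TESTED_TEMP_C,
                   storage_temp_c: float = VAULT_TEMP_C) -> float:
    """Lifetime at storage_temp_c if life multiplies by q10 per 10°C of cooling.

    The Q10 rule of thumb is an illustrative assumption, not Piql's model.
    """
    steps = (base_temp_c - storage_temp_c) / 10
    return base_years * q10 ** steps


def required_q10(target_years: float = TARGET_LIFE_YEARS) -> float:
    """Smallest q10 for which the vault projection reaches target_years."""
    steps = (TESTED_TEMP_C - VAULT_TEMP_C) / 10
    return (target_years / TESTED_LIFE_YEARS) ** (1 / steps)


print(f"Required factor per 10°C: {required_q10():.2f}")
```

On this simplified model, lifetime only has to grow by about 12% for every 10°C of cooling to clear the thousand-year mark, which is why the projection looks modest next to Alfheim's "dramatically".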

The climate isn’t the only thing in Longyearbyen’s favour. Tactically unappealing in the event of a war, seismologically stable and, at around 115m above sea level, safe from climate change and rising seawater, there may not be anywhere on Earth more suitable for long-term storage. It’s no surprise that the Arctic World Archive is just a few hundred metres from the equally ambitious Svalbard Global Seed Vault, which holds almost a million seed samples in an effort to safeguard against the extinction of plants in the wild. It’s the perfect place to keep something safe for a long time. The question is: what do we put there?

Keeping code cool

“Our lives depend on open-source software,” says Thomas Dohmke, vice president for strategic programs at GitHub, an unsurprising evangelist for open source. “Open source has won,” he says. “No human invention will happen without open-source software, that’s our belief, and that’s why we believe it’s worth preserving open-source software for a thousand years, in the same way mankind has preserved the Roman Forum, the Taj Mahal, the Bodleian Library. All those artefacts of human history tell us something about who we are and how we have developed.”

In July 2020, GitHub sent 180 reels of piqlFilm to Svalbard, which equates to more than 20TB of data.

The reels comprise a snapshot of every active public repository on GitHub, from Bitcoin to Linux. The software contained in the archive can be found everywhere in modern life from smartphones to smart thermostats.
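The "more than 20TB" figure can be cross-checked against the per-reel numbers quoted earlier in the piece. This is a back-of-the-envelope sketch that assumes every reel is filled to its 120GB capacity and uses decimal units; the variable names are illustrative.

```python
# Figures from the article; decimal units assumed (1 TB = 1,000 GB).
REELS_SENT = 180
REEL_CAPACITY_GB = 120
REEL_LENGTH_M = 950

total_tb = REELS_SENT * REEL_CAPACITY_GB / 1000   # upper bound on stored data
total_km = REELS_SENT * REEL_LENGTH_M / 1000      # total length of film shipped

print(f"Maximum capacity: {total_tb} TB on {total_km} km of film")
```

That gives a ceiling of 21.6TB spread over 171km of film, consistent with the article's figure.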

That GitHub would want a backup is unsurprising, but piqlFilm is forever – there’s no updating a frame if a mistake is found or an improvement made. So why wouldn’t GitHub stick with a traditional backup? “I have two answers,” says Dohmke. “Our whole archive programme follows a pace layer approach, a concept from the Long Now Foundation.” That means backing up in layers, described by GitHub as hot, warm and cold, with each layer updated with decreasing frequency, from near real-time for the hot layer to every five years or more at the cold layer. No prizes for guessing which layer the Arctic World Archive is on.

GitHub’s backup strategy allows it to survive catastrophes of varying magnitudes. A file is accidentally deleted? Access a live backup from the hot layer. Interested in what a project looked like last year? Software Heritage provides access to GitHub’s public repositories via a public API. Mankind all but wiped out in an enormous but increasingly plausible disaster? Simply dust off your trusty hoverboard and head to the Arctic World Archive.
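The layered idea can be sketched as a small lookup: given which layers a disaster has taken out, fall back to the freshest one still standing. This is a toy model rather than GitHub's actual tooling. The hot and cold cadences follow the article, while the warm layer's monthly cadence and the placement of Software Heritage in it are assumptions inferred from the examples above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    name: str
    refresh_days: float   # how often the layer is updated
    example: str


# Hot and cold cadences follow the article; the warm cadence is a guess.
LAYERS = [
    Layer("hot", 0, "live backup"),                  # near real-time
    Layer("warm", 30, "Software Heritage"),          # assumed monthly
    Layer("cold", 5 * 365, "Arctic World Archive"),  # every five years or more
]


def freshest_surviving(lost: set) -> Layer:
    """Return the most frequently refreshed layer that is still available."""
    for layer in sorted(LAYERS, key=lambda item: item.refresh_days):
        if layer.name not in lost:
            return layer
    raise LookupError("no backup layer survived")


print(freshest_surviving(set()).example)            # nothing lost
print(freshest_surviving({"hot"}).example)          # hot layer gone
print(freshest_surviving({"hot", "warm"}).example)  # only the vault remains
```

The trade-off is visible in the `refresh_days` column: the further down the list a recovery has to reach, the older the snapshot it gets back.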

That’s answer one: the Arctic World Archive is only a layer of GitHub’s backup strategy, rather than the go-to option when a data centre crashes. And answer two? “All of us have multiple backups of all our stuff,” says Dohmke. “We all went through losing a CD-ROM with wedding photos, or having this old hard drive that still had some MP3s on, and now you can no longer boot it because some cluster is damaged or something like that.” In other words, the problem with iterative backups is that the technology used to store them iterates – and deteriorates – as well.

“We believe that a thousand years is a significant enough time period [that] life will have changed… radically,” says Dohmke. “If I go back into my childhood – or even longer than that – software development went from stamp cards to my first C64 cassette. Probably you [will] still find some museum that has a C64 that can load that cassette, but it’s evolving really fast, so the media that we have today – it’s safe to assume that in a thousand years all this is gone.”

Migration-based backup – where backups are moved to new storage media as the old ones decay or fall out of use – is not an approach endorsed by Piql. “Archives generally use a migration-based archival system,” says Piql’s Alfheim. “New formats come out all the time as old ones become obsolete – think of a floppy disk, for example.

“[However], when you migrate your data, something gets lost each time you migrate. If you look at stats on migration of information, it’s crazy risky, it’s very time consuming [and] very, very expensive. After a hundred times of migration you actually don’t know what you have.”

Piqling data

That makes Piql’s approach a double-edged sword – it’s hard to update, but the media the backup is stored on and the backup itself are totally stable.

Junjie Cao, head of IT at the National Museum of Norway, sounds a similar note. “One of the problems [is that] digital technology changes,” he says. The digital age was an unimaginable fantasy when Edvard Munch painted The Scream in 1893.

The Scream currently lives at the museum, where, like every other unique physical artefact in the world, it’s vulnerable to all manner of threats. The utility of backing up the painting is obvious, but the practicalities required a little more imagination. “It’s very important for us to have the right format,” says Cao. “The lifetime [of] a hard drive is quite short – five to ten years, [so] we are not always sure about the quality about the things we store there.”

The benefits of digitising and storing priceless physical artefacts long-term are clear. “If you have an object and it disappears, that’s the only thing you’ve got,” he says. “When we have all this data, and we have high[-resolution] pictures, that information is as important as the painting, both for telling the world how it is and [to avoid] bringing everyone physically to Oslo to see it.”

The benefits of spending enormous time and energy preserving GitHub’s public repositories are arguably more opaque, but Dohmke disagrees. “We don’t expect somebody to go there in 50 years and restore some project, because we have those other layers,” he says. Instead, GitHub’s archives might provide our descendants with fascinating insights into how we lived and worked.

“Technology is moving so fast, so we think in a thousand years that people will rediscover how we lived, how we collaborated all around the world, how do we bridge cultures and time zones and politics in software development,” he says.

“If you go into any open-source project on GitHub, it’s not just a couple of folks from Germany and the UK,” he adds. “It’s people all around the world speaking different languages, using their spare time, their weekends, sometimes using their professional time… to collaborate together in what we call the largest team sport on Earth.” That means that the benefit of a long-term archive isn’t limited to “just rebooting Linux in a thousand years… the value is more in how did we work together, how did we write the software, how did we collaborate?”

Just about the only thing Piql’s clients – and the company itself – are unwilling to divulge is the cost. Dohmke tells us the cost of archiving with Piql is “not significant in the business model of GitHub and Microsoft”. I politely suggest this could mean almost anything, but Piql insists straightforward pricing is unavailable because each project is tailored precisely to the needs of each client. “We do try to be very competitive,” says Piql’s Alfheim. And while she accepts that backing up with Piql isn’t cheap, “we think there is value in longevity, and that’s what we offer that no one else can”.

Reading the past

It is 3020. The door to the mysterious vault swings open. Our intrepid explorers shine their torches along the abandoned mine, walking a few hundred metres to the large fireproof container where, a thousand years ago, Norwegian photo archivist Vidar Ibenfeldt, on behalf of the National Museum of Norway, carefully placed a reel of film on a shelf and made a short speech about the materials to be preserved.

Carefully, the explorers remove their gloves and open the lid of the canister. A question presents itself… Now what?

Back to the future

If you have data stored on a reel of piqlFilm and you want it back today, the process is simple. “At the moment, you would just request it through Piql, and we restore it and make it available online for you,” says Alfheim. “Word documents, PDFs, video files, image files, whatever you’ve got.”

Banking on the continued existence of Piql – or even Western civilisation – in a thousand years seems decidedly optimistic when it comes to getting data off the reels. But piqlFilm is unique in the world of long-term data storage: unlike CDs, hard disks, reel-to-reel tape or solid-state storage, it’s human-readable.

If whoever opens the vault doesn’t have a QR reader to hand, each reel begins with a user guide that merely needs holding up to the light. “We start with a user guide, kind of like a manual to the archive, that explains what it is [and] how to use it,” says Dohmke. “We wrote this together with a panel of advisors, linguists, historians, archivists and people from libraries, for example the Library of Alexandria in Egypt… we had them advise us of how to write this so people can actually understand.”

The human-readable user guide is written in English, Hindi, Spanish, Arabic and Chinese, and is designed to give uninformed explorers of the Arctic World Archive a fighting chance at decoding their discovery.

Also available: the piqlReader. Looking like a prop from 2001: A Space Odyssey, the reader allows a reel of film to be loaded and read by the end user. Piql also has a plan if no piqlReaders survive. “Instructions on how to build a reader are on the film,” says Alfheim. “In the distant, distant future, if there are no readers available, you can manually extract [data]. It’s a slower, manual process but you can extract all the information with just a magnifying glass, a camera of some description and a computer that can read code.”
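To give a flavour of what "manually extracting" data from a photographed frame might involve, here is a toy sketch that turns a grid of dark and light cells into bytes. The layout is invented for illustration (one bit per cell, read left to right and top to bottom, most significant bit first) and is not piqlFilm's real encoding, which the article doesn't describe and which would also need alignment and error correction.

```python
def cells_to_bytes(rows: list, dark: str = "#") -> bytes:
    """Decode a grid of cells into bytes, treating dark cells as 1 bits.

    Hypothetical layout: row-major order, most significant bit first.
    Trailing bits that do not fill a whole byte are ignored.
    """
    bits = [1 if cell == dark else 0 for row in rows for cell in row]
    usable = len(bits) - len(bits) % 8
    decoded = bytearray()
    for start in range(0, usable, 8):
        value = 0
        for bit in bits[start:start + 8]:
            value = (value << 1) | bit
        decoded.append(value)
    return bytes(decoded)


# A two-row "frame" as it might be transcribed from a photograph.
frame = [
    ".#..#...",  # 01001000 = 'H'
    ".##.#..#",  # 01101001 = 'i'
]
print(cells_to_bytes(frame).decode("ascii"))  # prints: Hi
```

Scaled up to millions of cells per frame, the same principle explains why a camera and a computer would be enough, and why Alfheim calls it a slower process.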

So future explorers might get a jump start on rebuilding their civilisation by restarting 21st century software? “Assuming a natural disaster happens that cuts off part of the world and all of a sudden those pieces of open-source technology are no longer available… it would definitely be possible to restore the software to that version that we deposited,” says Dohmke. However, he adds: “We are optimistic. We think our future will be bright.”


ABOVE The Arctic World Archive has the same purpose as its famous neighbour, the Svalbard Global Seed Vault: protecting humanity’s inheritance for future generations

BELOW LEFT The Arctic World Archive is squirrelled away in an old Svalbard mine
BELOW The reels are placed into piqlBoxes, which have a longevity of 500 years
BELOW RIGHT Key archives such as the Vatican Library have already signed up

BELOW LEFT Each reel of piqlFilm is 950m long and holds 120GB of data
BELOW Artworks such as The Scream have been scanned for posterity
BELOW RIGHT The bags are labelled with a QR code, the owner and the date

ABOVE LEFT The piqlFilm “user guides” can be simply held up to the light and read
ABOVE RIGHT Data is shipped from around the world, but a 1,000-year life isn’t cheap
