National Post

THE WAYBACK MACHINE

WHEN WEB PAGES CAN CHANGE WITH A CLICK, THIS COMPANY SAFEGUARDS OUR DIGITAL HISTORY

- Margi Murphy

OVER THE PAST FEW DECADES, ALMOST ALL OF HUMAN COMMUNICATION HAS BEEN DIGITAL. AND WHILE THAT HAS AFFORDED A DRAMATIC INCREASE IN THE VOLUME AND THE FREQUENCY, IT HAS ALSO BROUGHT WITH IT A FRAGILITY.

— MARK GRAHAM

Mark Graham fears that valuable pieces of history are being wiped out before our eyes.

As the director of the Wayback Machine, a website that records how individual web pages have changed over time, he’s acutely aware of how important it is to keep a record of what’s being posted — and where.

He has seen changes ranging from benign spelling corrections to edits to government websites to the dismantling of media outlets by dictators.

“If we want future generations to have the opportunity to learn from history, then it is imperative that history be available to them,” Graham says. “Over the past few decades, almost all of human communication has been digital. And while that has afforded a dramatic increase in the volume and the frequency, it has also brought with it a fragility.”

The Wayback Machine, which stores 525 billion web pages, does exactly what its name suggests. It’s a time machine for the web. Without it, entire web pages documenting our collective history can be wiped out with a single click.

“Most societies place importance on preserving artifacts of their culture and heritage,” the company says. “Without such artifacts, civilization has no memory and no mechanism to learn from its successes and failures. Our culture now produces more and more artifacts in digital form.”

The Wayback Machine aims to preserve those artifacts and create an internet library for researchers.

More recently, its parent company, the Internet Archive, has been thrust into the spotlight for archiving Parler, a social network that has been taken offline and stands accused of hosting posts used to plan the insurrection at the U.S. Capitol on Jan. 6. Amazon Web Services terminated Parler’s web hosting contract for failing to adequately moderate violent content. Parler’s record on the Internet Archive helped digital sleuths track down users who had shared photos of the riots and insurrection.

But while the digital trail from the riots has certainly been important for investigators hunting for links to U.S. white supremacists, Graham argues it is critical to archive as much of the web as possible.

Companies and individuals have become less concerned with hosting their own services and are instead using third parties such as Microsoft, Amazon, Google or IBM. Domain registrations are also provided by private companies like GoDaddy and Bluehost. If these providers terminate contracts, whether willingly or not, those websites could simply vanish.

More pertinent is the lack of transparency over how pages are edited, changed or deleted. A web page lasts about 100 days on average before it is changed or deleted, Graham says.

As Joe Biden was inaugurated as U.S. president, for example, U.S. government websites received a total overhaul, with no public change log. The White House archives its website material, but links expire, making it difficult to navigate the additional sources it uses.

Graham thinks it is reasonable for web pages to experience minor alterations to correct a mistake. “But what about a material erasure like when there was a failed coup in Turkey where 150 media organizations were taken down by the government?”

In the 1930s, Kremlin propagandists would carefully erase discredited leaders from historic photos in which they were pictured standing alongside Josef Stalin. These days, modifying records is far easier.

Internet Archive founder Brewster Kahle began archiving web pages in 1996. The archive has more than 28 million books and texts, six million films and videos and 600,000 software programs.

In 2001, it launched the Wayback Machine (named for the time machine Mr. Peabody the dog and his boy Sherman travel with in segments from The Rocky & Bullwinkle cartoons of the 1960s), which Graham took over as director in 2015. It is funded by a mixture of government and private grants. Funders include Alexa Internet, a web traffic analysis company owned by Amazon, as well as the U.S. Library of Congress.

The public can upload material, but the Internet Archive has its own web crawlers that work to preserve as much as possible. The fragility of the web has spurred an industry of digital accountability. Web-tracking companies like Wachete and Visualping send email alerts to customers letting them know if any element of a webpage has changed.

This can be useful for, say, a job hunter monitoring a company’s recruitment page for new roles, or someone tracking Uber’s stock price. Journalists and investigators often use such tools to track edits or U-turns.

YouTube’s removal of several Donald Trump videos and Twitter’s wiping of his account have sparked a debate similar to the one over whether it is right to remove offensive characters from history books or town squares.

It has become common for verified account holders on Twitter to delete their tweets, something that once seemed unorthodox, almost like an admission of guilt. But now, keeping tweets up after clarifying a point or apologizing after finding new information is controversial.

In a similar way, people who wish to remove unflattering pieces about themselves from the web can ask Google to remove pages under a European law that grants people the right to be forgotten. Publicity companies offer services cleaning up clients’ digital presence and editing Wikipedia pages.

As efforts to scrub the web of uncomfortable truths become more intense, the Internet Archive will become more important than ever. “We take it for granted that digital material is there and it’s not going to go away,” Graham says. “But the reality is anything but.”

JEWEL SAMAD / AFP / GETTY IMAGES Google is one of several third-party companies that host websites for companies and individuals, potentially putting that material at risk of loss should Google and other hosts terminate those contracts.
ROBERTO SCHMIDT / AFP / GETTY IMAGES Since the Jan. 6 insurrection on Capitol Hill, digital archives and web-tracking companies have been helping investigators make arrests.
