Hindustan Times (Chandigarh)

Digital archives keep track of lost websites, lapsed domain names from the early years

- Rachel Lopez

MUMBAI: What do you remember of the first time you used the internet? The screee-beep of the modem connecting? The joy of seeing little computer icons linking to each other in your taskbar, indicating that your dial-up was working? Or paying the cyber café guy ₹10 to book your train ticket?

When India first got online 25 years ago, it was slow going. Pages took ages to load (with the hourglass icon turning endlessly). But once we got comfortable, we were hooked. You filled your Muzik folder with MP3 files, saw the MSN-Yahoo rivalry unfold like a repeat of the ’80s cola wars, did a web search through Ask Jeeves. And it was all free, and ad-free.

Users got their first taste of power in 1994, with public-journal formats that allowed anyone to post their thoughts for the world. We didn’t even call them blogs until 1999, but by 2004, Blog was Merriam-Webster’s Word of the Year.

The internet’s first-ever website, info.cern.ch, has been saved for posterity by the guys who built the internet. But where do you go to see the rest of it? When domain names lapse, or companies collapse, the websites can vanish without a trace. You can’t pull an old one from the shelf. That’s probably why the Internet Archive is such a gem.

HIT REWIND

Born 24 years ago, just a year after India connected to the internet, the Internet Archive is an American non-profit organisation that digitises films, books, letters, images, audio, video and software programmes. Its key project, the Wayback Machine, trawls the web, copying pages, to build a library of the internet itself.

The Machine lets you see what a page looked like when it was archived, even if the site has changed or been taken down. More than 458 billion pages have been saved so far. It’s not nearly enough. The internet has more than 60 trillion web pages. And with social media, there’s more to archive than ever.

Meanwhile, not everyone’s thrilled about record-keeping. Because the Wayback Machine collects site caches without asking, it’s raised questions about copyright infringement and privacy. In 2017, the Internet Archive was among 2,600 sites banned by the Indian government as part of the fight against digital piracy. And in the coronavirus disease lockdown, publishers sued the non-profit for making its digitised library available globally.

PIECES OF THE PAST

Symbolics.com, the first domain name ever registered on the internet, was already 10 years old when India logged on in 1995. It used to be a computer programming business. But it is now the Big Internet Museum.

Some governments are taking big steps too. In Sweden, every web page that ends in .se has been saved by the National Library’s Web Archive division. The British Library has archived 6 billion pages.

India doesn’t have such an archive, except for election-related data. But the Wayback Machine has been paying attention to India almost from the start. There are more than 34,000 captures for just the Hindustan Times site, for instance, some dating back to 2001.
