The Mercury News

How did Google become the world’s largest search engine?

Regulators around the world examining ways to curb search engine’s power

- Daisuke Wakabayashi

OAKLAND >> In 2000, just two years after it was founded, Google reached a milestone that would lay the foundation for its dominance over the next 20 years: It became the world’s largest search engine, with an index of more than 1 billion web pages.

The rest of the internet never caught up, and Google’s index just kept on getting bigger. Today, it is somewhere between 500 billion and 600 billion web pages, according to estimates.

Now, as regulators around the world examine ways to curb Google’s power, including a search monopoly case expected from state attorneys general as early as this week and the antitrust lawsuit the Justice Department filed in October, they are wrestling with a company whose sheer size has allowed it to squash competitors. And those competitors are pointing investigators toward that enormous index, the gravitational center of the company.

“If people are on a search engine with a smaller index, they’re not always going to get the results they want. And then they go to Google and stay at Google,” said Matt Wells, who started Gigablast, a search engine with an index of around 5 billion web pages, about 20 years ago. “A little guy like me can’t compete.”

Understanding how Google’s search works is a key to figuring out why so many companies find it nearly impossible to compete and, in fact, go out of their way to cater to its needs.

Every search request provides Google with more data to make its search algorithm smarter. Google has performed so many more searches than any other search engine that it has established a huge advantage over rivals in understanding what consumers are looking for. That lead only continues to widen, since Google has a market share of about 90%.

Google directs billions of users to locations across the internet, and websites, hungry for that traffic, create a different set of rules for the company. Websites often provide greater and more frequent access to Google’s so-called web crawlers — computers that automatically scour the internet and scan web pages — allowing the company to offer a more extensive and up-to-date index of what is available on the internet.
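That gatekeeping typically runs through a site’s robots.txt file, a plain-text list of rules telling each crawler what it may fetch. The short sketch below, which uses Python’s standard urllib.robotparser and a hypothetical set of rules (not taken from any real site), shows how a site can give Googlebot free rein while turning every other crawler away.

    # A minimal sketch of robots.txt gatekeeping, using Python's standard
    # library parser. The rules here are hypothetical, not from a real site.
    from urllib.robotparser import RobotFileParser

    HYPOTHETICAL_RULES = [
        "User-agent: Googlebot",
        "Disallow:",        # empty Disallow: Googlebot may fetch anything
        "",
        "User-agent: *",
        "Disallow: /",      # every other crawler is shut out entirely
    ]

    parser = RobotFileParser()
    parser.parse(HYPOTHETICAL_RULES)

    for bot in ("Googlebot", "SomeOtherBot"):
        allowed = parser.can_fetch(bot, "https://example.com/some/page")
        print(bot, "->", "allowed" if allowed else "blocked")
    # Googlebot -> allowed
    # SomeOtherBot -> blocked

Sites can also enforce the same policy at the server level, throttling or blocking crawlers that ignore the file, which is part of what makes the arrangement so lopsided in Google’s favor.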

When he was working at the music site Bandcamp, Zack Maril, a software engineer, became concerned about how Google’s dominance had made the company so essential to websites.

In 2018, when Google said its crawler, Googlebot, was having trouble with one of Bandcamp’s pages, Maril made fixing the problem a priority because Google was critical to the site’s traffic. When other crawlers encountered problems, Bandcamp would usually block them.

Maril continued to research the different ways that websites opened doors for Google and closed them for others. Last year, he sent a 20-page report, “Understanding Google,” to a House antitrust subcommittee and then met with investigators to explain why other companies could not recreate Google’s index.

“It’s largely an unchecked source of power for its monopoly,” said Maril, 29, who works at another technology company that does not compete directly with Google. He asked that The New York Times not identify his employer since he was not speaking for it.

A report this year by the House subcommittee cited Maril’s research on Google’s efforts to create a real-time map of the internet and how this had “locked in its dominance.” While the Justice Department is looking to unwind Google’s business deals that put its search engine front and center on billions of smartphones and computers, Maril is urging the government to intervene and regulate Google’s index. A Google spokesperson declined to comment.

Websites and search engines are symbiotic. Websites rely on search engines for traffic, while search engines need access to crawl the sites to provide relevant results for users. But each crawler puts a strain on a website’s resources in server and bandwidth costs, and some crawlers are so aggressive that they resemble security threats capable of taking down a site.

Since having their pages crawled costs money, websites have an incentive to let it be done only by search engines that direct enough traffic to them. In the current world of search, that leaves Google and — in some cases — Microsoft’s Bing.

Google and Microsoft are the only search engines that spend hundreds of millions of dollars annually to maintain a real-time map of the English-language internet. That’s in addition to the billions they have spent over the years to build out their indexes, according to a report this summer from Britain’s Competition and Markets Authority.

Google has a significant leg up on Microsoft in more than market share. British competition authorities said Google’s index included about 500 billion to 600 billion web pages, compared with 100 billion to 200 billion for Microsoft.
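The article’s own numbers hint at the scale involved. As a rough back-of-envelope sketch (the page count comes from the reported index size above; the average page size is an assumed figure for illustration only), even a single raw copy of an index that large runs to tens of petabytes:

    # Rough arithmetic on index scale; the average page size is an assumption.
    PAGES_IN_INDEX = 500e9        # low end of Google's reported index size
    AVG_PAGE_SIZE_BYTES = 100e3   # assumed ~100 KB of text and markup per page

    raw_bytes = PAGES_IN_INDEX * AVG_PAGE_SIZE_BYTES
    print(f"~{raw_bytes / 1e15:,.0f} PB of raw pages")  # ~50 PB

    # One snapshot is only the start: a "real-time map" means recrawling
    # and reprocessing much of this continuously.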

Other large tech companies deploy crawlers for other purposes. Facebook has a crawler for links that appear on its site or services. Amazon says its crawler helps improve its voice-based assistant, Alexa. Apple has its own crawler, Applebot, which has fueled speculation that it might be looking to build its own search engine.

But indexing has always been a challenge for companies without deep pockets. The privacy-minded search engine DuckDuckGo decided to stop crawling the entire web more than a decade ago and now syndicates results from Microsoft. It still crawls sites like Wikipedia to provide results for answer boxes that appear in its results, but maintaining its own index does not usually make financial sense for the company.

“It costs more money than we can afford,” said Gabriel Weinberg, chief executive of DuckDuckGo. In a written statement for the House antitrust subcommittee last year, the company said that “an aspiring search engine startup today (and in the foreseeable future) cannot avoid the need” to turn to Microsoft or Google for its search results.

JARED SOARES — THE NEW YORK TIMES Zack Maril, a software engineer, demonstrates his website that looks into web crawling, in Washington on Nov. 13. Maril explained to investigators how Google’s index gave it so much power.
