When size isn’t everything

Big Data is not all it’s cracked up to be when we go looking for sense in a sea of gibberish

2017-03-19 - Robert Matthews is visiting professor of science at Aston University, Birmingham, UK

It’s what every investor has dreamed of: a way of predicting stock market moves. And according to researchers in the United Kingdom, it’s easier than you would think. Changes in the US Dow Jones index are presaged by the ebb and flow of Google searches for certain financial keywords.

Better still, acting on these trends could make you a huge profit – at least, until others find the same keywords, of course.

Predictably, publication of this claim in the journal Nature Scientific Reports attracted a lot of attention.

It was also hailed as evidence of the power of so-called Big Data: the extraction of insight from vast data sets.

Now, it looks set to become a classic example of what Big Data can do – but not in a good way.

An analysis of the claim published this month suggests it is an object lesson in how digging into Big Data can trigger an avalanche of dross.

With everyone, from politicians to billion-dollar corporations, looking to Big Data for answers to big questions, it holds vital lessons for all of us.

And topping the list is this: never let go of common sense.

When it first began to make headlines around a decade ago, the exploitation of Big Data was hailed as nothing short of a revolution in how we can make sense of the world. In the face of so much information, there was simply no reason to bother dreaming up intricate theories and then testing them.

“Out with every theory of human behaviour, from linguistics to sociology. Forget taxonomy, ontology and psychology,” wrote Chris Anderson, editor-in-chief of Wired. “With enough data, the numbers speak for themselves.”

At last month’s World Government Summit in Dubai, the potential of Big Data was held up as the means to solve a host of problems, from delivering local public services to tackling global food security. But now, many of those who have looked to Big Data for insight have found that when left to “speak for themselves”, numbers often spout gibberish.

Take that claim that Google search data can predict stock market moves. The idea isn’t so crazy: market swings reflect changes in sentiment among investors, and these might well show up as changes in the volume of searches for certain financial terms.

But which ones? In their Nature Scientific Reports study, the researchers came up with a list of about 100 terms they thought might pop up, like “risk” and “investment”.

They then set computers digging into Google’s databases, looking for links between search volumes for each term and changes in the Dow Jones index. And it worked. The computer found correlations between the two. But the researchers went further, showing that the best correlation could be turned into hard cash.

Their results showed that a trading strategy of buying when the word “debt” is trending upwards and vice versa would have generated a profit of more than 300 per cent compared to a simple buy-and-hold strategy.

But the key phrase there is “would have”. The researchers did not make any money themselves – not because they were above such things but because they would have needed a time machine. That’s because all their results were retrospective, revealing only past correlations.

And that’s where the whole idea unravels.

By running a list of 100 terms against past records of the Dow Jones, the researchers risked falling into a classic Big Data trap: finding correlations that are meaningless flukes.
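
The trap is easy to reproduce. What follows is a minimal sketch, not the researchers' method: it uses purely random numbers in place of real search volumes and index moves, and the function names, the 52-week window and the seed are all illustrative assumptions. Even though no genuine link exists, the best of 100 random "terms" will typically look noticeably correlated with the random "index".

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def best_fluke(n_terms=100, n_weeks=52, seed=0):
    """Strongest |correlation| between random 'terms' and a random 'index'.

    Everything here is noise: any correlation found is a fluke by construction.
    """
    rng = random.Random(seed)
    # Hypothetical weekly index changes (random, not real Dow Jones data).
    index_changes = [rng.gauss(0, 1) for _ in range(n_weeks)]
    best = 0.0
    for _ in range(n_terms):
        # Hypothetical weekly search-volume changes for one term (also random).
        search_changes = [rng.gauss(0, 1) for _ in range(n_weeks)]
        best = max(best, abs(pearson(search_changes, index_changes)))
    return best


if __name__ == "__main__":
    print(f"best of 100 random terms: {best_fluke(n_terms=100):.2f}")
    print(f"a single random term:     {best_fluke(n_terms=1):.2f}")
```

The more candidates are tried, the stronger the best fluke tends to look, which is why a long list of terms tested against one historical record proves so little on its own.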

Such correlations are shockingly common, as former Harvard law student Tyler Vigen has shown through his now-celebrated eponymous website.

It contains countless silly but impressively strong correlations found by automatically scouring the web. For example, Vigen’s computers found that the annual amount of honey produced in the United States is very strongly correlated to the number of murders using blunt objects.

Of course, no one with any common sense would fall for such “links”. But that’s the problem with using computers to find them: common sense does not come as standard.

In theory, genuine links might exist between search terms like “debt” and stock market changes. But the only way to find out is to check their predictive power. Dr Wai Mun Fong, associate professor of finance at the National University of Singapore, has now done precisely that, in research published in the current Journal of Index Investing.

Dr Fong took the same list of about 100 terms used in the original study but this time tested their predictive power in years not examined by the original researchers.
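
The logic of such an out-of-sample check can be illustrated with a small sketch. It is not Dr Fong's procedure or data: the series are random noise, and the names, window length and seeds are assumptions made for illustration. The code picks whichever "term" correlates best with the "index" in one year, then measures how that same term fares in the following year.

```python
import random
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def in_vs_out_of_sample(n_terms=100, n_weeks=52, seed=1):
    """Pick the best-correlated random 'term' in year one; score it in year two.

    Returns (year-one correlation, year-two correlation) for the winning term.
    """
    rng = random.Random(seed)
    # Two years of hypothetical weekly data, all pure noise.
    index = [rng.gauss(0, 1) for _ in range(2 * n_weeks)]
    terms = [[rng.gauss(0, 1) for _ in range(2 * n_weeks)]
             for _ in range(n_terms)]

    def corr(term, start):
        # Correlation over one year-long window beginning at week `start`.
        window = slice(start, start + n_weeks)
        return pearson(term[window], index[window])

    # Select using year one only, exactly as a retrospective study would.
    winner = max(terms, key=lambda t: abs(corr(t, 0)))
    return corr(winner, 0), corr(winner, n_weeks)


if __name__ == "__main__":
    in_sample, out_of_sample = in_vs_out_of_sample()
    print(f"year one (used to pick the term): {in_sample:+.2f}")
    print(f"year two (fresh data):            {out_of_sample:+.2f}")
```

Because the winner was chosen for its year-one score, that score flatters it. On fresh data a fluke has no reason to repeat, so averaged over many runs the year-two correlation is much weaker.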

Search terms that proved most strongly correlated with the Dow Jones in one year were used to predict market behaviour in the following year. Sure enough, Dr Fong confirmed that some Google search terms were correlated with market moves. But they proved to be “society”, “cancer”, “home” and various others with pretty tenuous links to investor sentiment.

Significantly, one search term that failed to predict the market over this new timescale was “debt”.

Dr Fong’s conclusions are blunt: attempts to use Big Data to predict stock prices that aren’t based on solid economic theory or properly tested “are doomed to be misguided and futile”. That such basic warnings are still necessary says much about the spell Big Data is weaving over smart people.

Even Google, that titan of Big Data, has proved vulnerable. In 2008 it unveiled Google Flu Trends (GFT), with the aim of helping predict flu epidemics through spikes in certain search terms.

To find these keywords, computers had looked for links between 50 million candidates and just 1,200 flu outbreak data points. So big a mismatch is a sure-fire way of finding spurious correlations.

And thus it proved. Like the stock market predictor, GFT made headlines, then mistakes and, ultimately, proved useless. Google threw in the towel in 2015.

Now others are going down the same path. The latest victims of Big Data are hedge funds, who have ploughed huge sums into finding patterns that give them an edge.

One industry insider recently told Bloomberg that the “insights” they’re finding in Big Data have a failure rate of about 90 per cent.

Few can survey the towering masses of global data without believing the prospector’s credo “there’s gold in them thar hills”. There is, but it’s buried deep in dross – and simply grabbing a shovel and digging is not the way to find it.

Photo caption: Researchers have found that scouring the internet for winning streaks is not guaranteed to deliver a dividend. (Michael Nagle / Bloomberg)
