National Post (National Edition)

JUNK SCIENCE WEEK BEGINS.

- David Sumpter

After the 2016 U.S. presidential election, a company called Cambridge Analytica announced that its data-driven campaign had been instrumental in Donald Trump’s victory. The front page of the company’s website featured a montage of news clips showing the story of how it had used targeted online marketing and micro-level polling data to influence voters.

Cambridge Analytica (CA) claimed to have collected hundreds of millions of data points about large numbers of U.S. voters. CA also claimed it could use this data to provide a picture of the voters’ personalities that went beyond the traditional demographics of gender, age and income.

It’s a scary thought. Facebook’s data can be used to reveal our preferences. With such data, the candidate in a political campaign might focus on discrediting journalists and news agencies. Tailored messages would be delivered directly to individuals, providing them with propaganda that conformed to their already established world view.

So I decided to find out for myself how an approach to winning elections based on political personalities could work. Before we get carried away with the idea of right-wing politicians tapping into a 100-dimensional representation of America’s voters, we need to think about how accurately the dimensions inside a computer really represent us as people.

Well-designed algorithms work with rankings and probabilities. The Facebook personality model assigns an extrovert/introvert ranking to each user or gives the probability of a user being “single” or “in a relationship.” These models take a range of factors and produce a single number that is proportional to the probability of a particular fact being true about the person.

The most basic method used for converting large numbers of dimensions to a probability, or ranking, is known as regression.

Statisticians have used regression models for over a century, with applications starting in biology and expanding to economics, the insurance industry, political science and sociology.

Cambridge Analytica and other modern data analytics companies use more or less the same statistical techniques as were used in the 1980s. The major difference between now and then is the data they have access to. Instead of relying on just age, class and gender to characterize us, it is possible to feed Facebook “likes,” answers to online poll questions, and data on the purchases we make into regression models.
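
To make the idea concrete, here is a minimal sketch of how a regression can turn a handful of “likes” into a single probability. It assumes a logistic regression, which is one common choice but is not confirmed as the method any company used. The three pages and their weights are invented for illustration and are not fitted to any real data.

```python
import math

def logistic(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_probability(features, weights, intercept):
    """Weighted sum of the inputs, squashed into a probability."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return logistic(score)

# Hypothetical user: 1 means the user liked the page, 0 means they did not.
likes = [1, 0, 1]            # invented pages A, B and C
weights = [1.2, -0.8, 0.5]   # invented weights, not fitted to real data
p = predict_probability(likes, weights, intercept=-0.3)
print(f"Estimated probability: {p:.2f}")
```

In a real model the weights would be estimated from users whose answers are already known, and there would be one weight per input dimension rather than three.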

Cambridge Analytica claimed to use these large data sets to establish an overall view of our personality and political standpoint.

In the past, when political scientists studied voters’ party preferences, they typically relied on socio-economic background. Cambridge claimed to “take into account the behavioural conditioning of each individual to create informed forecasts of future behaviour.”

To do such large-scale regression on our political personalities, Cambridge Analytica needed a lot of data. In 2014, psychologist Alex Kogan, a researcher at Cambridge University, was collecting data for his scientific studies through an online crowd-sourcing marketplace called Mechanical Turk. Kogan had found that it was, at that time, surprisingly easy for researchers to access data on Facebook. Eighty per cent of people volunteering for Kogan’s study provided access to their profile and their friends’ location. On average, each volunteer had 353 friends. With just 857 participants, Kogan and his co-workers gained access to a total of 287,739 people’s data.

As we now know, Kogan ended up using the technique to collect data for Cambridge Analytica. With questionnaire responses from 200,000 U.S. citizens, Cambridge ended up with data for over 30 million people. This was a massive dataset that potentially gave a comprehensive picture of the political personality of many Americans.

Alexander Nix, CEO of Cambridge Analytica, talked about how, instead of targeting people on the basis of race, gender or socioeconomic background, his company could “predict the personality of every single adult in the United States of America.” Highly neurotic and conscientious voters could be targeted with the message that the “second amendment was an insurance policy.” Traditional, agreeable voters might be told about how “the right to bear arms was important to hand down from father to son.” He claimed that he could use “hundreds and thousands of individual data points on our target audiences to understand exactly which messages are going to appeal to which audiences” and implied that the methods he had described were being used by the Trump campaign.

But when I focused on the details of the models used to predict voting patterns, I felt that one important ingredient was missing: the algorithm. I wanted to work out for myself whether Nix’s big claims could really hold up to scrutiny. I conducted my own Facebook data experiment.

The accuracy of a regression model based on Facebook data is very good. In eight out of nine attempts, the regression correctly identifies the political views of the Facebook user. The main group of likes that identify a Democrat were for Barack and Michelle Obama, National Public Radio, TED Talks, Harry Potter, the I F---ing Love Science webpage and liberal current affairs shows like The Colbert Report and The Daily Show. Republicans like George W. Bush, the Bible, country and western music, and camping. It isn’t too surprising that Democrats like the Obamas and The Colbert Report or that many Republicans like George W. Bush and the Bible.

So I tried to see if I could break the regression model by taking some of the obvious “likes” out of the model and performing a new regression. To my surprise, the model still worked with 85 per cent accuracy, only a slight reduction in performance. Now it used combinations of likes to determine political affiliations. For example, someone who liked Lady Gaga, Starbucks and country music was more likely to be a Republican, but a Lady Gaga fan who also liked Alicia Keys and Harry Potter was more likely to be a Democrat.
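
The experiment of removing the obvious “likes” and fitting again can be sketched on made-up data. This is an illustration only, not the analysis described above: the users are synthetic, the model is a plain logistic regression fitted by gradient descent, and the nine pages (one strongly partisan, eight weakly partisan) and their probabilities are assumptions chosen for the example.

```python
import math
import random

def sigmoid(z):
    """Squash a real-valued score into a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Intercept plus the weighted sum of a user's likes."""
    return b + sum(wj * xj for wj, xj in zip(w, x))

def fit_logistic(X, y, lr=0.5, epochs=300):
    """Fit weights and an intercept by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # predicted minus actual
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return w, b

def accuracy(X, y, w, b):
    """Share of users whose predicted side matches their label."""
    hits = sum((score(w, b, xi) >= 0.0) == (yi == 1) for xi, yi in zip(X, y))
    return hits / len(y)

# Synthetic users, invented for illustration: label 1 is one party, 0 the other.
rng = random.Random(0)

def make_user(label):
    p_obvious = 0.9 if label == 1 else 0.1   # one strongly partisan page
    p_subtle = 0.7 if label == 1 else 0.3    # eight weakly partisan pages
    return [int(rng.random() < p_obvious)] + [
        int(rng.random() < p_subtle) for _ in range(8)
    ]

labels = [rng.randint(0, 1) for _ in range(600)]
users = [make_user(label) for label in labels]
train_X, test_X = users[:400], users[400:]
train_y, test_y = labels[:400], labels[400:]

def drop_obvious(rows):
    """Remove column 0, the strongly partisan page."""
    return [row[1:] for row in rows]

# Model 1 uses all nine pages; model 2 is refitted without the obvious one.
w_full, b_full = fit_logistic(train_X, train_y)
acc_full = accuracy(test_X, test_y, w_full, b_full)

w_reduced, b_reduced = fit_logistic(drop_obvious(train_X), train_y)
acc_reduced = accuracy(drop_obvious(test_X), test_y, w_reduced, b_reduced)

print(f"all pages: {acc_full:.2f}, without the obvious page: {acc_reduced:.2f}")
```

When many weak signals point the same way, their combination can carry much of the information that the single obvious signal did, which is why the reduced model need not collapse. The exact accuracies printed depend entirely on the invented probabilities.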

This type of information could be very useful to a political party. Instead of Democrats focusing a campaign purely around traditional liberal media, they could focus on getting the vote out among Harry Potter fans. Republicans could target people who drink Starbucks coffee and people who go camping. Lady Gaga fans should be treated with caution by both sides. Although it is difficult to make a direct comparison, the accuracy of a Facebook-based regression model seems to beat traditional methods.

So far so good for Alexander Nix and Cambridge Analytica. But before we get carried away, let’s look a bit more closely at the limitations. First of all, there is a fundamental limitation of regression models. We can’t expect a model to reveal your political views with 100 per cent certainty. There is no way that Cambridge Analytica, or anyone else for that matter, can look at your Facebook data and draw conclusions with guaranteed accuracy.

While regression models work very well for hardcore Democrats and Republicans (as I established earlier, the accuracy is around 85 per cent), predictions about these voters are not particularly useful in a political campaign. Known party supporters’ votes are more or less guaranteed, and they don’t need to be targeted. In fact, the regression model I fitted to Facebook data does not reveal anything about the 76 per cent of people who didn’t register their political allegiance.

While the data shows us that Democrats tend to like Harry Potter, it doesn’t necessarily tell us that other Harry Potter fans like the Democrats. This is the classic problem, inherent in all statistical analyses, of potentially confusing correlation with causation.
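
The gap between “Democrats tend to like Harry Potter” and “Harry Potter fans tend to be Democrats” can be shown with Bayes’ rule. The sketch below uses invented figures: it assumes 30 per cent of people are Democrats, that 60 per cent of Democrats like Harry Potter, and that 40 per cent of everyone else does too. None of these numbers comes from the data discussed here.

```python
def share_of_fans_in_group(p_like_given_group, p_like_given_rest, p_group):
    """Bayes' rule: probability of belonging to the group, given the like."""
    p_like = p_like_given_group * p_group + p_like_given_rest * (1.0 - p_group)
    return p_like_given_group * p_group / p_like

# Hypothetical inputs, chosen only to illustrate the asymmetry.
p_democrat_given_fan = share_of_fans_in_group(
    p_like_given_group=0.6,  # assumed share of Democrats who like Harry Potter
    p_like_given_rest=0.4,   # assumed share of everyone else who likes it
    p_group=0.3,             # assumed share of Democrats in the population
)
print(f"Share of Harry Potter fans who are Democrats: {p_democrat_given_fan:.0%}")
```

Under these assumptions a majority of Democrats like Harry Potter, yet only about 39 per cent of Harry Potter fans are Democrats, so a campaign aimed at the fans would mostly reach other people.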

When I told Alex Kogan about my findings, he started to open up. Kogan had reached similar conclusions. He didn’t believe that Cambridge Analytica, or anyone else, could produce an algorithm that effectively classified people’s personality. He was blunt about Alexander Nix. “Nix is trying to promote (the personality algorithm) because he has a strong financial incentive to tell a story about how Cambridge Analytica have a secret weapon.”

There is an important distinction to be made here between a scientific finding, that a certain set of “likes” on Facebook is related to the outcome of personality tests, and the implementation of a reliable algorithm based on this finding, one that creates an equation that correctly predicts what type of person you are. A scientific finding can be true and interesting, but unless the relationship is very strong (which it isn’t in the case of personality prediction) it doesn’t allow us to make particularly reliable predictions about an individual’s behaviour.
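
One way to see why a real but weak relationship gives poor individual predictions is to ask how often a prediction lands on the correct side of average. The sketch below assumes the predicted score and the true trait are jointly normal with correlation r, in which case the chance of agreement is 1/2 + arcsin(r)/π. The correlation values are hypothetical and are not estimates for any actual personality model.

```python
import math

def same_side_probability(r):
    """Chance that prediction and true trait fall on the same side of average,
    assuming they are jointly normal with correlation r."""
    return 0.5 + math.asin(r) / math.pi

for r in (0.0, 0.3, 0.6, 0.9):   # hypothetical correlation strengths
    share = same_side_probability(r)
    print(f"r = {r:.1f}: correct side about {share:.0%} of the time")
```

With no correlation the prediction is a coin toss, and under this assumption a correlation of 0.3 lifts it only to roughly 60 per cent. The finding can be statistically solid while still saying little about any one person.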

In other words, the science is interesting, but there is no evidence yet that Facebook can determine and target your political personality.

We live in an exciting time, where we can use data to help us make better decisions and keep people informed about the issues that are important to them. But with this power comes the responsibility to carefully explain what we can and can’t do. It seems we have left this important job in the hands of industry consultants who are teaching data scientists how to spin their research to the greatest possible effect.

The Cambridge Analytica story is, in my view, primarily one about hyperbole. It is a story about a company seemingly exaggerating what it can do with data.

While whistleblower Christopher Wylie claimed that he and Alex Kogan had helped CA build a “psychological warfare” tool, the details of the effectiveness of this weapon were not revealed. The lack of a smoking gun squared with my own analysis, and with Alex Kogan’s assessment: Facebook data is not yet sufficiently detailed to allow the building of adverts targeted to people’s individual personalities, let alone their political personalities.

CHRIS J. RATCLIFFE / BLOOMBERG Christopher Wylie is a Canadian whistleblower and former employee of Cambridge Analytica.