San Francisco Chronicle

Facebook data have long been mined by scholars

By Sheera Frenkel

In July 2014, a team of four Swedish and Polish researchers began using an automated program to better understand what people posted on Facebook.

The program, known as a “scraper,” let them log every comment and interaction from 160 public Facebook pages for nearly two years. By May 2016, they had amassed enough information to track how 368 million Facebook members behaved on the social network. It is one of the largest known sets of user data ever assembled from Facebook.

“We’re concerned about how easy it was to collect this,” said Fredrik Erlandsson, one of the researchers and a lecturer at the Blekinge Institute of Technology in Sweden. In December, he and his colleagues published a research paper in the journal Entropy detailing how their methods of trawling social media sites could be replicated.

For more than a decade, professors, doctoral candidates and researchers from academic institutions around the world have harvested information from Facebook using techniques similar to those of Erlandsson and his team. They have compiled hundreds of Facebook data sets that captured the behavior of a few thousand to hundreds of millions of individuals, according to interviews with more than a dozen scholars.

Their practices came to light in March when the New York Times and the Observer of London reported that Aleksandr Kogan, a University of Cambridge psychology professor, had obtained the data of up to 87 million Facebook users through a quiz app. Kogan sold the information to Cambridge Analytica, a political consulting firm with ties to the Trump campaign, so it could build psychographic profiles of American voters. Last week, Cambridge Analytica said it would cease operations after the uproar over its use of personal information.

But while what happened with Kogan’s Facebook data set is now known, the fate of other information hoards is murkier. In many cases, the data were used for research or scholarly articles. The information was then sometimes left unsecured and stored on open servers that offered access to anyone. Some academics said the data could have been easily copied and sold to marketers or political consulting firms.

The potential result is more leakage of Facebook users’ information through academic circles, said Rasmus Kleis Nielsen, a professor of political communication at the University of Oxford who has studied data collection from Facebook.

“The academic world is highly decentralized, and each individual, each institution, has a different way of securing their data,” Nielsen said. “Even if almost everyone in the academic community is careful and protects the data, you still can end up in a situation where someone is careless or acts in bad faith and sells access. It’s hard to imagine how Facebook stops that from happening.”

The Times reviewed half a dozen Facebook data sets compiled by academics from 2006 to 2017. One, gathered from 2015 to 2017 by researchers in Denmark and New Zealand, examined 1.3 million people in Denmark, about a quarter of the country’s population, to determine how liking one political page on Facebook could predict how someone would vote in the future. Another set, compiled in 2013 by a group of Norwegian academics, focused on the civic engagement of 21 million Facebook members on four continents.

The Danish research team did not respond to a request for comment. Petter Bae Brandtzaeg, one of the Norwegian researchers, said he understood concerns about data gathering.

“As a researcher you get immediate access to people’s behavior, attitudes, feelings and relationships, which are of course tempting for all,” he wrote in an email. He said many researchers lacked the technical expertise to properly secure data.

The data were typically amassed through scraper programs that crawled Facebook to document what was posted, or through quiz apps that requested access to people’s profiles. The results included users’ locations, interests, political affiliations, Facebook interactions and even music preferences.

In most cases, researchers assigned numbers to people whose Facebook information they had obtained to maintain anonymity. But the more data there are, the easier it is to overlay one information set with another to identify someone. One 2015 paper published in the journal Science looked at credit card spending data and found that data scientists could pinpoint 90 percent of the shoppers by name with just four random pieces of information from sites like Facebook, Instagram and Twitter.

Once people are identified and their interests and interactions known, they can be targeted with advertising and mobilized for political campaigns or other causes.

For years, Facebook had no specific policies about academics’ access to user data, though it had guidelines on working with third parties. While the Menlo Park company has a rule that forbids the use of scrapers, it has not enforced that policy against scholars. And at times, it has assisted researchers with studies.

In 2014, though, Facebook began limiting third-party apps, like quizzes, from obtaining users’ information.

Since Kogan’s actions were revealed, Facebook has made further changes. The company has given people more control over their privacy settings. It has said it will audit all apps that collected large amounts of Facebook data, and it temporarily stopped allowing new apps to gather information.

Last month, Facebook also narrowed the number of academics it would work with, saying it would collaborate with those who wanted to research the effect of social media on elections through an “independent election research commission.” Only scholars with election-related projects can apply.

“We are taking a hard look at the information apps can use when you connect them to Facebook, as well as other data practices,” Facebook spokeswoman Susan Glick said. “These other data practices include academic research.”

One of the earliest known academic Facebook data sets was collected in 2006 by Harvard professors. It covered 1,700 people who agreed to have their Facebook information anonymously analyzed. The data were later easily traced back by other academics to Harvard freshmen.

In Britain, researchers were doing similar work. In 2007, Michal Kosinski, then deputy director at the Psychometrics Center at the University of Cambridge, worked with colleague David Stillwell to create My Personality, a quiz app that offered to assess people’s personalities in exchange for data about them. It was one of the first times a quiz app had been used for obtaining Facebook members’ information.

My Personality has now collected details on more than 6 million Facebook users, according to the academics who have gathered the data. Many researchers have since copied the quiz app method, including Kogan.
