Another Data Privacy Debacle

- Zeynep Tufekci is an associate professor at the University of North Carolina and author of “Twitter and Tear Gas: The Power and Fragility of Networked Protest.” Send comments to intelligence@nytimes.com.

Did you make a New Year’s resolution to exercise more? Perhaps you downloaded a fitness app to help track your workouts, maybe one that allows you to share that data online with your exercise buddies?

If so, you probably checked a box to accept the app’s privacy policy. For most apps, the default setting is to share data with at least the company; for many apps the default is to share data with the public. But you probably didn’t even notice or care. After all, what do you have to hide?

For users of the exercise app Strava, the answer turns out to be a lot more than they realized. Since November, Strava has featured a global “heat map” showing where its users jogged or walked or otherwise traveled while the app was on. The map includes some three trillion GPS data points, covering more than 5 percent of the earth. Recently, security analysts showed that because many American military service members are Strava users, the map inadvertently reveals the locations of military bases and the movements of their personnel.

Perhaps more alarming for the military, similar patterns of movement appear to reveal possible stations or airstrips in locations where the United States is not known to have such operations, as well as their supply and logistics routes. Analysts noted that with Strava’s interface, it is relatively easy to identify the movements of individual soldiers not just abroad but also when they are back at home, especially if combined with other public or social media data.

Apart from chastening the cybersecurity experts in the Defense Department, the Strava debacle underscores a crucial misconception at the heart of the system of privacy protection in the United States. The privacy of data cannot be managed person-by-person through a system of individualized informed consent.

Data privacy is not like a consumer good, where you click “I accept” and all is well. Data privacy is more like air quality or safe drinking water, a public good that cannot be effectively regulated by trusting in the wisdom of millions of individual choices. A more collective response is needed.

Part of the problem with the ideal of individualized informed consent is that it assumes companies have the ability to inform us about the risks we are consenting to. They don’t. Strava surely did not intend to reveal the GPS coordinates of a possible Central Intelligence Agency annex in Mogadishu, Somalia — but it may have done just that. Even if all technology companies meant well and acted in good faith, they would not be in a position to let you know what exactly you were signing up for.

Another part of the problem is the increasingly powerful computational methods called machine learning, which can take seemingly inconsequential data about you and, by combining them with other data, discover facts about you that you never intended to reveal. For example, research shows that data as minor as your Facebook “likes” can be used to infer your sexual orientation, whether you use addictive substances, your race and your views on many political issues. This kind of computational statistical inference is not 100 percent accurate, but it can be fairly close — certainly close enough to be used to profile you for a variety of purposes.

A challenging feature of machine learning is that exactly how a given system works is opaque. Nobody — not even those who have access to the code and data — can tell what piece of data came together with what other piece of data to result in the finding the program made. This further undermines the notion of informed consent, as we do not know which data result in what privacy consequences. What we do know is that these algorithms work better the more data they have. This creates an incentive for companies to collect and store as much data as possible, and to bury the privacy ramifications, either in legalese or by playing dumb and being vague.

What can be done? There must be strict controls and regulations concerning how all the data about us — not just the obviously sensitive bits — is collected, stored and sold. With the implications of our current data practices unknown, and with future uses of our data unknowable, data storage must move from being the default procedure to a step that is taken only when it is of demonstrable benefit to the user, with explicit consent and with clear warnings about what the company does and does not know. And there should also be significant penalties for data breaches, especially ones that result from underinvestment in secure data practices, as many now do.

Companies often argue that privacy is what we sacrifice for the supercomputers in our pockets and their highly personalized services. This is not true. While a perfect system with no trade-offs may not exist, there are technological avenues that remain underexplored, or even actively resisted by big companies, that could allow many of the advantages of the digital world without this kind of senseless assault on our privacy.

With luck, stricter regulations and a true consumer backlash will force our technological overlords to take this issue seriously and let us take back what should be ours: true and meaningful informed consent, and the right to be let alone.
