Early Access
New “differential privacy” technique explained
Pre-release software and “differential privacy.”
At WWDC, Apple debuted several new features – such as image identification in Photos and better predictive features for typing and Siri suggestions – that rely on collecting large amounts of user data to work well. But it promised that it would still protect users’ anonymity and privacy, using a little-known technique called “differential privacy.”
This technique stops Apple (or anyone else with access to the data) from identifying you from that data, by making each individual piece of data unreliable. For its purposes – knowing whether more people choose suggestion x over suggestion y in Maps, say – Apple doesn’t need to look at each user’s data, only the averages across all users. So the data it collects can include intentionally false results uploaded by iOS: as long as Apple knows, statistically, how much false data is mixed in, it can adjust the averages to compensate and still tell roughly how many people are choosing x over y. But look at any individual piece of data and you have no way of knowing whether it’s true or false.
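To make that idea concrete, here’s a rough sketch in Swift of the kind of “randomized response” scheme described above. The function names, the 50% noise probability and the maths are our own illustration of the principle, not Apple’s actual code:

```swift
import Foundation

/// On each device: with probability `noiseProbability`, report a random
/// coin flip instead of the real answer, so any single report may be false.
func report(trueChoiceIsX: Bool, noiseProbability: Double = 0.5) -> Bool {
    if Double.random(in: 0..<1) < noiseProbability {
        return Bool.random()      // deliberately unreliable answer
    }
    return trueChoiceIsX          // honest answer
}

/// On the server: estimate the real fraction choosing x from the noisy
/// reports, compensating for the known amount of false data.
func estimateFractionChoosingX(reports: [Bool], noiseProbability: Double = 0.5) -> Double {
    let observed = Double(reports.filter { $0 }.count) / Double(reports.count)
    // On average, observed = (1 - p) * trueFraction + p * 0.5, so invert:
    return (observed - noiseProbability * 0.5) / (1 - noiseProbability)
}
```

With millions of reports the estimate converges on the true proportion, even though no individual report can be trusted.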
That’s one way differential privacy can work, and it’s admittedly an oversimplification – Apple hasn’t detailed its exact methods, but it has said they include turning data into hashed, obfuscated text strings and adding “noise” (extra junk data), as well as a “privacy budget” that limits how much data can be collected about any one person.
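Apple hasn’t published what its “noise” looks like, but as a purely illustrative sketch, here’s the textbook approach of adding Laplace-distributed noise to a count before reporting it, with the amount of noise scaled by a privacy budget parameter (the smaller the budget, the noisier – and more private – the answer):

```swift
import Foundation

/// Draw one sample from a Laplace(0, scale) distribution.
/// (The difference of two exponential samples is Laplace-distributed.)
func laplaceNoise(scale: Double) -> Double {
    let x = -log(1 - Double.random(in: 0..<1))
    let y = -log(1 - Double.random(in: 0..<1))
    return scale * (x - y)
}

/// Report a count with noise added; `budget` here plays the role of the
/// privacy parameter usually called epsilon.
func noisyCount(_ trueCount: Int, budget: Double) -> Double {
    // A count changes by at most 1 when one person is added or removed,
    // so the noise scale is 1 / budget.
    return Double(trueCount) + laplaceNoise(scale: 1.0 / budget)
}
```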
This is a huge step forward for privacy, and we’re delighted to see Apple leading the way.