Census Bureau seeks ways to protect privacy

Agency reconsiders whether protection it offers is strong enough

The Dallas Morning News · Nation & World · Mark Hansen, The New York Times

When the Census Bureau gathered data in 2010, it made two promises. The form would be “quick and easy,” it said. And “your answers are protected by law.”

But mathematical breakthroughs, easy access to more powerful computing, and widespread availability of large and varied public data sets have made the bureau reconsider whether the protection it offers Americans is strong enough. To preserve confidentiality, the bureau’s directors have determined they need to adopt a “formal privacy” approach, one that adds uncertainty to census data before it is published and achieves privacy assurances that are provable mathematically.

The census has always added some uncertainty to its data, but a key innovation of this new framework, known as “differential privacy,” is a numerical value describing how much privacy loss a person will experience. It determines the amount of randomness — “noise” — that needs to be added to a data set before it is released, and sets up a balancing act between accuracy and privacy. Too much noise would mean the data would not be accurate enough to be useful — in redistricting, in enforcing the Voting Rights Act or in conducting academic research. But too little, and someone’s personal data could be revealed.
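One standard way to implement that idea is the Laplace mechanism, sketched below in Python. The epsilon parameter plays the role of the privacy-loss value described above, and the count and the epsilon values are hypothetical; the bureau’s actual algorithms for the census are more elaborate.

```python
import numpy as np

def noisy_count(true_count, epsilon, sensitivity=1.0):
    """Publish a count with Laplace noise scaled to sensitivity / epsilon.

    epsilon is the privacy-loss parameter: smaller values mean more noise
    (stronger privacy, weaker accuracy); larger values mean the reverse.
    """
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

# A hypothetical block-level count of 42 residents, published under two budgets.
print(noisy_count(42, epsilon=0.1))  # heavy noise, strong privacy
print(noisy_count(42, epsilon=8.0))  # light noise, weaker privacy, close to 42
```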

On Thursday, the bureau will announce the tradeoff it has chosen for data publications from the 2018 End-to-End Census Test it conducted in Rhode Island, the only dress rehearsal before the actual census in 2020. The bureau has decided to enforce stronger privacy protections than companies like Apple or Google had when they each first took up differential privacy.

Hundreds of tables

Cynthia Dwork, the Gordon McKay Professor of Computer Science at Harvard and one of the inventors of differential privacy, says it is “tailored to the statistical analysis of large data sets” — precisely the issue facing the census with its mandate from Title 13 of the U.S. Code to keep each person’s information private, and its responsibility to provide useful data.

At the root of the problem are the tables of aggregate statistics the bureau publishes. There are hundreds of tables — sex by age, say, or ethnicity by race — summarizing the population at several levels of geography, from areas the size of a city block all the way up to the level of a state or the nation. In 2010, the bureau released tables with nearly 8 billion numbers. That was about 25 numbers for each person living in the United States, even though Americans were asked only 10 questions about themselves. In other words, the tables were generated in so many ways that the Census Bureau ended up releasing more data in aggregate than it had collected in the first place.
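The arithmetic behind “about 25 numbers for each person” is straightforward, assuming round figures of 8 billion published statistics and a 2010 resident population of roughly 309 million:

```python
published_statistics = 8_000_000_000  # roughly 8 billion numbers in the 2010 summary tables
residents_2010 = 309_000_000          # approximate 2010 U.S. resident population
print(published_statistics / residents_2010)  # about 25.9 numbers released per person counted
```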

For the census, this is particularly worrisome, especially if a question about citizenship is added to the 2020 census, as the Trump administration has proposed. “I think it is crystal clear what the potential harm is from poorly protected tabular summaries,” said John Abowd, associate director for research and methodology at the Census Bureau, who became an early proponent of differential privacy.

In November 2016, the bureau staged something of an attack on itself. Using only the summary tables with their 8 billion numbers, Abowd formed a small team to try to generate a record for every American that would show the block where he or she lived, as well as his or her sex, age, race and ethnicity — a “reconstruction” of the person-level data.

Each statistic in a summary table leaks a little information, offering clues about, or rather constraints on, what respondents’ answers to the census could look like. Combine statistics from different aggregate tables at different levels of geography, and a picture starts to emerge of the demographics of who is living where.
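As a toy illustration of that constraint-combining logic (the block, residents and statistics here are invented, and the bureau’s reconstruction was vastly larger), suppose a block’s published tables say it has two residents, one male and one female, both in the 60-64 age bracket, with a mean age of 61. Brute-force enumeration already narrows the possible records to a handful:

```python
from itertools import product

sexes = ("M", "F")
ages = range(60, 65)  # the published 60-64 age bracket

candidates = []
for person_a, person_b in product(product(sexes, ages), repeat=2):
    records = sorted([person_a, person_b])
    one_of_each_sex = {r[0] for r in records} == {"M", "F"}
    mean_age_matches = (records[0][1] + records[1][1]) / 2 == 61
    if one_of_each_sex and mean_age_matches and records not in candidates:
        candidates.append(records)

print(candidates)  # only three record pairs are consistent with the published statistics
```

Each additional statistic from another table removes more possibilities; with enough overlapping tables, this kind of attack can often pin a record down uniquely.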

Alarming accuracy

By this summer, Abowd and his team had completed their reconstruction for nearly every part of the country. When they matched their reconstructed data to the actual, confidential records — again comparing just block, sex, age, race and ethnicity — they found about 50 percent of people matched exactly. And for more than 90 percent there was at most one mistake, typically a person’s age being missed by one or two years. (At smaller levels of geography, the census reports age in five-year buckets.)

This level of accuracy was alarming. Abowd and his peers say their reconstruction, while still preliminary, is not a violation of Title 13. Instead, it is seen as a red flag that their current disclosure limitation system is out of date.

The bureau has long had procedures to protect respondents’ confidentiality. For example, census data from 2010 showed that a single Asian couple — a 63-year-old man and a 58-year-old woman — lived on Liberty Island, at the base of the Statue of Liberty.

That was news to David Luchsinger, who had taken the job as superintendent of the national monument the year before. On Census Day in 2010, Luchsinger was 59, and his wife, Debra, was 49. In an interview, they said they had identified as white on the questionnaire, and they were the island’s real occupants.

Before releasing its data, the bureau had “swapped” the Luchsingers with another household living in another part of the state that matched them on key questions. This mechanism preserved their privacy, and kept summaries like the voting-age population of the island correct, but it also introduced uncertainty into the data.
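A minimal sketch of the swapping idea, with invented households and match keys (the bureau’s actual swapping procedure and its parameters are confidential), looks roughly like this:

```python
import random

# Invented records: a household that is alone on its block exchanges locations with
# a household elsewhere in the state that matches it on key attributes (here,
# household size and number of voting-age members), so block-level totals of those
# attributes stay correct while the identifying details move.
households = [
    {"id": 1, "block": "Liberty Island",  "size": 2, "voting_age": 2},
    {"id": 2, "block": "Upstate block A", "size": 2, "voting_age": 2},
    {"id": 3, "block": "Upstate block B", "size": 4, "voting_age": 2},
]

target = households[0]
matches = [h for h in households
           if h["id"] != target["id"]
           and h["size"] == target["size"]
           and h["voting_age"] == target["voting_age"]]

partner = random.choice(matches)
target["block"], partner["block"] = partner["block"], target["block"]  # exchange locations

print(households)
```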

Swapping not enough

The bureau’s attack on itself showed that swapping wasn’t enough. Swapping focused on people who were isolated like the Luchsingers, or who had characteristics that made them stand out in their neighborhood — the cells in the tables with only a single person.

On Thursday, the Census Bureau will reveal the details of applying differential privacy to its 2018 End-to-End Census Test, including how it will control the level of noise in the summary tables to guarantee privacy.

The Census Bureau has been an early adopter of differential privacy. Still, instituting the framework on such a large scale is not an easy task, and even some of the big technology firms have had difficulties.

For example, shortly after Apple’s announcement in 2016 that it would use differential privacy for data collected from its macOS and iOS operating systems, it was revealed that the actual privacy loss of its systems was much higher than advertised.

Photo (2011 file photo, The New York Times): The Census Bureau “swapped” Debra and David Luchsinger’s information with that of another household living in another part of New York that matched them on some key questions.
