Northwest Arkansas Democrat-Gazette

Census blurs data to keep IDs private

- JENNIFER MCDERMOTT AND MIKE SCHNEIDER

PROVIDENCE, R.I. — In an age of rapidly advancing computer power, the U.S. Census Bureau recently undertook an experiment to see whether census answers could threaten the privacy of the people who fill out the questionnaires.

The agency went back to the last national head count, in 2010, and reconstructed individual profiles from thousands of publicly available tables. It then matched those records against other public population data.

The result: Officials were able to infer the identities of 52 million Americans.

Confronted with that discovery, the bureau announced that it would add statistical “noise” to the 2020 data, essentially tinkering with its own numbers to preserve privacy. But that idea creates its own problems, and social scientists, redistricting experts and others worry that it will make next year’s census less accurate. They say the bureau’s response is overkill.

“This is a brand new, radically more conservative definition of privacy,” University of Minnesota demographer Steven Ruggles said.

Federal law bars census officials from disclosing any individual’s responses. But data-crunching computers can tease out likely identities from the broader census results when those results are combined with other personal information.

Some critics fear the agency’s changes could make it harder to draw new congressional and legislative districts accurately. Others worry that research on immigration, demographics, the opioid epidemic and declining life expectancy will be hindered, particularly when it involves less populated areas.

If the change had been in place four years ago, Ruggles said, he would not have been able to conduct a 2015 study on the impact of declines in young men’s incomes on marriage.

With more data sets available to the public with a quick download, it has become easier than ever to match information with real names. That means aggregated answers to census questions involving race, housing and relationships could lead to individuals.

The fear is that advertisers, market researchers or anybody with know-how and curiosity could use data to reconstruct the identities of census respondents.

When the bureau went back to the 2010 census, it matched the census data with commercial databases. More than 1 in 6 respondents were identified by name and neighborhood as well as by information about their race, ethnicity, sex and age.
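The matching step described above can be pictured as a join on shared fields. The Python sketch below is only an illustration under stated assumptions: the function name `link_records`, the field names and every record are hypothetical, and the article does not describe the bureau's actual procedure in this detail.

```python
from collections import defaultdict

def link_records(reconstructed, commercial, keys=("block", "age", "sex")):
    """Attach a name to each reconstructed record that matches exactly one
    commercial record on the shared key fields (hypothetical field names)."""
    # Index the commercial file by the shared fields.
    index = defaultdict(list)
    for person in commercial:
        index[tuple(person[k] for k in keys)].append(person)

    matches = []
    for rec in reconstructed:
        candidates = index.get(tuple(rec[k] for k in keys), [])
        # Count only unambiguous matches; ties are left unidentified.
        if len(candidates) == 1:
            matches.append({**rec, "name": candidates[0]["name"]})
    return matches

# Hypothetical records, invented for illustration.
reconstructed = [
    {"block": "B1", "age": 34, "sex": "F", "race": "X"},
    {"block": "B1", "age": 51, "sex": "M", "race": "Y"},
]
commercial = [
    {"name": "Person 1", "block": "B1", "age": 34, "sex": "F"},
    {"name": "Person 2", "block": "B1", "age": 51, "sex": "M"},
    {"name": "Person 3", "block": "B1", "age": 51, "sex": "M"},
]
linked = link_records(reconstructed, commercial)
print(len(linked), "of", len(reconstructed), "records linked to a name")
```

In this toy case, the first record links to a single name, while the second stays ambiguous because two commercial entries share the same block, age and sex.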

Since the last census, “the data world has changed dramatically,” Ron Jarmin, deputy director of the census agency, wrote earlier this year. “Much more personal information is available online and from commercial providers, and the technology to manipulate that data is more powerful than ever.”

The Trump administration’s unsuccessful effort to add a citizenship question to the 2020 questionnaire heightened fears about how census information would be used. But privacy concerns are nothing new for the bureau.

Historians have found evidence that census data helped identify Japanese Americans who were rounded up and confined to camps during World War II. That revelation led to an apology from then-Census Bureau Director Kenneth Prewitt in 2000.

Jewish groups and some liberal organizations had concerns about privacy when the bureau was lobbied to ask about religion for the 1960 census. Some noted that Nazis had used government and church records to identify and round up Jews. The idea never went anywhere.

During the legal battle over the citizenship question, advocates worried that the information could be used to target residents in the country illegally. Some say lingering concerns could have a chilling effect on the 2020 census.

To address those worries, the bureau has adopted a technique called “differential privacy,” which alters the numbers to protect the identities of individual respondents but does not change core findings.

It’s analogous to pixelating the data, a technique commonly used to blur certain images on television, said Michael Hawes, senior adviser for data access and privacy at the Census Bureau.
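One textbook way to add this kind of noise is the Laplace mechanism, sketched below in Python. It is an illustration only: the article does not say which mechanism the bureau will use, and the block names, populations and `epsilon` value are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two
    independent exponential draws that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon, rng):
    """Return a count blurred with Laplace noise of scale 1/epsilon.

    Adding or removing one person changes a count by at most 1, so a scale
    of 1/epsilon is the standard choice for a single count. Smaller epsilon
    means more noise. Rounding and flooring at zero happen after the noise.
    """
    noisy = true_count + laplace_noise(1.0 / epsilon, rng)
    return max(0, round(noisy))

# Hypothetical block populations, invented for illustration.
blocks = {"block A": 12, "block B": 340, "block C": 4075}
rng = random.Random(2020)  # fixed seed so the sketch is repeatable
for name, population in blocks.items():
    print(name, population, noisy_count(population, epsilon=0.5, rng=rng))
```

The same absolute amount of noise matters far more for the 12-person block than for the 4,075-person one, which is the concern researchers raise about less populated areas.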

Redistricting experts say the mathematical blurring could cause problems because they rely on precise numbers to draw congressional and state and local legislative districts. They also worry that it could dilute minority voting power and violate the Voting Rights Act.

“The numbers might be off by five, 10, 20 people, and if you’re dealing with exact percentages, that could mean something. That could mean a lot,” said Jeffrey M. Wice, a national redistricting attorney. “That’s why we care about it so much.”

In the past, the bureau has used “swapping” and other methods to protect confidentiality. Swapping involves taking similar households in different geographic areas and exchanging demographic characteristics.
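The swapping idea can be sketched in a few lines of Python. This is a simplified illustration, not the bureau's procedure: treating households of the same size as "similar," the field names, the `swap_rate` parameter and the sample records are all assumptions made for the example.

```python
import random

# Hypothetical fields that get exchanged between paired households.
DEMOGRAPHIC_FIELDS = ("race", "age")

def swap_households(households, swap_rate, rng):
    """Return a copy in which some households have traded demographic fields
    with a same-size household located in a different area."""
    result = [dict(h) for h in households]  # leave the input untouched
    by_size = {}
    for i, h in enumerate(result):
        by_size.setdefault(h["size"], []).append(i)

    swapped = set()
    for i, h in enumerate(result):
        if i in swapped or rng.random() >= swap_rate:
            continue
        # Candidate partners: same size, different area, not yet swapped.
        partners = [j for j in by_size[h["size"]]
                    if j != i and j not in swapped
                    and result[j]["area"] != h["area"]]
        if not partners:
            continue
        j = rng.choice(partners)
        for field in DEMOGRAPHIC_FIELDS:
            result[i][field], result[j][field] = result[j][field], result[i][field]
        swapped.update((i, j))
    return result

# Hypothetical households, invented for illustration.
households = [
    {"area": "A", "size": 2, "race": "r1", "age": 30},
    {"area": "B", "size": 2, "race": "r2", "age": 40},
    {"area": "A", "size": 3, "race": "r3", "age": 50},
]
for row in swap_households(households, swap_rate=1.0, rng=random.Random(0)):
    print(row)
```

In this sketch the totals across all areas stay the same and each area keeps its household count and sizes; only the demographic details attached to particular locations move.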

Census data does not need to be exact for most purposes, “as long as we know it’s really pretty close,” said Justin Levitt, an election law professor at Loyola Law School in Los Angeles. But “there’s certainly a point where blurry becomes too blurry.”

The bureau has not decided precisely how much blurring will take place, but researchers have already delivered academic papers and organized a petition signed by more than 4,000 scholars, planners and journalists. The petition asked the bureau to include the research community in its discussions.

Michael McDonald, a University of Florida redistricting expert, said people must be assured their data will be kept confidential or they may not respond at all. If respondents do not answer questions for the once-a-decade census in a timely manner, census workers must try to interview them in person.

“We need high response rates to the census,” McDonald said. “If we don’t get them, whatever noise will be moot because we won’t have good data to start with.”

AP/JIM MONE “This is a brand new, radically more conservative definition of privacy,” University of Minnesota demographer Steven Ruggles said of the U.S. Census Bureau plan to add statistical “noise” to next year’s data to preserve privacy. Ruggles and others worry that the move will render the results less accurate.
