Los Angeles Times

Census privacy vs. accuracy?


In the long history of the United States census (the first was conducted in 1790), few have faced headwinds as strong as the current count. First there was congressional underfunding that slowed the Census Bureau’s ability to do community outreach and other ground-prepping activities. Then came the Trump administration’s efforts to politicize the census by adding a citizenship question intended to suppress participation in Democratic-leaning immigrant communities.

And then the COVID-19 pandemic delayed in-person data collection and forced the Census Bureau to push back by several months the deadlines for reporting each state’s allotment of seats in the House of Representatives and for sending states their redistricting data.

Now there’s a fresh breeze stirring: Some critics argue that the Census Bureau’s decision to use differential privacy, a statistical technique, to help protect respondents’ privacy may render some of the census data unreliable, potentially exaggerating the size of rural communities (and thus increasing their representation in Congress and state legislatures) and undercounting nonwhites. Those problems have particular significance in California and other states with large numbers of nonwhite residents, and in states with large rural populations.

The scope of the problem can be measured by the breadth of its critics. California legislative leaders have written to the White House questioning the effects of differential privacy. The state of Alabama filed a lawsuit last month over similar questions. And immigrant rights groups representing Asian and Latino Americans issued a report Monday warning that “if there is a systemic bias in the resulting data, the legal requisites of redistricting would not be met,” an outcome that could violate the federal Voting Rights Act.

How to fix the problem is unclear, although census officials are still working on it. If the issues can’t be resolved, some experts believe the bureau could still produce final numbers using its 2010 methodology, but it is running out of time to do so.

The Census Bureau faces two conflicting responsibilities: to protect respondents’ privacy, and to ensure an accurate count, which is vital to both the reallocation of House seats and the distribution of roughly $1.5 trillion in annual federal spending.

To improve the census’ accuracy, the government has in the past used statistical tricks to account for people census enumerators know exist but can’t reach, such as assigning to a nonresponsive address average household data for that particular census block. But after the 2010 census, officials came to realize that the advent of commercial data brokers and more powerful computers meant privacy could be compromised by sorting through commercially available data (often used by businesses to target consumers) and matching it against census data to extract individual characteristics, a process called re-identification. That led the bureau to incorporate differential privacy techniques that inject false data, or “noise” in the parlance, into the census, making re-identification more difficult.

The false data have little effect on statistics drawn from large sets of data, such as the total population count, but they have an amplified effect on small data sets of the sort used, for instance, in redrawing legislative districts.
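Why small areas suffer more can be illustrated with the textbook Laplace mechanism, a far simpler technique than the Census Bureau's actual disclosure avoidance system. The sketch below is only an illustration under stated assumptions: the privacy parameter `epsilon` of 0.5, the two populations, and the function names are all hypothetical. Because the noise added to a count has the same typical size no matter how large the count is, it makes up a much bigger share of a small block's population than of a large county's.

```python
import random
import statistics


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Adding or removing one person changes a count by at most 1, so Laplace
    # noise with scale 1/epsilon makes this single count epsilon-differentially
    # private. (Hypothetical setup; not the Census Bureau's real algorithm.)
    return true_count + laplace_noise(1 / epsilon, rng)


def mean_relative_error(true_count: int, epsilon: float, trials: int,
                        seed: int = 0) -> float:
    # Average of |noisy - true| / true over many independent noisy releases.
    rng = random.Random(seed)
    errors = [
        abs(noisy_count(true_count, epsilon, rng) - true_count) / true_count
        for _ in range(trials)
    ]
    return statistics.mean(errors)


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy budget
    for label, count in [("small block", 20), ("large county", 1_000_000)]:
        err = mean_relative_error(count, epsilon, trials=10_000)
        print(f"{label}: population {count:,}, average relative error {err:.4%}")
```

Under these assumptions the expected absolute noise is 1/epsilon, or about 2 people, for either area. That is roughly 10% of a 20-person block but a negligible sliver of a million-person county, which is the asymmetry the critics are pointing to.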

So there’s the tension: privacy protection versus census data accuracy. The Census Bureau is expected later this month to release a new trial run using differential privacy, which you can be sure will be combed through by advocates and others concerned with achieving both an accurate census and protecting the privacy of respondents. All this may ultimately lead to further court challenges beyond the one filed by Alabama, leaving the outlook for the final census figures uncertain.

It is in the nation’s best interest that the government and census watchdogs find the sweet spot between completing an accurate census (in time for congressional and legislative redistricting) and maintaining acceptable levels of privacy protection.

But if the government can’t reach that balance, then it should abandon differential privacy or skew its methodology to emphasize accuracy. As important as protecting privacy is, it doesn’t warrant imperiling the reliability of vital data. Number crunchers might be able to glean a few personal tidbits, but much of that information is already available through commercial data harvesting. Emphasizing privacy over accuracy in this case is the wrong move.
