Census privacy vs. accuracy?
In the long history of the United States census — the first was conducted in 1790 — few have faced headwinds as strong as the current count. First there was congressional underfunding that slowed the Census Bureau’s ability to do community outreach and other ground-prepping activities. Then came the Trump administration’s efforts to politicize the census by adding a citizenship question intended to suppress participation in Democratic-leaning immigrant communities.
And then the COVID-19 pandemic delayed in-person data collection and forced the Census Bureau to push back, by several months, its deadlines for reporting each state’s allotment of seats in the House of Representatives and for sending states their redistricting data.
Now there’s a fresh breeze stirring: Some critics argue that the Census Bureau’s decision to use differential privacy, a statistical technique, to help protect respondents’ privacy may render some of the census data unreliable, potentially exaggerating the size of rural communities (and thus increasing their representation in Congress and state legislatures) and undercounting nonwhites. Those problems have particular significance for California and other states with large numbers of nonwhite residents, and for states with large rural populations.
The scope of the problem can be measured by the breadth of its critics. California legislative leaders have written to the White House questioning the effects of differential privacy. The state of Alabama filed a lawsuit last month over similar questions. And immigrant rights groups representing Asian and Latino Americans issued a report Monday warning that “if there is a systemic bias in the resulting data, the legal requisites of redistricting would not be met” and could violate the federal Voting Rights Act.
How to fix the problem is unclear, although census officials are still working at it. If the issues can’t be resolved, some experts believe the bureau could still produce final numbers using 2010 methodology, but it is running out of time to do so.
The Census Bureau faces two conflicting responsibilities: to protect respondents’ privacy, and to ensure an accurate count, which is vital to both the reallocation of House seats and the distribution of roughly $1.5 trillion in annual federal spending.
To improve the census’ accuracy, the government has in the past used statistical techniques to account for people whom census enumerators know exist but can’t reach, such as assigning a nonresponsive address the average household data for that particular census block. But after the 2010 census, officials came to realize that commercial data brokers and more powerful computers had made privacy easier to compromise. Someone could sort through commercially available data (often used by businesses to target consumers) and match it against published census data to extract individual characteristics, a process called re-identification. That led the bureau to adopt differential privacy techniques, which inject false data, or “noise,” into the published figures to make re-identification more difficult.
The added noise has little effect on statistics for large groups, such as the total population count, but its effect is amplified in small data sets, such as the block-level counts used in redistricting legislative seats.
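Why small counts suffer more can be seen in a few lines of arithmetic. The sketch below is a simplified, textbook version of differential privacy (the Laplace mechanism applied to a single count), not the Census Bureau’s actual system, which uses different noise and extensive post-processing. The privacy setting `epsilon = 0.5` and the two populations (50 people on a block, 1 million in a state) are hypothetical values chosen for illustration. The noise added to each count is the same size on average regardless of the count, about `1 / epsilon` people, so it is a rounding error for a state but a meaningful share of a small block.

```python
import random
import statistics


def laplace_noise(rng: random.Random, scale: float) -> float:
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent Exponential(1) draws is Laplace(0, 1);
    multiplying by `scale` stretches it to Laplace(0, scale).
    """
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def noisy_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    """Textbook Laplace mechanism for a count (one person changes it by at most 1).

    Smaller epsilon means more noise and stronger privacy. This is a simplified
    illustration, not the Census Bureau's production algorithm.
    """
    return true_count + laplace_noise(rng, 1.0 / epsilon)


def mean_relative_error(true_count: int, epsilon: float, trials: int, seed: int = 0) -> float:
    """Average size of the noise as a fraction of the true count."""
    rng = random.Random(seed)
    errors = [
        abs(noisy_count(true_count, epsilon, rng) - true_count) / true_count
        for _ in range(trials)
    ]
    return statistics.mean(errors)


if __name__ == "__main__":
    epsilon = 0.5  # hypothetical privacy setting
    for label, count in [("small block", 50), ("large state", 1_000_000)]:
        err = mean_relative_error(count, epsilon, trials=20_000)
        print(f"{label:12s} true={count:>9,d}  average error={err:.6%}")
```

Both populations receive noise averaging about 2 people (1 / 0.5), but that is roughly 4% of a 50-person block and a negligible fraction of a million-person state.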
So there’s the tension: privacy protection versus census data accuracy. The Census Bureau is expected later this month to release a new trial run using differential privacy that you can be sure will be combed through by advocates and others concerned with both achieving an accurate census and protecting the privacy of the respondents. This all may ultimately lead to more court challenges than the one filed by Alabama, leaving the outlook for the final census figures uncertain.
It is in the nation’s best interest that the government and census watchdogs find the sweet spot between completing an accurate census (in time for the redistricting of congressional and legislative seats) and maintaining acceptable levels of privacy protection.
But if the government can’t reach that balance, then it should abandon differential privacy or skew its methodology to emphasize accuracy. As important as protecting privacy is, it doesn’t warrant imperiling the reliability of vital data. Number crunchers might be able to glean a few personal tidbits, but much of that information is already available through commercial data harvesting. Emphasizing privacy over accuracy in this case is the wrong move.