Too much noise in the census results, or not enough?
Years ago, a speaker at a seminar I attended advised teachers to intentionally make mistakes on tests for students to find. The premise was that these errors would make educators more human.
It was a bad idea then, and it's a bad idea now. Mistakes happen often enough without making them on purpose.
I wonder if similar concerns have crossed the minds of US Census Bureau statisticians about so-called differential privacy. This controversial practice introduces “noise,” or errors, into census data to make it harder for third parties to reconstruct the identity of census respondents.
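To make the idea concrete: the simplest textbook version of this technique adds random noise drawn from a Laplace distribution to each published count. The sketch below is illustrative only — the Census Bureau's actual 2020 system (its "TopDown" algorithm) is far more elaborate — and the function names `laplace_noise` and `noisy_count` are my own, not anyone's real API.

```python
import random

def laplace_noise(scale):
    # A Laplace(0, scale) draw is the difference of two independent
    # exponential draws, each with mean `scale`.
    return random.expovariate(1.0 / scale) - random.expovariate(1.0 / scale)

def noisy_count(true_count, epsilon):
    # For a simple counting query (changing one person changes the
    # count by at most 1), adding Laplace noise with scale 1/epsilon
    # satisfies epsilon-differential privacy. Smaller epsilon means
    # more privacy and more noise.
    return true_count + laplace_noise(1.0 / epsilon)

# Example: publish a block's true population of 37.
published = round(noisy_count(37, epsilon=0.5))
```

The published figure is close to the truth on average, but any single block's number can be off — which is exactly how a handful of people can end up assigned to a block in the middle of a lake.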
Census records are required by law to remain confidential for 72 years, which is why, just last month, the National Archives opened the vault, so to speak, and released names and addresses from the 1950 count.
However, many experts say 2020 census data, already marred by controversy over President Trump’s insistence on excluding unauthorized immigrants (he lost that bid), could be merged with publicly available information to reveal the identities of individual respondents almost immediately if steps are not taken to obscure them.
A recent New York Times report mentions a census block in Chicago that has 14 people living underwater. Not really, of course, but the census computers assigned them there to make it harder for bad actors to connect the information to specific people. And this particular census block is just one of tens of thousands of such places that err in the name of privacy.
It's the proverbial rock-and-a-hard-place situation. On the one hand, people who voluntarily provide potentially sensitive information to a Census Bureau survey do so on the promise that their identities will remain confidential. Without such a guarantee, they are less likely to share information, which makes census results less comprehensive.
On the other hand, intentional errors in census data — even for the best of reasons — undermine public confidence in that data, which is used to determine budgets, government aid, and legislative constituencies.
Some readers probably feel the same nonchalance toward census confidentiality as they feel toward various forms of in-person or online surveillance. In other words, if you’re doing nothing wrong, what are you worrying about? Out of 330 million people in the country, why would anyone bother to piece together information about you and the people you know?
This is an easy attitude to adopt if you are a WASP and most of your personal data could not be used against you.
But if you're a member of a minority group, an immigrant, or LGBTQ+, answering census questions increases the chance that an unscrupulous third party could merge your answers with other readily available information (voter registration rolls, for example) and create a list to share with the world. Why take the risk?
Yet scrambling the data creates its own set of challenges. The whole "people living underwater" misstep was an unintentional mistake made while trying to make an intentional one: a computer randomly assigned respondents to a census block where no one actually lives.
And that doesn't humanize the US Census Bureau, as that long-ago speaker suggested would happen when students caught their teachers' mistakes. No, Americans already recognize that the bureau is all too human, trying to do an impossible job made even more difficult by readily available computing power in the hands of those who would abuse it.
I don’t know the solution. Results should not be rendered inaccurate in the name of confidentiality, and confidentiality should not be violated in the name of accuracy.
At least the Census Bureau will have more time to think about it, because the release of most of the detailed 2020 data has been pushed to next year, partly because of COVID-19 and partly because of this fuzzy-data issue.
Is there a middle ground – private enough and precise enough?
One "enough" is certain: whatever the solution, it will never be enough to satisfy everyone.
Contact Chris at [email protected]. On Twitter: @cschillig.