HOW STATCAN MISCOUNTED ANGLOS.
It was about a week after Statistics Canada had released its 2016 census language data that it became apparent something had gone badly wrong with the numbers from Quebec.
The information put out on Aug. 2 had shown startling increases of anglophones in heavily French-speaking areas of Quebec, prompting the Parti Québécois and others to speak out on the need for stronger French language protection.
But Jack Jedwab, a demography specialist and executive vice-president of the Association for Canadian Studies, came to a different conclusion: the reported numbers seemed practically impossible, especially when compared to immigration and English school enrolment rates. He aired his views to StatCan and to numerous media outlets.
The stakes were high, as census information is relied upon for countless academic studies and government policy decisions. StatCan’s reputation is entirely dependent on producing high-quality data.
Anil Arora, the chief statistician, was made aware of the concerns on Wednesday, Aug. 9, and an investigation was launched. By Friday morning, staff had isolated the problem, narrowing it down to a processing error on 61,000 follow-up questionnaires. (It was discovered that the order of responses was reversed on the French and English versions, but the computer read them as if they were in the same order.)
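To see how a reversed answer order can quietly flip a response, consider a minimal sketch in Python. It is purely illustrative and not StatCan's actual system: the option lists, the function names and the choice of which language appears first on which form are all assumptions made for the example.

```python
# Hypothetical answer order on each version of the form (assumed for illustration).
OPTION_ORDER = {
    "en": ["English", "French"],
    "fr": ["French", "English"],
}


def decode_buggy(form_language, checked_position):
    """Reads every form as if it used the English layout (the kind of error described)."""
    return OPTION_ORDER["en"][checked_position]


def decode_fixed(form_language, checked_position):
    """Looks up the layout that matches the language of the form."""
    return OPTION_ORDER[form_language][checked_position]


# A respondent ticks the first box on the French form.
buggy_answer = decode_buggy("fr", 0)
fixed_answer = decode_fixed("fr", 0)
```

Under these assumptions, the first box on a French form is recorded as "English" by the faulty reader and as "French" by the corrected one, which is the sort of swap that would inflate anglophone counts.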
The data was pulled down from the StatCan website that day, and a bulletin was issued informing users that an error had been found.
“And then from that point on, we worked pretty much 24/7, all weekend and early this week to make the corrections,” said Marc Hamel, the director of the census program.
After a precautionary review by an outside panel of experts, the corrected information was released on Thursday morning, and the mysterious outliers were gone. Rather than rising to 9.6 per cent from 9 per cent in 2011, the proportion of people in Quebec with English as their mother tongue had actually declined slightly, to 8.9 per cent.
Hamel said the breakdown happened in two places. “The computer error should have been detected at the source,” he said, outlining how the automated processes are tested ahead of time to make sure they’re functioning as intended. “And in the second step, before we did release the information, we should again have probably caught that. If you looked at the pattern of English growing in Quebec, some of the pattern didn’t seem normal, and it should have been caught there. So this was an error.”
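The second check Hamel describes, spotting a pattern that does not look normal before release, can be sketched as a simple plausibility test. This is a hedged illustration only: the function name and the 0.5-percentage-point threshold are assumptions, not StatCan's procedure, and the only figures used are the Quebec shares reported in this article.

```python
def flag_implausible_changes(previous, current, max_change_points):
    """Return regions whose share moved by more than max_change_points between censuses."""
    flagged = []
    for region, old_share in previous.items():
        new_share = current[region]
        if abs(new_share - old_share) > max_change_points:
            flagged.append(region)
    return flagged


# Share of people with English as mother tongue, in per cent.
share_2011 = {"Quebec": 9.0}
erroneous_2016 = {"Quebec": 9.6}  # figure first released on Aug. 2
corrected_2016 = {"Quebec": 8.9}  # figure after the correction

THRESHOLD = 0.5  # hypothetical tolerance, in percentage points

flagged_before_fix = flag_implausible_changes(share_2011, erroneous_2016, THRESHOLD)
flagged_after_fix = flag_implausible_changes(share_2011, corrected_2016, THRESHOLD)
```

With that assumed threshold, the erroneous figure is flagged for review and the corrected one is not. A real check would presumably run at a much finer geographic level, since the oddest jumps were reported in heavily French-speaking areas.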
With nearly 15 million census responses and a huge team of people working on them (Hamel said about 1,200 workers collected and prepared the data last summer, and around 300 then did the in-depth processing and analysis), it’s an enormous task to keep the results error-free.
But even so, any time a mistake creeps into a report it’s devastating for the agency, said former chief statistician Wayne Smith — who speaks from experience, having been in charge in 2014 when a computer programming error caused the agency to retract a monthly labour data report.
“Inside the organization, this is an extraordinarily painful event,” he said. “Everybody knows the credibility of the agency depends on this kind of thing not happening.”
(Smith resigned in the fall of 2016 over complaints that StatCan’s independence was under threat due in part to its aging IT infrastructure, which has been taken over by the Shared Services department. But he said that’s a separate issue from what seems to have happened here.)
Hamel, who’s been in charge of the census program since 2009, said staff have painstakingly checked over the previously released census data and are reviewing practices to ensure future releases are clean.