Figuring out Covid-19 data
No statistical model is perfect and ‘uncertainty is the only certainty there is’, say experts
A STATISTICAL model cited by the White House generated a slightly less grim figure this week for a first wave of deaths from the coronavirus pandemic in the US – a projection designed to help officials plan for the worst, including having enough hospital staff, beds and ventilators.
The only problem with this bit of relatively good news? It’s almost certainly wrong. All models are wrong. Some are just less wrong than others. Welcome to the grimace-and-bear-it world of modelling.
“The key thing is that you want to know what’s happening in the future,” said Nasa’s top climate modeller Gavin Schmidt. “In the absence of a time machine, you’re going to have to use a model.”
Weather forecasters use models. Climate scientists use them. Supermarkets use them.
As leaders try to get a handle on the coronavirus outbreak, they are turning to numerous mathematical models to help them figure out what might happen next and what they should try to do now to contain the spread.
The model updated this week by the University of Washington – the one most often mentioned by US health officials at White House briefings – predicts that daily deaths in the US will hit a peak in mid-April and then decline.
Their latest projection shows that anywhere from 49,431 to 136,401 people in the US will die in the first wave, which will last into the summer.
That’s a huge range of about 87,000. But only a few days earlier the same team had a range of nearly 138,000, with 177,866 as the top number of deaths.
The latest calculations are based on better data on how the virus acts, more information on how people act and using more cities as examples. For example, new data from Italy and Spain suggest physical distancing is working even better than expected to stop the spread of the virus.
So how does modelling work? Take everything we know about how the coronavirus is spreading, when it’s deadly and when it’s not, when symptoms show and when they don’t.
Then factor in everything we know about how people are reacting: distancing themselves, obeying stay-at-home orders and lockdowns, and other squishy human factors. Now add everything we know about testing, treating the disease and equipment shortages.
Finally, mix in large dollops of uncertainty at every level.
Squeeze all those thousands of data points into incredibly complex mathematical equations and voila, here’s what’s going to happen next with the pandemic. Except, remember, there’s a huge margin of error.
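The recipe described above can be illustrated with a toy version of the classic SIR (susceptible-infected-recovered) model, a much simpler cousin of the models the article discusses. Every number below is an illustrative assumption for the sketch, not a fitted Covid-19 value, and the real models fold in far more data and uncertainty.

```python
# Toy SIR model: a minimal sketch of an equation-driven epidemic projection.
# All parameter values here are illustrative assumptions, not real estimates.

def sir_step(s, i, r, beta, gamma, n):
    """Advance the epidemic by one day using simple difference equations."""
    new_infections = beta * s * i / n   # contacts between susceptible and infected
    new_recoveries = gamma * i          # share of the infected who recover (or die)
    return (s - new_infections,
            i + new_infections - new_recoveries,
            r + new_recoveries)

def project(days, n=330_000_000, i0=10_000, beta=0.3, gamma=0.1):
    """Run the model forward and return (day of peak, infections at peak)."""
    s, i, r = n - i0, float(i0), 0.0
    peak_day, peak_i = 0, i
    for day in range(1, days + 1):
        s, i, r = sir_step(s, i, r, beta, gamma, n)
        if i > peak_i:
            peak_day, peak_i = day, i
    return peak_day, peak_i

# Lowering beta (fewer infectious contacts per day, e.g. through physical
# distancing) both delays the peak and makes it smaller.
peak_no_distancing = project(365, beta=0.3)
peak_distancing = project(365, beta=0.15)
```

The point of the sketch is not the specific numbers but the behaviour: tweak one squishy human factor, the contact rate, and the projected peak shifts, which is exactly why the projections in the article move so much as new distancing data arrives.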
“No model is perfect, but most models are somewhat useful,” said John Allen Paulos, a professor of maths at Temple University and author of several books about maths and everyday life. “But we can’t confuse the model with reality.”
One challenge for modellers is dealing with see-sawing death totals from overburdened public health departments.
Data might show big swings in deaths – but only because a backlog of reports showed up all at once.
Another problem, said University of Texas disease modeller Lauren Meyer, is that most of the pandemic models, including hers, are based on how influenza acts, and that is different from this new coronavirus.
Even with all of the uncertainty, “it’s much better than shooting from the hip,” said Meyer.
“Data-driven models are the best evidence we have.”
Because of the large fudge factor, it’s smart not to fixate on a single number, the minimum or the maximum number of deaths, but to look at the range of confidence: the interval within which there’s a 95% chance reality will fall, mathematician Paulos said.
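One way to see what that 95% range means: run a model many times with slightly jittered inputs and keep the middle 95% of outcomes. The sketch below does this with made-up numbers (a normal distribution standing in for the spread of model runs), purely to illustrate the idea, not to reproduce any real forecast.

```python
# Illustrative only: why a 95% range beats a single number.
# The distribution below is made up; it is not a real Covid-19 forecast.
import random

random.seed(1)

# Pretend each model run, with slightly different assumptions, yields a
# projected death total. 10,000 runs give a spread of plausible outcomes.
runs = sorted(random.gauss(90_000, 20_000) for _ in range(10_000))

lower = runs[int(0.025 * len(runs))]   # 2.5th percentile
upper = runs[int(0.975 * len(runs))]   # 97.5th percentile
# Roughly 95% of the simulated outcomes fall between lower and upper,
# which is why quoting the pair (lower, upper) is more honest than
# quoting any single run.
```

Reporting the interval rather than one run is the practice Paulos is describing: the headline-grabbing minimum or maximum is just one draw from a wide distribution.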
“Uncertainty is the only certainty there is,” Paulos said. “And knowing how to live with insecurity is the only security.”