Researchers hindered in virus fight by data roadblocks
Eager to help, scientists urge provinces to ‘open the gates’ to crucial health information
Canada’s world-leading community of artificial intelligence scientists is mobilizing to help answer critical COVID-19 research questions, hoping to create predictive tools to flag which patients might become sickest, identify emerging outbreak hot spots, and hunt for molecules to form the basis of coronavirus-tackling drugs.
But AI researchers, and Canadian scientists of all stripes, say they are impeded by stingy provincial data-sharing and a culture of trapping information in “secret jails” and silos, which robs machine-learning algorithms in particular of the fuel they need to function.
If data-sharing agreements already existed, “a lot of the projects that people are discussing right now, they could do; the data would already be flowing. But it’s not,” said Yoshua Bengio, a computer scientist at the Université de Montréal, scientific director of the Quebec Artificial Intelligence Institute MILA, and a co-recipient of the Turing Award, the so-called Nobel Prize of computer science.
“I’m hoping this will force the hand of provincial governments to actually open the gates in a reasonable way, and understand that over the long term they have to change the structure of the system.”
The Canadian Institute for Advanced Research (CIFAR) held an international virtual roundtable on Monday to identify how artificial intelligence might answer some of the most pressing questions related to the COVID-19 pandemic, and to connect top machine learning researchers with experts in disease modelling and epidemiology, drug development, health system capacity and more.
“The power of AI is that it takes vast amounts of data and discerns patterns in them to make sense of that data,” says CIFAR president Alan Bernstein. The federal government asked CIFAR to convene Canada’s AI scientists to help combat the pandemic, Bernstein says. CIFAR leads the $125-million Pan-Canadian AI Strategy, a program designed to leverage early Canadian AI breakthroughs and expand the country’s machine learning research ecosystem. The organization has $100,000 for a handful of COVID-19 “catalyst” grants to kick-start collaborations, but hopes that figure expands through additional public and private funding.
One role for AI could be predicting which patients who test positive for COVID-19 are most likely to suffer severe outcomes and require life-saving medical interventions.
Data from China and Italy has demonstrated that older patients, especially those with underlying health conditions, are more likely to need intensive care or to die. But some young people with the infection suddenly crash, suffering severe respiratory distress, too, and scientists don’t fully understand why.
“The majority of the population is going to have this infection … the issue is we don’t know who is going to respond really negatively to it,” says Marzyeh Ghassemi, a professor of computer science and medicine at the University of Toronto and a Canadian CIFAR AI chair at the Vector Institute.
“Can I predict which people are most susceptible to having an aggressive or intensive COVID-19 infection?”
In 2017, Ghassemi published research that used artificial intelligence to predict which patients in an intensive care unit would need invasive ventilation and other life-saving measures. Ventilators, which help patients struggling to breathe, are expected to be a vital resource: critical care patients in Italy overwhelmed hospitals’ ventilator supply, and modelling suggests that could happen in Ontario, too.
Using a dataset of approximately 34,000 ICU visits at a U.S. hospital, the deep learning algorithm Ghassemi and her colleagues created was able to predict with a high degree of accuracy which patients would need ventilation and other measures, in some cases up to six hours in advance.
With a large enough dataset that included detailed patient histories as well as COVID-19 outcomes, a similar algorithm could quickly pick up patterns in which young and seemingly healthy patients end up suddenly needing a ventilator or other critical care — work that would be vastly slower or impossible for a human to perform, and which could in turn help people understand their own risk and help doctors plan.
But since being recruited to Toronto from the Massachusetts Institute of Technology, Ghassemi says she has struggled to access data relevant to health care in Ontario or Canada.
“I am a publicly funded researcher at a publicly funded university and my students can only do research on data from the United Kingdom and the United States. I’ve been here for almost two years and I don’t have active projects for my PhD students that use Canadian health data, because I can’t get access to it,” Ghassemi says, adding that this means her work is helping other countries solve important health challenges.
“In both of those jurisdictions, data is not kept in secret jails in the way that it is in Ontario.”
Some in the field say they see signs of progress.
“I would say that there is a strong willingness to move quickly to try to bring the necessary data together to try to tackle this. I think governments, the health system, researchers are all mobilizing to try to answer that call quickly and responsibly,” says Andrea Smith, director of Health Data Partnerships at the Vector Institute.
Privacy concerns are often cited as a risk of distributing even anonymized health data, but researchers who want to see this information shared responsibly say that risk should be weighed against the cost of slowing the pace of health-care research.
“Inaction is actually a huge risk in and of itself,” says Smith.
A spokesperson for Ontario Health Minister Christine Elliott said the provincial government is planning changes to the legislation that governs the protection of health information. The changes would allow for better coordination of data between primary care physicians and public health, and would enable new regulations to provide de-identified data to researchers.
“It’s a huge limitation right now, one we identified before COVID when the legislation was drafted. But it’s been made all the more clear now,” said the spokesperson, Travis Kann.
On Monday evening, the Canadian Institute for Health Information, an independent organization that is mostly funded by the federal government and provincial and territorial ministries of health, created a new webpage pointing to several types of information relevant to the COVID-19 pandemic, including information on hospital beds, ICUs, ventilators and the number and distribution of health-care workers across Canada.
The CIHI portal collects several sources of information that were previously scattered and challenging to find, and a spokesperson said it will be expanded. But in a fast-moving pandemic, researchers say they need information that is either real-time or at least updated daily.
AI scientists aren’t the only ones frustrated by a shortage of relevant data.
Though there are questions he would like to explore with machine learning experts, “there’s a ton we can do with just regular old-fashioned intel,” says David Fisman, an epidemiologist at the University of Toronto’s Dalla Lana School of Public Health.
Fisman and his colleagues have been modelling how social distancing interventions will affect Ontario’s epidemic. But because the case count information the province publicly shares is fragmented and partial, the group has been struggling to calculate whether the measures Ontario has implemented in recent weeks are helping to flatten the epidemic curve and bring the outbreak under control.
Researchers point out that Ontario has a single-payer health-care system, which should make data more accessible, not less, and that science derived from places like Toronto could benefit the whole world, since the population is so diverse.
One of the reasons Ghassemi came here, she says, is that “the ecosystem is so promising. And yet there’s this cultural conservatism in senior leadership in the health-care space.”