Santa Fe New Mexican

A world of problems decoded at LANL

Scientists study big data to identify patterns, aiming to predict trends that could threaten global security

By Sara Del Valle

Sara Del Valle, Ph.D., is a computational epidemiologist who leads the Data Fusion team at Los Alamos National Laboratory. A version of this article first appeared in Scientific American in March.

The ability to collect information far outpaces the ability to fully utilize it, yet that information may hold the key to solving some of the biggest global challenges.

Take, for instance, the frequent outbreaks of waterborne illnesses as a consequence of war or natural disasters. The most recent example comes from Yemen, where, according to the World Health Organization, nearly 536,000 new suspected cases of cholera were reported, with 773 associated deaths, between January and the end of July alone.

History is riddled with similar stories. What if we could better understand the environmental factors that contributed to the disease, predict which communities are at higher risk, and take action to stem the spread?

Answers to these questions, and others like them, could help avert potential catastrophe.

Data is already collected about virtually everything, from birth and death rates to crop yields and traffic flows. IBM estimates that 2.5 quintillion bytes of data are generated each day, equivalent to producing all the information in the Library of Congress more than 166,000 times over.
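As a rough check on the comparison above, dividing the daily total by 166,000 gives the collection size that the comparison implies. The sketch below assumes the short-scale quintillion (10^18) and decimal terabytes (10^12 bytes). The variable names are illustrative, and the result is only what the two quoted figures imply, not an official measure of the Library of Congress.

```python
# Figures quoted in the article.
BYTES_PER_DAY = 2.5e18         # 2.5 quintillion bytes (short scale, assumed)
LIBRARY_MULTIPLES = 166_000    # "more than 166,000 times"

# Assumption: decimal terabytes (1 TB = 10**12 bytes).
BYTES_PER_TB = 1e12

# Size of one "Library of Congress" implied by the two figures above.
implied_library_bytes = BYTES_PER_DAY / LIBRARY_MULTIPLES
implied_library_tb = implied_library_bytes / BYTES_PER_TB

print(f"Implied collection size: about {implied_library_tb:.0f} TB")
```

Under these assumptions the two figures imply a collection of roughly 15 terabytes.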

Yet the power of all this information is not fully harnessed. It’s time to change that, and thanks to recent advances in data analytics and computational services, we finally have the tools to do it.

Data scientists at Los Alamos National Laboratory study data from wide-ranging public sources to identify patterns, aiming to predict trends that could threaten global security. Multiple data streams are critical because ground-truth data (such as surveys) are often delayed, biased, sparse, incorrect or sometimes nonexistent.

For example, knowing mosquito incidence in communities would help public health officials predict the risk of mosquito-transmitted diseases such as dengue, a leading cause of illness and death in the tropics, or West Nile virus, which has been found in New Mexico each year since 2003. However, mosquito data at a global (and even national) scale is not available.

To address this gap, Los Alamos is using other sources such as satellite imagery, climate data and demographic information to estimate risk. Using these data streams, as well as clinical surveillance data and Google search queries that used terms related to the disease, Los Alamos has developed a model that successfully predicts the spread of dengue in Brazil at the regional, state and municipality levels.

While the predictions aren’t perfect, they show promise. The researchers’ goal is to combine information from each data stream to further refine the models and improve their predictive power.

Similarly, to forecast the flu season, scientists at Los Alamos have found that Wikipedia and Google searches can complement clinical data. Because internet searches for flu symptoms often rise as people first fall ill, the models can predict a spike in cases while data from health clinics lags.
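The idea can be illustrated with a toy calculation. The sketch below is not the Los Alamos model. It is a minimal example that assumes a simple straight-line relationship between search volume and clinic-reported cases, and every number and name in it is invented for illustration. It fits the line on the weeks where clinic reports exist, then uses the most recent search counts to estimate the weeks where clinic data has not yet arrived.

```python
# Illustrative numbers only: invented weekly values, not real surveillance data.
search_volume = [120, 150, 210, 300, 420, 560]  # flu-related searches, all six weeks
clinic_cases = [30, 38, 52, 75]                 # clinic reports, first four weeks only


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Fit on the weeks where both data streams are available.
known_weeks = len(clinic_cases)
slope, intercept = fit_line(search_volume[:known_weeks], clinic_cases)

# Estimate cases for the weeks where only search data exists so far.
nowcast = [slope * x + intercept for x in search_volume[known_weeks:]]

for week, estimate in enumerate(nowcast, start=known_weeks + 1):
    print(f"Week {week}: estimated {estimate:.0f} cases")
```

In this toy setup, rising search counts in the last two weeks translate into rising case estimates before any clinic report for those weeks is available. A real forecasting model would need far more data, validation against past seasons and some way to handle search spikes driven by news coverage rather than illness.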

These same concepts are being used to expand research beyond disease prediction to better understand public sentiment. In partnership with the University of California, Los Alamos is conducting a three-year study using disparate data streams to understand whether opinions expressed on social media map to opinions expressed in surveys.

For example, in Colombia, Los Alamos is studying whether social media posts about the peace process between the government and FARC, the socialist guerrilla movement, can be ground-truthed with survey data. A UC Berkeley researcher is conducting on-the-ground surveys throughout Colombia, including in isolated rural areas, to poll citizens about the peace process. Meanwhile, at Los Alamos, researchers are analyzing social media data and news sources from the same areas to determine if they align with the survey data.

If it’s possible to demonstrate that social media accurately captures a population’s sentiment, social media analysis could be a more affordable, accessible and timely alternative to what are otherwise expensive and logistically challenging surveys.

In the case of disease forecasting, if social media posts indeed predict outbreaks, that data could be used in educational campaigns to inform citizens of the risk of an outbreak (due to vaccine exemptions, for example) and ultimately reduce that risk by promoting protective behaviors (such as washing hands, wearing masks and remaining indoors).

All of this illustrates the potential for big data to solve big problems. Los Alamos and other national laboratories that are home to some of the world’s largest supercomputers have the computational power, augmented by machine learning and data analysis, to take this information and shape it into a story for not only one state or even one nation, but the world as a whole. The information is there. It’s time to use it.

Science on the Hill
