The Post

Data - the heavy lifting can be done blind

- RHEMA VAITHIANATHAN

Rightly, the idea of our personal data being collected or passed on, without our permission, has a tendency to spark alarm in New Zealand.

But the good news is that, when it comes to data analytics, we can achieve an astonishing amount without the need for personalised data.

Between the extremes of personalised data and population-level data is the useful (and often misunderstood) category of de-identified or ‘confidentialised’ data.

It tells the same story about the experiences of an individual, across aspects like health, employment, education and justice. However, crucial pieces of personal data are absent, such as names, addresses, birth dates and other details that would identify individuals.
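As a rough illustration of the idea, here is a minimal Python sketch of stripping direct identifiers from a record while keeping a pseudonymous key so that one person's health, employment and other records can still be linked. The field names, the `de_identify` function and the secret key are assumptions made for this example, not a description of how any agency actually does it. Removing direct identifiers is only one part of real confidentialisation, which also has to deal with combinations of details that could single someone out.

```python
import hashlib
import hmac

# Hypothetical field names chosen for this sketch; real datasets will differ.
DIRECT_IDENTIFIERS = ("name", "address", "birth_date")


def de_identify(record, secret_key):
    """Return a copy of `record` with direct identifiers removed.

    A keyed hash of the identifiers is kept as `person_id`, so records
    about the same person can still be linked without revealing who
    that person is. `secret_key` (bytes) stays with the trusted party.
    """
    raw = "|".join(str(record.get(field, "")) for field in DIRECT_IDENTIFIERS)
    pseudonym = hmac.new(secret_key, raw.encode("utf-8"), hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["person_id"] = pseudonym
    return cleaned


# Two made-up records about the same (fictional) person from different sources.
health = {"name": "Jane Doe", "address": "1 Example St",
          "birth_date": "1980-01-01", "hospital_visits": 2}
work = {"name": "Jane Doe", "address": "1 Example St",
        "birth_date": "1980-01-01", "employed": True}

key = b"held-only-by-the-trusted-third-party"
health_clean = de_identify(health, key)
work_clean = de_identify(work, key)
```

After this step the two cleaned records carry the same `person_id` but no name, address or birth date.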

For researchers and policy makers, large sets of this de-identified data are gold. They give us everything we need and nothing we don’t.

And when it comes to statistical research, we can find out an enormous amount by interrogating millions of lines of rich but nameless records.

We can measure how effective a particular programme is by comparing what happens to people who receive a service with what happens to those who don’t. In this way we can find out what works.
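The comparison described above can be sketched in a few lines of Python. The records, the field names `received_service` and `adverse_outcome`, and both functions are invented for illustration. A raw difference in rates like this is only a starting point: a real evaluation would also need to account for ways the two groups differ apart from the service itself.

```python
def outcome_rate(records, received_service):
    """Share of people in one group who experienced the adverse outcome."""
    group = [r for r in records if r["received_service"] == received_service]
    if not group:
        raise ValueError("no records in this group")
    return sum(1 for r in group if r["adverse_outcome"]) / len(group)


def rate_difference(records):
    """Outcome rate for service recipients minus the rate for non-recipients."""
    return outcome_rate(records, True) - outcome_rate(records, False)


# Made-up, nameless records: one row per person.
records = [
    {"received_service": True, "adverse_outcome": True},
    {"received_service": True, "adverse_outcome": False},
    {"received_service": True, "adverse_outcome": False},
    {"received_service": True, "adverse_outcome": False},
    {"received_service": False, "adverse_outcome": True},
    {"received_service": False, "adverse_outcome": True},
    {"received_service": False, "adverse_outcome": False},
    {"received_service": False, "adverse_outcome": False},
]

difference = rate_difference(records)  # negative means fewer adverse outcomes among recipients
```

Note that nothing in the calculation needs to know who any of these people are.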

We can also identify factors that predispose people to certain adverse events, by working backwards from those adverse events to the precursors.

De-identified data can also be used to create predictive risk models – algorithms capable of predicting the risk that a particular individual will experience a certain event. We train those algorithms on vast collections of historical and de-identified data until they are as accurate as possible.
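To give a feel for what training such a model involves, here is a toy sketch in Python. It fits a simple logistic regression by gradient descent to a handful of invented records with a single made-up feature (a count of prior events). The choice of logistic regression, the function names and the data are all assumptions for illustration. Real predictive risk models are built on far larger datasets, use many more features and are validated carefully before use.

```python
import math


def sigmoid(z):
    """Squash any number into a value between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_risk(weights, bias, features):
    """Estimated probability of the event for one person's features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_risk_model(rows, outcomes, learning_rate=0.1, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    n = len(rows)
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, outcome in zip(rows, outcomes):
            error = predict_risk(weights, bias, features) - outcome
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Invented historical data: [prior_events] per person, and whether the event occurred.
history = [[0], [1], [2], [3], [4], [5]]
occurred = [0, 0, 0, 1, 1, 1]

weights, bias = train_risk_model(history, occurred)
low_risk = predict_risk(weights, bias, [0])
high_risk = predict_risk(weights, bias, [5])
```

In this toy data the event only occurs for people with more prior events, so the fitted model assigns a higher estimated risk to someone with five prior events than to someone with none.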

All of this is possible using a large, detailed data set that never reveals names or identifying details of the individuals in it.

Data sets of this kind are a win/win. Researchers and policy makers have access to incredibly rich data; but individuals do not have to give up their privacy in order to benefit.

The availability of de-identified data relies on a trusted third party meticulously transforming personalised information into de-identified data.

In New Zealand, Statistics New Zealand has been doing this effectively for decades. The census, Household Labour Force Survey and New Zealand Health Survey are just a few examples where extremely personal information is collected and then transformed into robust, secure and de-identified research datasets.

It speaks volumes for the trusted position of Statistics NZ that New Zealanders are happy to hand over extremely sensitive information about aspects like religion, relationship, health and addiction status, knowing that it will be kept secure while at the same time being used to create vital new knowledge and insights.

High response rates for significant but voluntary surveys like the Household Labour Force Survey and New Zealand Health Survey (both around the 80 per cent mark) reflect the fact that there have been no notable breaches of de-identification protocol in New Zealand.

Having a government department that takes responsibility for de-identifying data, and for controlling access to, and use of, that data has been critical. Preventing re-identification of individuals is of paramount importance, hence the strict controls on how we as researchers and policy makers can use the data and report what we find.

The findings of research done using de-identified data can be incredibly useful. For example, if a programme is found to be extremely effective, it may receive additional funding. And vice versa.

Of course, in some cases accessing personal, identifiable data can make it possible to apply research findings even more effectively. For example, access to personal data could allow a government agency to locate individuals to offer them additional support or preventative programmes. But first, a convincing case has to be presented, and this is where de-identified data comes into its own.

The willingness of New Zealanders to allow third parties access to personally identified data cannot be assumed. It is a question for New Zealand as a whole, and for New Zealanders as individuals with different levels of comfort around the concept. Without a broader social licence, this will not come to pass.

Fortunately, while we are really just beginning our exploration around how and when personal data use is appropriate, researchers and policy makers can achieve a huge amount of social good without ever knowing your name.

Professor Rhema Vaithianathan is a health economist, co-director of the Centre for Social Data Analytics at Auckland University of Technology and a member of the New Zealand Data Futures Partnership Working Group.
