Deccan Chronicle

We need to revise our approach to anonymised data

- Rohan Seth (The writer is a technology policy analyst at the Takshashila Institution. All views are the author’s own and do not necessarily reflect the newspaper’s)

Data is a complex, dynamic issue. We often like to sort it into large buckets. The Personal Data Protection Bill does this by creating five broad categories: personal data, sensitive personal data, critical personal data, non-personal data, and anonymised data. While these classifications help us make sense of how data operates, it is important to remember that the real world does not operate this way.

For instance, think about surnames. If you had a list of Indian surnames in a dataset, they alone would not be enough to identify individuals. So, you would put that dataset under the ambit of personal data. But since this is India, and context matters, surnames can tell you a lot more about a person, such as their caste. As a result, surnames alone might not identify individuals, but they can identify whole communities. That makes surnames more sensitive than ordinary personal data, so you could make a case for including them in the sensitive personal data category.

And that is the larger point here: data is dynamic, because it can be combined or used alone in varying contexts. As a result, it is not always easy to pin it down to broad categories.

This is something that is often not appreciated enough in policy making, especially in the case of anonymised or non-personal data. Before I go on, let me explain the difference between the two, as there is a tendency to use them interchangeably.

Anonymised data refers to a dataset where the immediate identifiers (such as names or phone numbers) are stripped from the rest of the dataset. Non-personal data, on the other hand, is a broader, negative term. Anything that is not personal data can technically come under this umbrella: think of anything from traffic signal data to a company's growth projections for the next decade.

Not only is there a tendency to use the terms interchangeably, but there is also a false underlying belief that data, once anonymised, cannot be deanonymised. The assumption is false because data is essentially like puzzle pieces. Even if each piece is anonymised, having enough anonymised data can lead to deanonymisation and the identification of individuals or even whole communities. For instance, if a malicious hacker has access to a history of your location through Google Maps, and can combine that with a history of your payment information from your bank account (or Google Pay), s/he does not need your name to identify you.
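
To make the puzzle-piece idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the records, the pseudonyms (loc_A, pay_X) and the function names link_pseudonyms and best_match are hypothetical, and real attacks are far more sophisticated. It assumes two datasets with names removed, one of location pings and one of payments, and simply counts how often a pseudonym in one dataset turns up at the same place in the same hour as a pseudonym in the other.

```python
from collections import Counter

# Hypothetical "anonymised" records: (pseudonym, hour, place). No names anywhere.
location_pings = [
    ("loc_A", "2020-01-06T09", "cafe"),
    ("loc_A", "2020-01-06T13", "pharmacy"),
    ("loc_A", "2020-01-07T19", "bookshop"),
    ("loc_B", "2020-01-06T09", "cafe"),
    ("loc_B", "2020-01-06T13", "gym"),
    ("loc_B", "2020-01-07T19", "cinema"),
]
payments = [
    ("pay_X", "2020-01-06T09", "cafe"),
    ("pay_X", "2020-01-06T13", "pharmacy"),
    ("pay_X", "2020-01-07T19", "bookshop"),
]


def link_pseudonyms(location_pings, payments):
    """Count how often each (location, payment) pseudonym pair shares an hour and place."""
    seen = {}
    for loc_id, hour, place in location_pings:
        seen.setdefault((hour, place), set()).add(loc_id)

    overlaps = Counter()
    for pay_id, hour, place in payments:
        for loc_id in seen.get((hour, place), ()):
            overlaps[(loc_id, pay_id)] += 1
    return overlaps


def best_match(overlaps, pay_id):
    """Return the location pseudonym that coincides most often with pay_id, if any."""
    candidates = {loc: n for (loc, pay), n in overlaps.items() if pay == pay_id}
    return max(candidates, key=candidates.get) if candidates else None


overlaps = link_pseudonyms(location_pings, payments)
print(best_match(overlaps, "pay_X"))  # prints "loc_A"
```

In this toy data, one shared visit to the cafe is not enough to tell loc_A from loc_B, but three coinciding visits single out loc_A. Once the two records are linked, a single identifying detail in either dataset, such as a home address, would be enough to put a name to both.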

In the Indian policy making context, there does not seem to be a realisation that anonymisation can be reversed once you have enough data. The recently introduced Personal Data Protection Bill seems to rest on the same false assumption.

Through Section 91, it allows “the central government to direct any data fiduciary or data processor to provide any personal data anonymised or other non-personal data to enable better targeting of delivery of services or formulation of evidence-based policies by the Central government”.

There are two major concerns here. Firstly, Section 91 gives the Government the power to gather and process non-personal data. In addition, multiple other sections ensure that this power is largely unchecked. For instance, Section 35 gives the Government the power to exempt itself from the constraints of the bill. Also, Section 42 ensures that, instead of being independent, the Data Protection Authority is made up of members selected by the Government. Having this unchecked power to collect and process data is problematic, especially because it could give the Government the ability to use this data to identify minorities.

Secondly, it just does not make sense to address non-personal data under a personal data protection bill. Even before this version of the bill came out, there had been multiple calls to appoint a separate committee to come up with recommendations in this space. It would have been ideal, then, to have a different bill that looks at non-personal data. Because the subject is so vast, it does not make sense for it to be governed by a few lines in Section 91 for the foreseeable future.

So the bottom line is that anonymised data and non-personal data can be used to identify people. A government with unchecked powers to collect and process these kinds of data could cause severely negative consequences. It would be better, instead, to rethink the approach to non-personal and anonymised data and to have a separate committee and regulation for them.
