We need to revise our approach to anonymised data

Deccan Chronicle - Technomics - Rohan Seth (The writer is a technology policy analyst at the Takshashila Institution. All views are the author’s own and do not necessarily reflect the newspaper’s.)

Data is a complex, dynamic issue. We often like to sort it into large buckets. The Personal Data Protection Bill does this by creating five broad categories: personal data, sensitive personal data, critical personal data, non-personal data, and anonymised data. While these classifications help us make sense of how data operates, it is important to remember that the real world does not fit so neatly into them.

For instance, think about surnames. If you had a list of Indian surnames in a dataset, the surnames alone would not be enough to identify people, so you would put that dataset under the ambit of personal data. But this is India, and context matters: a surname can tell you a lot more about a person, such as their caste. So while surnames alone might not identify individuals, they can identify whole communities. That makes surnames more sensitive than ordinary personal data, and you could make a case for including them in the sensitive personal data category.

And that is the larger point here: data is dynamic, because it can be combined with other data or used alone in varying contexts. As a result, it is not always easy to pin it down to broad categories.

This is often not appreciated enough in policymaking, especially in the case of anonymised or non-personal data. Before I go on, let me explain the difference between the two, as there is a tendency to use them interchangeably.

Anonymised data refers to a dataset from which the immediate identifiers (such as names or phone numbers) have been stripped. Non-personal data, on the other hand, is a broader, negative term: anything that is not personal data can technically come under this umbrella, from traffic signal data to a company's growth projections for the next decade.

Not only is there a tendency to use the terms interchangeably, but there is also a false underlying belief that data, once anonymised, cannot be deanonymised. The assumption is false because data is essentially like puzzle pieces. Even if it is anonymised, having enough anonymised data can lead to deanonymisation and the identification of individuals or even whole communities. For instance, if a malicious hacker has access to your location history through Google Maps and can combine it with your payment history from your bank account (or Google Pay), s/he does not need your name to identify you.
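To make the puzzle-piece idea concrete, here is a toy sketch in Python of what is usually called a linkage attack. Everything in it is an assumption made up for illustration: the two datasets, their field names (`device_id`, `merchant_place`, `hour`) and the people in them are hypothetical, and real location or payment records are far messier. The "anonymised" location trail carries no names, yet a person is matched to a device as soon as all of their timestamped payments fall inside exactly one device's trail.

```python
from collections import defaultdict

# Hypothetical "anonymised" location pings: names stripped, only a random device ID kept.
location_pings = [
    {"device_id": "a91f", "place": "Koramangala cafe", "hour": 9},
    {"device_id": "a91f", "place": "MG Road pharmacy", "hour": 13},
    {"device_id": "c07b", "place": "Koramangala cafe", "hour": 9},
    {"device_id": "c07b", "place": "Indiranagar gym", "hour": 18},
]

# Hypothetical payment records from a second source, which do carry names.
payments = [
    {"name": "Asha", "merchant_place": "Koramangala cafe", "hour": 9},
    {"name": "Asha", "merchant_place": "MG Road pharmacy", "hour": 13},
    {"name": "Ravi", "merchant_place": "Indiranagar gym", "hour": 18},
]


def trails(records, key_field, place_field):
    """Group records into a set of (place, hour) events per key."""
    grouped = defaultdict(set)
    for record in records:
        grouped[record[key_field]].add((record[place_field], record["hour"]))
    return grouped


def link(pings, payment_records):
    """Map each device ID to a name when exactly one person's payments
    all fall inside that device's location trail."""
    device_trails = trails(pings, "device_id", "place")
    person_trails = trails(payment_records, "name", "merchant_place")
    matches = {}
    for device, device_events in device_trails.items():
        # A person is a candidate if every one of their payments lines up with this device.
        candidates = [
            name
            for name, person_events in person_trails.items()
            if person_events <= device_events
        ]
        if len(candidates) == 1:
            matches[device] = candidates[0]
    return matches


print(link(location_pings, payments))  # {'a91f': 'Asha', 'c07b': 'Ravi'}
```

A single shared point is ambiguous here (both devices were at the same cafe at 9), but the combination of points singles each person out, which is exactly why "no names" does not mean "no identification".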

In the Indian policymaking context, there does not seem to be a realisation that anonymisation can be reversed once you have enough data. The recently introduced Personal Data Protection Bill seems to rest on this same false assumption.

Through Section 91, it allows “the central government to direct any data fiduciary or data processor to provide any personal data anonymised or other non-personal data to enable better targeting of delivery of services or formulation of evidence-based policies by the Central government”.

There are two major concerns here. First, Section 91 gives the Government the power to gather and process non-personal data, and multiple other sections ensure that this power is largely unchecked. For instance, Section 35 gives the Government the power to exempt itself from the constraints of the bill, and Section 42 ensures that, instead of being independent, the Data Protection Authority is made up of members selected by the Government. Such unchecked power to collect and process data is problematic, especially as it could give the Government the ability to use this data to identify minorities.

Second, it simply does not make sense to address non-personal data under a personal data protection bill. Even before this version of the bill came out, there had been multiple calls to appoint a separate committee to make recommendations in this space. It would then have been ideal to have a different bill dealing with non-personal data. Because the subject is so vast, it does not make sense for it to be governed, for the foreseeable future, by a few lines in Section 91.

So the bottom line is that anonymised data and non-personal data can be used to identify people. Giving the Government unchecked powers to collect and process these kinds of data could lead to severely negative consequences. It would be better instead to rethink the approach to non-personal and anonymised data, with a separate committee and regulation for it.