The end of anonymity as we know it


The genetic genealogy industry is booming. In recent years, more than 15 million people have offered up their DNA (a cheek swab, some saliva in a test tube) to services such as 23andMe and Ancestry in pursuit of answers about their heritage. In exchange for a genetic fingerprint, individuals may find a birth parent, long-lost cousins, perhaps even a link to Oprah or Alexander the Great.

But as these registries of genetic identity grow, it’s becoming harder for individuals to retain any anonymity. Already, 60 per cent of Americans of Northern European descent, the primary group using these sites, can be identified through such databases whether or not they’ve joined one themselves, according to a study published last week in the journal Science.

Within two or three years, 90 per cent of Americans of European descent will be identifiable from their DNA, the researchers projected. The science-fiction future, in which everyone is known whether or not they want to be, is nigh.

“It’s not the distant future, it’s the near future,” said Yaniv Erlich, lead author of the study. Erlich, formerly a genetic privacy researcher at Columbia University, is chief science officer of MyHeritage, a genetic ancestry website.

The science involves a search for third cousins. To identify a person through a DNA sample, an investigator uploads a previously analyzed genetic sequence to a database. The goal is to find someone who shares enough DNA to place them in the third cousin or closer range. Most of us have at least 800 people out there, somewhere in the world, who fall into this category. So long as one of these people is in a database, a skilled sleuth may be able to use other publicly available information to start building a family tree and figure out the person’s actual identity.
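A rough way to see why so many people are already findable is a back-of-the-envelope model, which is an assumption here and not the model used in the Science study: suppose each of a person’s 800 or so relatives at third-cousin distance or closer independently has the same chance, `coverage`, of having a profile in a given database. The chance that at least one of them is there is then 1 - (1 - coverage)^800. The function names below are hypothetical, and the independence assumption ignores the way relatives cluster in families and populations.

```python
# Toy model (an assumption, not the Science study's method): each of a
# person's n relatives at third-cousin distance or closer independently has
# probability `coverage` of appearing in a genealogy database.


def prob_at_least_one_relative(n_relatives: int, coverage: float) -> float:
    """Probability that at least one relative is in the database."""
    if n_relatives < 0:
        raise ValueError("n_relatives must be non-negative")
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    # Complement of "none of the n relatives has uploaded a profile".
    return 1.0 - (1.0 - coverage) ** n_relatives


def coverage_needed(n_relatives: int, target: float) -> float:
    """Database coverage at which the match probability reaches `target`."""
    if n_relatives <= 0:
        raise ValueError("n_relatives must be positive")
    if not 0.0 <= target < 1.0:
        raise ValueError("target must be in [0, 1)")
    # Solve 1 - (1 - c)**n = target for c.
    return 1.0 - (1.0 - target) ** (1.0 / n_relatives)


if __name__ == "__main__":
    N = 800  # the article's figure for third cousins or closer
    for c in (0.001, 0.002, 0.005):
        p = prob_at_least_one_relative(N, c)
        print(f"coverage {c:.1%}: P(match) = {p:.2f}")
    print(f"coverage for a 90% match rate: {coverage_needed(N, 0.9):.2%}")
```

Under this simplified model, a database holding well under 1 per cent of a population already yields a relative match for most people, which is one way to see why such projections are plausible. Real-world odds also depend on how reliably a third-cousin match can be detected and turned into a family tree.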

That technique has been used in recent months to identify more than 15 suspects in murder and sexual assault cases. The breakthroughs began in April with an arrest in the case of the Golden State Killer, who terrorized California with rapes and murders in the ’70s and ’80s. Other successes soon followed. A truck driver in Washington state was charged in the 1987 murder of a Canadian couple; a DJ in Pennsylvania was charged in the 1992 murder of a teacher.

Watching these developments, Erlich wondered about the odds of identifying any given person through cousins’ DNA in one of these databases.

His analysis is based not on the big genealogy databases such as 23andMe and Ancestry, but on two of the smallest: GEDmatch, which has around one million profiles, and MyHeritage, which had around 1.5 million at the time of the study. That’s because, for legal and logistical reasons, the larger sites cannot be easily used to identify anyone other than customers who mail in saliva.

But the smaller sites, set up to help genealogists maximize the odds of finding relatives, are more flexible. GEDmatch allows law-enforcement officials to scan its database in murder and sexual assault cases. MyHeritage does not, but it permits uploads from external labs. With both, it’s hard to be sure what’s being uploaded: grandma’s saliva, crime scene blood, a sample from a medical study or something else entirely.

To determine the odds of correctly identifying an individual from a given DNA sample, Erlich and his colleagues (from Columbia University, the Hebrew University of Jerusalem and the New York Genome Center) analyzed 30 DNA kits chosen at random from the GEDmatch database.

Their results were eye-opening. The team found that a DNA sample from an American of Northern European heritage could be traced to a relative of its owner at third-cousin distance or closer in 60 per cent of cases. A comparable analysis on the MyHeritage site had similar results. (The analysis focused on Americans of Northern European background because 75 per cent of the users on GEDmatch and other genealogy sites belong to that demographic.)

Some experts have raised questions about the study’s methodology. Its sample size was small, and it didn’t account for the fact that more than one match is often required to identify a suspect.

CeCe Moore, a genetic genealogist with Parabon, a forensic consulting firm, also expressed concern in an email that the Science paper may obscure the difficulty involved in puzzling out someone’s identity; it takes a highly skilled expert to build a family tree from the initial genetic clues.

Still, she said, the takeaway of the study “is not news to us.” In recent months Moore has been involved in a dozen murder and sexual assault cases that used GEDmatch to identify suspects. Of the 100 crime-scene profiles her firm had uploaded to GEDmatch by May, half were obviously solvable, she said, and 20 were “promising.”

“I think it’s a strong and convincing paper,” said Graham Coop, a population genetics researcher at the University of California, Davis. In a blog post in May, Coop calculated just how lucky investigators had been in the Golden State Killer case. He reached a statistical conclusion similar to Erlich’s: Society is not far from being able to identify 90 per cent of people through the DNA of their cousins in genealogical databases.

“This is this moment of, wow, oh, this opens up a lot of possibilities, some of which are good and some are more questionable,” he said.

In an alarming result, the Science study found that a supposedly “anonymized” genetic profile taken from a medical data set could be uploaded to GEDmatch and positively identified. This shows that an individual’s private health data might not be so private after all.

Erlich has urged genealogy companies to consider attaching some sort of cryptographic signature to the genetic profiles they analyze. This would help ensure that whoever uploads a genetic profile is who they say they are, making it harder for anyone to abuse the data, for example to figure out who attended a protest.
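The article does not say how such a signature would work. One minimal sketch, assuming a testing lab tags each profile it produces and the database accepts only uploads whose tag checks out, is below. It swaps in an HMAC from Python’s standard library for a true digital signature: an HMAC only works between parties that share a secret key, whereas a real scheme would more likely use public-key signatures so the database could verify files without holding the lab’s key. `LAB_KEY`, `sign_profile` and `accept_upload` are hypothetical names, and the profile bytes are made-up example data.

```python
import hashlib
import hmac

# Hypothetical secret shared between one testing lab and the database.
# This HMAC is a stand-in; a production design would more likely use
# public-key signatures instead.
LAB_KEY = b"example-lab-secret"


def sign_profile(profile: bytes, key: bytes) -> str:
    """Lab side: compute an HMAC-SHA256 tag over the raw profile bytes."""
    return hmac.new(key, profile, hashlib.sha256).hexdigest()


def accept_upload(profile: bytes, tag: str, key: bytes) -> bool:
    """Database side: accept the upload only if its tag verifies."""
    expected = sign_profile(profile, key)
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected, tag)


if __name__ == "__main__":
    profile = b"rs0001 AG\nrs0002 CT\n"  # made-up genotype lines
    tag = sign_profile(profile, LAB_KEY)
    print(accept_upload(profile, tag, LAB_KEY))  # True: file as the lab issued it
    print(accept_upload(profile + b"rs0003 GG\n", tag, LAB_KEY))  # False: altered
```

A tag like this shows only that a file came from a key holder and was not altered afterwards; it says nothing about whether the sample itself was collected with consent.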

Daniel MacArthur, a genomics researcher at Massachusetts General Hospital, said he endorses the cryptographic signature but does not think it goes far enough. “We live in a world where people are very interested in obtaining and sharing their genetic data to learn more about themselves,” he said. “It’s a natural human instinct. But legislative protection is required to ensure that it’s not used for nefarious purposes.”


A 23andMe Inc. DNA genetic testing kit. Within two or three years, 90 per cent of Americans of European descent will be identifiable from their DNA.


Authorities said they used a genetic genealogy website to connect some crime-scene DNA to Joseph James DeAngelo, a.k.a. the “Golden State Killer.”
