Con­cern about genome in­va­sions

Wide­spread DNA se­quenc­ing is putting peo­ple’s pri­vacy in jeop­ardy, study says.

Los Angeles Times - CALIFORNIA - DEBORAH NETBURN, deborah.netburn@latimes.com

Ev­ery­one’s DNA se­quence is unique. But for those who wish to main­tain their ge­netic pri­vacy, it may not be unique enough.

A new study ar­gues that more than half of Amer­i­cans could be iden­ti­fied by name if all you had to start with was a sam­ple of their DNA and a few ba­sic facts, such as the re­gion where they live and about how old they might be.

It wouldn’t be sim­ple, and it wouldn’t be cheap. But the fact that it has be­come doable will force all of us to re­think the mean­ing of pri­vacy in the DNA age, ex­perts said.

There is little time to waste. The researchers behind the new study say that once 3 million Americans have uploaded their genomes to public genealogy websites, nearly everyone in the U.S. will be identifiable from their DNA and just a few additional clues.

More than 1 mil­lion Amer­i­cans have al­ready pub­lished their ge­netic in­for­ma­tion, and dozens more do so ev­ery day.

“People have been wondering how long it will be before you can use DNA to detect just about anybody,” said Ruth Dickover, director of the forensic science program at UC Davis, who was not involved with the study. “The authors are saying it’s not going to take that long.”

This new re­al­ity rep­re­sents the con­ver­gence of two long-stand­ing trends.

One of them is the rise of direct-to-con­sumer ge­netic test­ing. Com­pa­nies such as Ances­try.com and 23andMe can se­quence any­one’s DNA for about $100. All you have to do is pro­vide a sam­ple of saliva and drop it in the mail.

The other es­sen­tial el­e­ment is the pro­lif­er­a­tion of pub­licly search­able ge­neal­ogy data­bases like GEDmatch. Any­one can up­load a full genome to these sites and pow­er­ful com­put­ers will crunch through it, look­ing for stretches of match­ing DNA se­quences that can be used to build out a fam­ily tree.

To test the grow­ing power of these sites, re­searchers led by Columbia Univer­sity com­puter sci­en­tist Yaniv Er­lich set out to see whether they could find a per­son’s name — and thus, her iden­tity — if all they had to go on was a piece of her DNA and a small amount of bi­o­graph­i­cal in­for­ma­tion.

They started with a full DNA se­quence from a Utah woman whose ge­netic in­for­ma­tion was pub­lished anony­mously as part of an un­re­lated sci­en­tific study. (They had ac­tu­ally iden­ti­fied this woman in a pre­vi­ous study, but for the pur­poses of this work, they pre­tended they didn’t know who she was.)

Er­lich and his col­lab­o­ra­tors up­loaded her ge­netic code to GEDmatch and ran a search to see if she had any re­la­tions on the site. They found two: one in North Dakota and one in Wy­oming.

The re­searchers could tell they were all re­lated be­cause they shared a num­ber of sin­gle nu­cleo­tide poly­mor­phisms, or SNPs. These are sin­gle let­ters in spe­cific spots among the roughly 3 bil­lion A’s, Cs, Ts and Gs that make up the hu­man genome. The more SNPs peo­ple share, the more closely re­lated they are. By com­par­ing the DNA of all three rel­a­tives, Er­lich’s team was able to find a com­mon an­ces­tral cou­ple that were the Utah woman’s great-grand­par­ents.

Next, the re­searchers scoured ge­nealog­i­cal web­sites and other sources for ad­di­tional de­scen­dants of that long-ago cou­ple. They found 10 chil­dren and hun­dreds of grand­chil­dren and great-grand­chil­dren.

Then they started culling their mas­sive list of de­scen­dants. They elim­i­nated the men from the sam­ple, then de­scen­dants who were not alive when the Utah woman’s DNA was se­quenced. The au­thors also knew that their sub­ject was mar­ried and how many chil­dren she had, which helped them zero in on their tar­get.

Af­ter a long day of painstak­ing work, the re­searchers were able to cor­rectly name the owner of the DNA sam­ple.
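The culling steps described above amount to applying a series of filters to a list of candidates. A minimal sketch of that idea follows; the record fields, the function name and the sample people are all hypothetical and invented for illustration, not taken from the study.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Descendant:
    # Hypothetical record for one descendant of the ancestral couple.
    name: str
    sex: str                    # "F" or "M"
    birth_year: int
    death_year: Optional[int]   # None if still living
    married: bool
    num_children: int


def narrow_candidates(people: List[Descendant], sequenced_year: int,
                      married: bool, num_children: int) -> List[Descendant]:
    """Keep only women who were alive in the sequencing year and who
    match the known marital status and number of children."""
    return [
        p for p in people
        if p.sex == "F"
        and p.birth_year <= sequenced_year
        and (p.death_year is None or p.death_year >= sequenced_year)
        and p.married == married
        and p.num_children == num_children
    ]


# Invented sample data, for illustration only.
descendants = [
    Descendant("Person A", "M", 1950, None, True, 3),
    Descendant("Person B", "F", 1920, 1990, True, 3),
    Descendant("Person C", "F", 1955, None, True, 3),
    Descendant("Person D", "F", 1960, None, False, 0),
]

remaining = narrow_candidates(descendants, sequenced_year=2005,
                              married=True, num_children=3)
print([p.name for p in remaining])
```

Each filter on its own leaves many candidates; it is the combination of several weak clues that shrinks the list to one.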

The same process would work for about 60% of Americans of European descent, who are the people most likely to use genealogical websites, Erlich said. Though the odds of success would be lower for people from other backgrounds, it would still be expected to work for more than half of all Americans, the authors said.

To come to this con­clu­sion, the re­searchers an­a­lyzed a dif­fer­ent data­base con­sist­ing of 1.28 mil­lion anony­mous in­di­vid­u­als who had their DNA se­quenced by MyHer­itage, a DNA test­ing and fam­ily his­tory com­pany where Er­lich is the chief sci­ence of­fi­cer.

If you can find a per­son’s third cousin in a ge­nealog­i­cal data­base, then you should be able to iden­tify the per­son with a rea­son­able amount of sleuthing, Er­lich said. So the team checked to see how many rel­a­tives on the or­der of a third cousin or closer they could find for each in­di­vid­ual in their data set.

They found plenty: 60% of the 1.28 mil­lion peo­ple were matched with a rel­a­tive who was at least as close as a third cousin, and 15% had a rel­a­tive who was at least as close as a sec­ond cousin.

The find­ings were pub­lished Thurs­day in the jour­nal Sci­ence.

So far, 72-year-old Joseph James DeAn­gelo is the most fa­mous per­son to be iden­ti­fied this way. You may know him bet­ter as the sus­pected Golden State Killer, charged with 13 counts of mur­der and 13 counts of at­tempted kid­nap­ping.

Pri­vate cit­i­zens are ben­e­fit­ing from the tech­nol­ogy as well. Adoptees have found bi­o­log­i­cal par­ents and sib­lings, and oth­ers have found dis­tant cousins who can shed new light on a fam­ily’s ori­gins and her­itage.

But as more of us up­load DNA to pub­licly search­able data­bases, the im­pli­ca­tions can be creepy.

“When the po­lice caught the Golden State Killer, that was a very good day for hu­man­ity,” Er­lich said. “The prob­lem is that the very same strat­egy can be mis­used.”

Think of for­eign gov­ern­ments us­ing this tech­nique to track down Amer­i­can cit­i­zens, he said. Or protesters and ac­tivists be­ing pur­sued in this way.

Er­lich and his coau­thors pro­posed a mit­i­ga­tion strat­egy that would make it harder to up­load an un­known DNA se­quence to a ge­nealog­i­cal data­base and search for a match.

They suggest that direct-to-consumer DNA testing companies put a special code on the raw data files they send to their customers. Genealogy sites could then agree to allow people to upload DNA sequences only if they have a valid code. This would ensure that people could conduct searches related only to their own DNA.

A sys­tem like this would not pre­vent law en­force­ment from us­ing ge­nealog­i­cal data­bases to search for sus­pects, Er­lich said.
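The article does not say how the special code would be built. One way to picture it is as a tamper-evident tag computed over the raw data file. The sketch below uses an HMAC from Python's standard library purely as an illustration. It assumes the testing company and the genealogy site share a secret key, which is a simplification; a real deployment would more likely use public-key signatures so that sites could verify files without being able to create codes themselves. The function names and sample data are hypothetical.

```python
import hashlib
import hmac


def sign_raw_data(raw_data: bytes, company_key: bytes) -> str:
    """Testing company side: compute a code tied to the file's exact contents."""
    return hmac.new(company_key, raw_data, hashlib.sha256).hexdigest()


def accept_upload(raw_data: bytes, code: str, company_key: bytes) -> bool:
    """Genealogy site side: accept the file only if its code is valid."""
    expected = sign_raw_data(raw_data, company_key)
    # Constant-time comparison avoids leaking how much of the code matched.
    return hmac.compare_digest(expected, code)


# Invented key and file contents, for illustration only.
key = b"hypothetical-shared-secret"
customer_file = b"rs0000001\tAA\nrs0000002\tCT\n"
code = sign_raw_data(customer_file, key)

print(accept_upload(customer_file, code, key))         # file as issued
print(accept_upload(customer_file + b"x", code, key))  # altered file
```

Under this sketch, a DNA file that did not come from a participating company, or one edited after it was issued, would carry no valid code and be rejected at upload.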

Al Seib Los An­ge­les Times

A CRIMINALIST with the Hertzberg-Davis Foren­sic Sci­ence Cen­ter in L.A. shows a com­puter dis­play of a DNA pro­file gen­er­ated by a ge­netic an­a­lyzer.
