Too much DNA se­quenc­ing has put other peo­ple’s pri­vacy in jeop­ardy

Lodi News-Sentinel - NATION - By Deborah Netburn

Every­one’s DNA se­quence is unique. But for those who wish to main­tain their ge­netic pri­vacy, it may not be unique enough.

A new study argues that more than half of Americans could be identified by name if all you had to start with was a sample of their DNA and a few basic facts, such as where they live and about how old they might be.

It wouldn’t be sim­ple, and it wouldn’t be cheap. But the fact that it has be­come doable will force all of us to re­think the mean­ing of pri­vacy in the DNA age, ex­perts said.

There is little time to waste. The researchers behind the new study say that once 3 million Americans have uploaded their genomes to public genealogy websites, nearly everyone in the U.S. would be identifiable from their DNA and just a few additional clues.

More than 1 mil­lion Amer­i­cans have al­ready pub­lished their ge­netic in­for­ma­tion, and dozens more do so ev­ery day.

“People have been wondering how long it will be before you can use DNA to detect just about anybody,” said Ruth Dickover, director of the forensic science program at the University of California, Davis, who was not involved with the study. “The authors are saying it’s not going to take that long.”

This new re­al­ity rep­re­sents the con­ver­gence of two long­stand­ing trends.

One of them is the rise of direct-to-consumer genetic testing. Companies such as Ancestry and 23andMe can sequence anyone’s DNA for about $100. All you have to do is provide a sample of saliva and drop it in the mail.

The other essential element is the proliferation of publicly searchable genealogy databases like GEDmatch. Anyone can upload a full genome to these sites, and powerful computers will crunch through it, looking for stretches of matching DNA sequences that can be used to build out a family tree.

To test the grow­ing power of these sites, re­searchers led by Columbia Uni­ver­sity com­puter sci­en­tist Yaniv Er­lich set out to see whether they could find a per­son’s name — and thus, his iden­tity — if all they had to go on was a piece of his DNA and a small amount of bi­o­graph­i­cal in­for­ma­tion.

They started with a full DNA se­quence from a per­son whose ge­netic in­for­ma­tion was pub­lished anony­mously as part of an un­re­lated sci­en­tific study. (They had ac­tu­ally iden­ti­fied this woman in a pre­vi­ous study, but for the pur­poses of this work, they pre­tended they didn’t know who she was.)

Er­lich and his col­lab­o­ra­tors up­loaded her ge­netic code to GED­match and ran a search to see if she had any re­la­tions on the site. They found two: one in North Dakota and one in Wy­oming.

The researchers could tell they were all related because they shared a number of single nucleotide polymorphisms, or SNPs. These are single letters in specific spots among the roughly 3 billion A’s, C’s, T’s and G’s that make up the human genome.

The more SNPs peo­ple share, the more closely re­lated they are.
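
As a rough illustration of that idea, the sketch below counts how often two people carry the same letter at the same SNP site. It is a toy model only: the site names and letters are invented, and real matching services compare long stretches of DNA rather than tallying individual letters.

```python
def shared_snp_fraction(a: dict[str, str], b: dict[str, str]) -> float:
    """Fraction of SNP sites present in both profiles where the letters match."""
    common = a.keys() & b.keys()
    if not common:
        return 0.0
    matches = sum(1 for site in common if a[site] == b[site])
    return matches / len(common)


# Hypothetical SNP sites and letters, invented purely for illustration.
subject = {"snp1": "A", "snp2": "C", "snp3": "T", "snp4": "G"}
closer_relative = {"snp1": "A", "snp2": "C", "snp3": "T", "snp4": "A"}
distant_relative = {"snp1": "A", "snp2": "G", "snp3": "C", "snp4": "A"}

closer_score = shared_snp_fraction(subject, closer_relative)
distant_score = shared_snp_fraction(subject, distant_relative)
print(closer_score, distant_score)
```

In this made-up data the closer relative matches the subject at more sites than the distant one, which is the pattern the researchers relied on.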

By comparing the DNA of all three relatives, Erlich’s team was able to find a common ancestral couple who were the Utah woman’s great-grandparents.

Next, the re­searchers scoured ge­nealog­i­cal web­sites and other sources for ad­di­tional de­scen­dants of that long-ago cou­ple. They found 10 chil­dren and hun­dreds of grand­chil­dren and great-grand­chil­dren.

Then they started culling their mas­sive list of de­scen­dants. They elim­i­nated all the men from the sam­ple, then those who were not alive when the Utah woman’s DNA was se­quenced. The au­thors also knew that their sub­ject was mar­ried and how many chil­dren she had, which helped them zero in on their tar­get.
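
The culling step amounts to applying one filter after another to a list of candidates. The sketch below shows the general shape of that process, assuming a simple record for each descendant. Every name, field, year and number in it is hypothetical; the article does not say how the researchers stored or processed their list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Descendant:
    name: str
    sex: str                   # "F" or "M"
    birth_year: int
    death_year: Optional[int]  # None if still living
    married: bool
    children: int


def cull(candidates, sequencing_year, married, children):
    """Keep only women alive in the sequencing year who fit the known facts."""
    return [
        d for d in candidates
        if d.sex == "F"
        and d.birth_year <= sequencing_year
        and (d.death_year is None or d.death_year >= sequencing_year)
        and d.married == married
        and d.children == children
    ]


# Invented family members, for illustration only.
family = [
    Descendant("Alice Example", "F", 1960, None, True, 3),
    Descendant("Bob Example", "M", 1958, None, True, 3),
    Descendant("Carol Example", "F", 1930, 1995, True, 3),
    Descendant("Dana Example", "F", 1965, None, False, 0),
    Descendant("Erin Example", "F", 1962, None, True, 2),
]

# The sequencing year and family details here are assumptions, not from the study.
matches = cull(family, sequencing_year=2010, married=True, children=3)
print([d.name for d in matches])
```

Each extra fact removes more candidates, which is why a handful of biographical details can narrow hundreds of descendants down to one person.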

After a long day of painstaking work, the researchers were able to correctly name the owner of the DNA sample.

The same process would work for about 60 percent of Americans of European descent, who are the people most likely to use genealogical websites, Erlich said. Though the odds of success would be lower for people from other backgrounds, it would still be expected to work for more than half of all Americans, the authors said.

To come to this con­clu­sion, the re­searchers an­a­lyzed a dif­fer­ent data­base con­sist­ing of 1.28 mil­lion anony­mous in­di­vid­u­als who had their DNA se­quenced by MyHer­itage, a DNA test­ing and fam­ily his­tory com­pany where Er­lich is the chief science of­fi­cer.

If you can find a per­son’s third cousin in a ge­nealog­i­cal data­base, then you should be able to iden­tify the per­son with a rea­son­able amount of sleuthing, Er­lich said. So the team checked to see how many rel­a­tives on the or­der of a third cousin or closer they could find for each in­di­vid­ual in their data set.

They found plenty: 60 per­cent of the 1.28 mil­lion peo­ple were matched with a rel­a­tive who was at least as close as a third cousin, and 15 per­cent had a rel­a­tive who was at least as close as a sec­ond cousin.
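
To put those percentages in terms of people, the short calculation below converts them to head counts. It assumes the database held exactly 1.28 million people and that the reported percentages are exact, so the results are approximations rather than figures from the study.

```python
COHORT_SIZE = 1_280_000  # "1.28 million" from the article, taken as exact here


def people_with_match(percent: int, cohort: int = COHORT_SIZE) -> int:
    """Convert a whole-number percentage of the cohort into a head count."""
    return cohort * percent // 100


third_cousin_or_closer = people_with_match(60)
second_cousin_or_closer = people_with_match(15)
print(third_cousin_or_closer, second_cousin_or_closer)
```

Under those assumptions, roughly 768,000 people had a match at least as close as a third cousin, and roughly 192,000 had one at least as close as a second cousin.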
