DNA Ethnicity
We talk to Alasdair Macdonald and Graham Holton from the University of Strathclyde about what your DNA test results can really tell you about your ethnicity
Last year, millions of people who had taken an AncestryDNA test discovered that their ‘ethnicity’ results had been updated. The company explained that it had refined its models, enabling it to give more accurate results. But how do DNA testing companies calculate your ethnic make-up, and how accurate is it? Can the results you gain from these tests be used to inform your family history research, or are they just a gimmick? We headed to the University of Strathclyde and asked Alasdair Macdonald and Graham Holton, who are tutors on two new short online courses about genetic genealogy (see page 29), some searching questions.
What is ethnicity testing?
Ethnicity testing is an aspect of DNA testing that has received a lot of publicity and attracted many of the millions who have taken a DNA test, although the more accurate name is ‘admixture testing’. Admixture tests seek to determine your ancestry by identifying the geographical location of your ancestors. Such bio-geographical breakdowns use your autosomal DNA (atDNA), which is carried on the 22 pairs of non-sex chromosomes. Companies use atDNA test results from contemporary populations who live in these regions to populate their reference databases, which enable them to infer a person’s ancestry.
How accurate are these reference databases?
Your atDNA result is compared with the test results of what is variously described as a ‘reference population’, ‘reference sample’ or ‘reference dataset’. The larger the reference populations and the more representative the geographical spread of these samples, the better the chance of a reliable admixture estimate. Although up-to-date information about the size of the reference populations is not readily available for all of the companies, it would appear that AncestryDNA now has the largest total reference population, with Living DNA (now also partnered with Findmypast) probably having the largest for Britain.
Admixture analysis has developed to the extent that predictions at the continental level are reliable, but accurate estimates of more specific geographical and ethnic groupings remain challenging. This is often because of a lack of sufficient reference samples from these more specific groupings, or – particularly in the case of European ancestry – because of the large amount of admixture that has taken place between different migrating populations over the past several thousand years. This makes it difficult to distinguish, for example, ancestors living in eastern England 500 years ago and Scandinavian peoples from the same period, since many English people at that date were of Scandinavian descent.
A further factor to consider is that reference populations are sampled from present-day individuals who live in a specific location. Although people who are sampled and included in reference data are filtered to ensure that their immediate ancestors (grandparents) were all born close to one another, the original population may have been displaced due to war, political change, disease or famine. Particularly in parts of Europe, your ancestors may have moved to a reference region from another area in the past two centuries.
How representative of someone’s ancestry is their admixture result?
This depends largely on how the results are interpreted. As well as the accuracy of the reference populations used, another issue is that your genetic ancestry – on which these estimates are based – has only been inherited from some of your ancestors. If we look back five to seven generations, you will probably have a number of genealogical ancestors from whom you have not inherited any distinguishable segments of atDNA. Although they are genealogical ancestors, they may not be genetic ancestors. If you hope to gain an insight into where all of your ancestors lived 500 years ago, admixture estimates will not provide that. The estimates will only suggest the origins of those ancestors from whom you have inherited atDNA.
Also, do not be surprised if over time you see changes in the test results provided by a company. This does not mean that your earlier results were necessarily inaccurate. They would have been correct, but only in relation to the population dataset that was used. Changes in percentages will be due to improved sampling that is more representative of the reference populations.
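The point above about genealogical versus genetic ancestors can be put into rough numbers. The short Python sketch below is an illustration only: it counts the places in a pedigree at each generation and the average share of atDNA that each would contribute if inheritance were perfectly even. The function names are made up for this example. Real inheritance is uneven, which is why some distant ancestors end up contributing nothing detectable, and the same person can fill more than one place in a tree where cousins have married.

```python
# Illustrative arithmetic only: places in a pedigree and the average
# autosomal share each would contribute if DNA were passed on evenly.

def ancestor_slots(generations_back: int) -> int:
    """Places in the pedigree at a given generation (2 parents, 4 grandparents...)."""
    return 2 ** generations_back


def average_share_percent(generations_back: int) -> float:
    """Average percentage of atDNA per ancestor at that generation (an average, not a guarantee)."""
    return 100 / ancestor_slots(generations_back)


for g in range(1, 8):
    print(f"{g} generations back: {ancestor_slots(g):>3} ancestors, "
          f"about {average_share_percent(g):.2f}% each on average")
```

By seven generations back the average share is under one per cent per ancestor, so it takes only a little unevenness for a particular ancestor's contribution to fall to zero.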
Why do the DNA results not match the results of my own research into my family tree?
Even if your genealogical research has been faultless, it’s possible that there has been an unknown parental event such as an illegitimacy or unrecorded adoption due to the death of a parent and remarriage.
Another reason – and perhaps the most common – is that DNA from an ancestor has been ‘washed out’ due to recombination (when eggs and sperm are formed, the atDNA that a parent inherited from his or her own parents is randomly shuffled before being passed on). This could mean that you no longer carry any detectable DNA from a particular ancestor, so that person’s ancestry does not appear in your admixture results.
There are two more factors to consider. First, companies use different reference datasets, and there may be a lack of sampling from a specific population. Second, vendors also use data from test-takers to improve their reference populations, and a lack of customers from certain locations, for instance the far north of Scotland compared with southeast Scotland, will make results less representative than they ought to be.
Some of my recent ancestry is unknown. How can admixture results help me solve this?
Admixture percentages can provide useful clues for adoptees or those who have an unknown parent or grandparent. Although cousin-matching is of immediate use, if there are no close matches then admixture percentages may provide evidence of the predicted geographical origin of ancestors on a more general level. A test such as that by Living DNA may help, in certain circumstances, to provide direction on a specific level within the British Isles.
An example might be a woman from England who suspects that her grandmother had a liaison with a Polish officer during the Second World War. This is likely to show up in the admixture results as a very large Eastern European percentage.
Unfortunately, the woman would have the same admixture results if several of her ancestors migrated to England from Continental Europe in the 19th century.
This is because some European populations on a general level are genetically very similar, and are spread across modern geographical borders. However, companies that have superior sampling and analytical techniques will be able to tease these regional differences apart.
Individuals who suspect that a recent ancestor was from an endogamous community, where marrying within the same social or religious group is very common, will see matches reported with considerably more shared segments of DNA than would be expected from the actual genealogical relationship. This is simply because they share more than one common ancestor with their matches.
Admixture results will reflect this type of inheritance, so your Family Tree DNA results might report this as ‘Jewish diaspora’ and indicate whether the test-taker carries admixture that is Sephardic or Ashkenazi (or both).
Are some companies’ DNA tests more representative of your ancestry than others, for specific population groups?
The five major companies all report admixture results as percentages of ancestry from various populations, which are largely geographically based but in some instances are ‘ethnic populations’ such as Ashkenazi or Sephardic Jews. However, the use of the phrase ‘ethnic populations’ can be misleading, since academic research has revealed that Ashkenazi Jews are about 53 per cent European and 47 per cent Middle Eastern by ancestry.
The number of reference populations used by each test company varies, and the panels are constantly being improved. At present the number of reference populations for each company is: Family Tree DNA 24; MyHeritage 42; AncestryDNA 43; 23andMe 45; and Living DNA 80.
The regions reported in your test results will not align exactly with these reference populations. For instance, AncestryDNA has 43 populations in its reference panel. However, its recently updated ‘genetic communities’ breakdown offers more than 380 global regions.
Closer to home, these ‘genetic communities’ give a breakdown for Britain of 19 regions, and for Ireland 92 regions. In time other countries will likely have similar coverage to Ireland. Living DNA reports on 21 regions, while MyHeritage reports on two groups (‘English’ and ‘Irish, Scottish and Welsh’). 23andMe offers ‘UK’; and Family Tree DNA has only one: ‘British Isles’.
Confusingly, Ancestry also reports ‘genetic communities’ within its global populations list, calling these subregions or migrations. For example, within ‘South American Migrations’ are ‘Spaniards, Cubans, Dominicans and Venezuelans’.
Some companies have a stronger representation of samples from certain parts of the world than others – MyHeritage is thought to have a strong representation from Europe, while Living DNA is particularly strong on sampling from Britain.
Although this is likely to result in more accurate results for those with ancestry from these areas, this is a situation that may change from time to time as companies expand their reference samples and refine their methodologies.
The Oxford University project People of the British Isles (peopleofthebritishisles.org) began in 2004, and sampled blood from volunteers nationwide in an attempt to create a detailed genetic map of the country. Its research demonstrated that fine-scale UK population differentiation could be achieved using recently developed software and a methodology that can detect subtle levels of genetic difference within populations.
The algorithm divides samples into genetic clusters without reference to their known geographical locations. When the birthplace of grandparents is overlaid, the genetic clustering can be mapped to geographical areas. Living DNA is using this approach for the admixture component of its DNA test, and can achieve superior results with a fraction of the samples of other vendors.
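The two-step idea described above, clustering first and bringing in geography only afterwards, can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the nine samples, their two ‘genetic coordinates’ and the birthplaces are invented, and a very simple k-means routine stands in for the far more sophisticated software used by the project and by Living DNA.

```python
from collections import Counter

# Hypothetical toy data: (genetic coordinate 1, genetic coordinate 2,
# grandparents' birthplace). Real analyses use many thousands of markers.
SAMPLES = [
    (0.10, 0.20, "Orkney"), (0.15, 0.25, "Orkney"), (0.12, 0.18, "Orkney"),
    (0.80, 0.75, "Cornwall"), (0.85, 0.80, "Cornwall"), (0.78, 0.82, "Cornwall"),
    (0.50, 0.10, "Kent"), (0.55, 0.12, "Kent"), (0.48, 0.08, "Kent"),
]


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, k, rounds=20):
    """Very small k-means stand-in: returns one cluster label per point."""
    # Deterministic start: first point, then the point farthest from chosen centres.
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centres)))
    labels = None
    for _ in range(rounds):
        new_labels = [min(range(k), key=lambda i: squared_distance(p, centres[i]))
                      for p in points]
        if new_labels == labels:
            break  # assignments have stopped changing
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centre if a cluster is empty
                centres[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels


# Step 1: cluster on genetic coordinates only; birthplaces are not used.
points = [(x, y) for x, y, _ in SAMPLES]
labels = k_means(points, k=3)

# Step 2: overlay the grandparents' birthplaces on the finished clusters.
overlay = {}
for label, (_, _, birthplace) in zip(labels, SAMPLES):
    overlay.setdefault(label, Counter())[birthplace] += 1

for label in sorted(overlay):
    print(f"Cluster {label}: {dict(overlay[label])}")
```

In this contrived dataset each cluster ends up containing samples from a single birthplace, which is the pattern that allows genetic clusters to be mapped to geographical areas. Real data is much noisier.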
What period of time do these estimates refer to?
Not all of the companies reveal this information, but in most cases it is likely that the ancestors in the various estimated groupings lived about 500 years ago. Living DNA maps the predicted regions of your ancestral origins at various points in time, from about 10 generations ago (roughly 300 years) to many thousands of years ago, while 23andMe provides a separate report on recent ancestor locations. This attempts to indicate where your ancestors lived within the past 200 years, and is split into 120 regions. Family Tree DNA also includes a report on your ancient origins (see box, page 28).
Why do siblings show different admixture percentages?
Although half of your DNA comes from each of your parents, the percentage inherited from each grandparent can vary. For example, the share from your maternal grandparents might be split 24 per cent and 26 per cent, while the paternal split might be 21 per cent and 29 per cent. Siblings each receive half of each parent’s DNA, but not the same half, so they inherit different amounts of DNA from each grandparent. The sex of the transmitting parent will affect the amount of DNA that is transmitted to the child from each grandparent. Females recombine more than males, so are less likely to transmit a chromosome without recombination. These laws of genetics mean that full siblings may show different admixture percentages.
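A small simulation can show why the grandparental split differs from one sibling to the next. The Python sketch below is a deliberately crude model, not a description of how any testing company works: the number of markers per chromosome and the crossover counts are invented, with the mother allowed slightly more crossovers than the father to reflect the point about recombination above.

```python
import random

CHROMOSOMES = 22   # autosome pairs
LOCI = 1000        # hypothetical number of markers per chromosome


def share_from_first_grandparent(rng, max_crossovers):
    """Fraction of the DNA one parent passes on that came from that parent's own father."""
    loci_from_first = 0
    for _ in range(CHROMOSOMES):
        n = rng.randint(0, max_crossovers)                    # crossovers on this chromosome
        breakpoints = sorted(rng.sample(range(1, LOCI), n)) + [LOCI]
        from_first = rng.random() < 0.5                       # which grandparent's copy starts
        start = 0
        for end in breakpoints:
            if from_first:
                loci_from_first += end - start
            from_first = not from_first                       # switch copy at each crossover
            start = end
    return loci_from_first / (CHROMOSOMES * LOCI)


def simulate_child(rng):
    """Percentage of a child's atDNA from each grandparent in this toy model."""
    paternal = share_from_first_grandparent(rng, max_crossovers=2)  # assumed value
    maternal = share_from_first_grandparent(rng, max_crossovers=3)  # assumed value
    return {
        "paternal grandfather": 50 * paternal,
        "paternal grandmother": 50 * (1 - paternal),
        "maternal grandfather": 50 * maternal,
        "maternal grandmother": 50 * (1 - maternal),
    }


rng = random.Random(2024)
for name in ("Sibling 1", "Sibling 2"):
    shares = simulate_child(rng)
    print(name, {who: round(pct, 1) for who, pct in shares.items()})
```

Each simulated child always receives exactly 50 per cent from each parent, but the way that half divides between the two grandparents changes on every run, which is the effect that leads full siblings to see different admixture percentages.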