AncestryDNA Ethnicity Prediction: Learning to Speak Genetics

AncestryDNA is one of the most advanced autosomal DNA tests on the market, but that doesn't mean our job is done. We are constantly working to improve our genetic ethnicity prediction models by deciphering the unique language of the human genome, employing top geneticists and the latest technology to determine what it can tell us.

Dr. Ken Chahine, Sr. VP of AncestryDNA and one of our top scientists, explains some of the challenges we face when using DNA to predict ethnicity, including the work we do to innovate in this field and deliver the best product possible to our customers. Here's some perspective on genetic ethnicity from Dr. Chahine.
Before AncestryDNA, ethnic origins were largely broken down by continent. Most of us, however, don't need a genetic test to determine whether we are European, African, or Asian. So we challenged ourselves to push the boundaries of the science and attempt a more granular ethnic breakdown, especially within Europe. Below is a map showing the detailed ethnicity coverage of AncestryDNA.

AncestryDNA Ethnicity Results Coverage

Why are we one of the first to launch a product that breaks down ethnic origins beyond the continental level? Simply put, it is very difficult. Europeans, Africans, and Asians are genetically very distinct. However, it is not as easy to ethnically distinguish between a Briton, a German, and a French person, and it is especially difficult to decipher the ethnicity of an individual with ancestry from all three or some other comparable mixture.

The Language of Genetic Ethnicity

Since the human genome was first sequenced in 2000, we have made great strides in our understanding of human genetics and inheritance. The truth is, however, that the human genome is still largely a language that we don't understand. Sure, we've deciphered the alphabet that makes up the 3 billion letters of our genome, but we know woefully little about its vocabulary, grammar, and syntax. We know, for example, that height is inherited. Yet height is still difficult to predict from genetics alone. Most of the genetic signatures (i.e., alleles) that we have identified as being associated with height contribute only a small percentage to one's ultimate vertical fate. In other words, while we understand how to read the letters of the human genome, we don't always know what they are telling us.

Piecing the Puzzle Together

To continue with the language metaphor, let's assume I give you three books written in languages that you do not speak. Then I tell you, as a point of reference, which language each one is written in: one in English, one in Arabic, and the third in Chinese. Now imagine I give you another book written in one of those three languages and ask you to tell me which language it is. Using only the letters in the reference books, it would be relatively easy not only to determine which language the book is written in, but even what percentage of the book was written in each language if it contained a mixture of all three. This is because the alphabets of the three languages are distinct and don't overlap.
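The "what percent of the book" idea can be sketched in a few lines of Python. This is a toy illustration of the book analogy only, not AncestryDNA's algorithm: the function names, the pseudocount, and the sample sentences are all hypothetical choices made for the example. Each letter of the mystery book is treated as drawn from one of three fixed reference letter profiles, and expectation-maximization (EM) estimates what share of the letters came from each language.

```python
from collections import Counter


def build_alphabet(texts):
    """Every letter seen in any reference text; spaces and punctuation are ignored."""
    return sorted({ch for text in texts for ch in text.lower() if ch.isalpha()})


def letter_profile(text, alphabet, pseudocount=0.01):
    """Smoothed relative frequency of each letter of `alphabet` in `text`.

    The (hypothetical) pseudocount keeps unseen letters from having zero probability.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts[ch] for ch in alphabet) + pseudocount * len(alphabet)
    return {ch: (counts[ch] + pseudocount) / total for ch in alphabet}


def estimate_mixture(query, profiles, iterations=200):
    """Estimate what fraction of `query`'s letters came from each reference language.

    Each letter is modeled as drawn from one of the fixed reference profiles;
    EM re-estimates only the mixing weights.
    """
    alphabet = set(next(iter(profiles.values())))
    letters = [ch for ch in query.lower() if ch in alphabet]
    weights = {name: 1.0 / len(profiles) for name in profiles}
    for _ in range(iterations):
        totals = dict.fromkeys(profiles, 0.0)
        for ch in letters:
            # E-step: how strongly each language "claims" this letter.
            scores = {name: weights[name] * prof[ch] for name, prof in profiles.items()}
            norm = sum(scores.values())
            for name in profiles:
                totals[name] += scores[name] / norm
        # M-step: each new weight is the average claim across all letters.
        weights = {name: totals[name] / len(letters) for name in profiles}
    return weights


references = {
    "English": "the quick brown fox jumps over the lazy dog",
    "Arabic": "مرحبا بالعالم",
    "Chinese": "你好世界欢迎光临",
}
alphabet = build_alphabet(references.values())
profiles = {name: letter_profile(text, alphabet) for name, text in references.items()}

# A "mixed book": 5 English letters followed by 4 Chinese characters.
mix = estimate_mixture("hello 你好世界", profiles)
for name, share in mix.items():
    print(f"{name}: {share:.2f}")
```

Because the three alphabets don't overlap, nearly every letter points to a single language, so the estimate lands close to the true split of 5 English letters and 4 Chinese characters (the small pseudocount lets a little probability leak between languages). With English, French, and German profiles the same code would still run, but each letter would carry much weaker evidence.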
Now imagine the same puzzle, but instead of English, Chinese, and Arabic, the books are written in English, French, and German. In this case, it is clearly more difficult to discern where one language ends and another begins, since all three use mostly the same basic alphabet. We must then rely on three basic strategies to distinguish the languages. First, the frequency of letters: some letters are used more or less often in French, English, or German. Second, the relative position of letters, such as the combinations ch, sch, and ing. Third, letters such as ç, ß, and ü, which are unique to certain languages.

As you can see in the graph below, even though the languages are different, the frequency of the letters used in all three languages is relatively consistent. Therefore, most of the letters are of little use in distinguishing the languages.

Frequency of letter usage across three languages

There is one more important point to make: we don't have a dictionary! That's right, there is no genetic dictionary that tells us the frequency of the letters, the relative position of the letters, or even the unique letters that occur in different European populations. AncestryDNA is building this genetic dictionary by analyzing the genetic signatures of people who have a long cultural history in a specific country or region, have spoken a certain language, and have practiced a single religion. Once we have the genetic sequences, our team of Ph.D. scientists in genetics, bioinformatics, machine learning, and statistics works to find clues that help us distinguish genetic ethnicity and provide our customers with their ethnic makeup.
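The three clues from the English, French, and German puzzle can be made concrete with a short Python sketch. It is a hypothetical illustration of the book analogy, not how AncestryDNA analyzes DNA: `language_clues` and its helpers are made-up names, and the combinations and distinctive letters it checks are simply the examples named in the text.

```python
from collections import Counter

# Example clues named in the text; a real model would need far more.
COMBINATIONS = ("ch", "sch", "ing")
DISTINCTIVE_LETTERS = ("ç", "ß", "ü")


def letter_frequencies(text):
    """Clue 1: the share of all letters taken by each letter."""
    letters = [ch for ch in text if ch.isalpha()]
    counts = Counter(letters)
    return {ch: n / len(letters) for ch, n in counts.items()}


def count_overlapping(text, pattern):
    """Count occurrences of `pattern` at every position (so "ch" inside "sch" counts too)."""
    return sum(text.startswith(pattern, i) for i in range(len(text)))


def language_clues(text):
    """Collect the three kinds of clues for one piece of text."""
    lowered = text.lower()
    return {
        "frequencies": letter_frequencies(lowered),  # clue 1: letter frequency
        "combinations": {c: count_overlapping(lowered, c) for c in COMBINATIONS},  # clue 2
        "distinctive": {ch: lowered.count(ch) for ch in DISTINCTIVE_LETTERS},  # clue 3
    }


samples = {
    "English": "The king is singing",
    "French": "Le garçon chante",
    "German": "Der Fluss fließt schnell durch München",
}
for language, sentence in samples.items():
    clues = language_clues(sentence)
    print(language, clues["combinations"], clues["distinctive"])
```

No single clue settles the question on its own: "ch" appears in both the French and German samples, and most letter frequencies barely differ. Turning clues like these into a reliable prediction requires reference statistics for each population, which is the role of the "dictionary" described above.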

The European Challenge

The good news is that genetic ethnicity prediction is working, albeit with some challenges. Central Europeans present the most significant difficulty, especially the French, Germans, and Dutch. With few geographic barriers and extensive population movement, their genetic signatures are very similar and difficult to distinguish. The British Isles and Scandinavia are more genetically distinct, but their signatures partially overlap with each other, as well as with parts of Central Europe. All of this makes it difficult to assign predicted ethnicities.

So if your German ancestry doesn't seem to be showing up in your DNA ethnicity results, or it seems like you're getting a bit too much Scandinavian, know that the ethnicity prediction can be updated over time as we make advancements in this area. This is just one example of why the ethnicity prediction portion of the AncestryDNA test is continually evolving. We are using the largest set of DNA reference samples from around the world and deeper genetic coverage in order to find those "unique letters" that will aid our analysis. In the meantime, we're excited to have our AncestryDNA customers be a part of the breakthroughs as we continue to improve our prediction algorithms, and as they evolve, we will send you updates as new findings are discovered.

The AncestryDNA test could easily have predicted your continental ethnicity as European, Asian, or African, but why settle for results based on the status quo? As Michelangelo is quoted as saying, "The greatest danger for most of us is not that our aim is too high and we miss it, but that it is too low and we reach it." So don't be surprised if your ethnicity results get updated over time. This is a good thing: it just means our science team is working hard to improve your experience, and it's one of the many benefits of AncestryDNA. If you're interested in learning more about the new AncestryDNA, or would like to order a DNA test, you can click here.