To estimate your genetic ethnicity, we compare your DNA to the DNA of a group of living people with known ancestries. This group of individuals is called our reference panel. The AncestryDNA Reference Panel is a collection of thousands of DNA samples from around the globe. Each sample is from a documented location and most are accompanied by a documented family tree indicating deep ancestry in a particular region.
In total, our reference panel is made up of 3,000 individuals chosen specifically to provide accurate ethnicity estimates for 26 global ethnicity regions. This panel is drawn from our entire native sample collection of 4,245 individuals. To estimate your genetic ethnicity, we look at information contained in over 700,000 positions in your DNA. By comparing this information to the DNA of individuals in the reference panel, we can see where you fall among those 26 regions.
Estimated genetic ethnicities of our reference panel
To test our estimation, we estimate the genetic ethnicity of each person in our panel by comparing their DNA to the rest of the AncestryDNA reference panel. This is called a "leave-one-out analysis". For a region that is more homogeneous, most samples from that region show a tight similarity to that region. For more diverse regions (those with significant DNA overlap from neighboring regions) we see a wider range of estimates among reference panel individuals.
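To make the leave-one-out idea concrete, here is a minimal Python sketch. It is an illustration only, not the AncestryDNA method: the nearest-centroid `predict_region` function is a hypothetical stand-in that assigns a single region (the real estimate is a mix of percentages across regions), and the toy panel uses made-up two-number summaries in place of real DNA data.

```python
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def predict_region(sample, panel):
    """Hypothetical stand-in estimator: pick the region with the nearest centroid."""
    by_region = defaultdict(list)
    for region, vector in panel:
        by_region[region].append(vector)
    return min(
        by_region,
        key=lambda region: squared_distance(sample, centroid(by_region[region])),
    )


def leave_one_out(panel):
    """Predict each panel member using only the other members of the panel."""
    results = []
    for i, (true_region, vector) in enumerate(panel):
        rest = panel[:i] + panel[i + 1:]  # the panel with sample i held out
        results.append((true_region, predict_region(vector, rest)))
    return results


# Made-up toy panel: (region label, invented two-number summary of a sample).
toy_panel = [
    ("A", [0.0, 0.1]), ("A", [0.1, 0.0]), ("A", [0.0, 0.0]),
    ("B", [1.0, 1.1]), ("B", [0.9, 1.0]), ("B", [1.0, 1.0]),
]

results = leave_one_out(toy_panel)
accuracy = sum(true == predicted for true, predicted in results) / len(results)
print(f"Share of held-out samples assigned to their own region: {accuracy:.0%}")
```

The key point is the `rest = panel[:i] + panel[i + 1:]` line: each person is always scored against a panel that does not include their own sample, so the test is not made artificially easy.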
The chart below shows the amount of correctly predicted ethnicity for reference panel individuals from each region.
It is clear from this graph that for the majority of samples in each region, we predict at least 80% of the genetic ethnicity to be from the correct region. However, there are exceptions. In particular, our average prediction accuracy for samples from Great Britain, Western Europe, Iberia, and Mali is not quite as high. Many factors affect the accuracy of these estimates, most importantly the number of reference samples in the panel for each region and the similarity of a given region to others.
How to read box plot data
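In descriptive statistics, a box and whisker chart (or "boxplot") shows the spread of a set of data points using a few key statistics: typically the median, the lower and upper quartiles (which form the box), and the whiskers, which extend to the smallest and largest values not treated as outliers.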
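Here is a minimal Python sketch of how such a summary might be computed. It assumes the common Tukey convention, in which whiskers reach the most extreme points within 1.5 times the interquartile range of the box; the chart on this page may use a different convention. The function name `box_plot_summary` and the sample percentages are hypothetical, chosen only for illustration.

```python
import statistics


def box_plot_summary(values):
    """Return the statistics a typical box plot draws for `values`."""
    data = sorted(values)
    # Quartiles: q1 and q3 are the box edges, the median is the line inside it.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed Tukey convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Made-up percentages of correctly predicted ethnicity for one region.
example = [20, 78, 82, 85, 88, 90, 92, 95, 97]
summary = box_plot_summary(example)
print(summary)
```

In this invented example the box spans 82 to 92 with the median at 88, the whiskers reach 78 and 97, and the single value of 20 is drawn as an outlier. A narrow box means most samples from a region receive similar estimates; a tall box or long whiskers means the estimates vary more widely.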
Below is a table showing the number of samples in each of the 26 regions in the AncestryDNA Reference Panel:
|Region|Number of samples|
|Africa Southeastern Bantu|18|
|Africa South-Central Hunter-Gatherers|35|
Still curious and want to understand more? Cool. We're glad you're as interested in genetics as we are. Check out our white paper on ethnicity prediction.