To estimate your genetic ethnicity, we compare your DNA to the DNA of a group of living people with known ancestries. This group of individuals is called our reference panel. The AncestryDNA Reference Panel is a collection of thousands of DNA samples from around the globe. Each sample is from a documented location and most are accompanied by a documented family tree indicating deep ancestry in a particular region.

In total, our reference panel is made up of 3,000 individuals chosen specifically to provide accurate ethnicity estimates for 26 global ethnicity regions. These individuals were selected from our full collection of 4,245 native samples. To estimate your genetic ethnicity, we examine more than 700,000 positions in your DNA. By comparing your DNA at these positions to the DNA of individuals in the reference panel, we can estimate how your ancestry is distributed among those 26 regions.
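One simplified way to picture this comparison is a supervised admixture model: each reference region is summarized by the frequency of one allele at each position, and your DNA is treated as a mixture of the regions. The sketch below estimates the mixture proportions with expectation-maximization (EM). It is a hypothetical illustration, not AncestryDNA's actual algorithm (the white paper describes that); the function `estimate_proportions`, the allele-frequency representation, and the synthetic data are all assumptions.

```python
import numpy as np


def estimate_proportions(genotypes, ref_freqs, n_iter=500, tol=1e-8):
    """Estimate mixture proportions over K regions for one person (hypothetical sketch).

    genotypes: shape (J,), copies of the counted allele (0, 1 or 2) at J positions.
    ref_freqs: shape (K, J), frequency of the counted allele in each reference region.
    Returns a length-K array of non-negative proportions that sum to 1.
    """
    g = np.asarray(genotypes, dtype=float)
    # Keep frequencies away from 0 and 1 so the divisions below stay finite.
    f = np.clip(np.asarray(ref_freqs, dtype=float), 1e-6, 1 - 1e-6)
    q = np.full(f.shape[0], 1.0 / f.shape[0])  # start from an even mixture
    for _ in range(n_iter):
        p = q @ f  # mixed allele frequency at each position, shape (J,)
        # E-step: probability that each allele copy came from each region.
        a = q[:, None] * f / p  # copies of the counted allele
        b = q[:, None] * (1 - f) / (1 - p)  # copies of the other allele
        # M-step: average those probabilities over all 2 * J allele copies.
        q_new = (a @ g + b @ (2 - g)) / (2 * g.size)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


# Synthetic demo: 3 made-up regions, 2,000 positions, a person who is 70% region 0 and 30% region 2.
rng = np.random.default_rng(0)
ref_freqs = rng.uniform(0.05, 0.95, size=(3, 2000))
true_q = np.array([0.7, 0.0, 0.3])
genotypes = rng.binomial(2, true_q @ ref_freqs)
q_hat = estimate_proportions(genotypes, ref_freqs)
print(np.round(q_hat, 2))
```

The estimate should land near the simulated 70/0/30 split. Real ancestry data is far less differentiated between regions than these random frequencies, which is one reason estimates for similar regions are harder.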

Estimated genetic ethnicities of our reference panel

To test our method, we estimate the genetic ethnicity of each person in the panel by comparing their DNA to the rest of the AncestryDNA reference panel, with that person left out. This is called a "leave-one-out analysis". For more homogeneous regions, the estimates for most samples closely match their region of origin. For more diverse regions (those that share significant DNA with neighboring regions), we see a wider range of estimates among reference panel individuals.
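The leave-one-out procedure can be sketched as a loop: remove one person, rebuild the reference summaries from everyone else, estimate that person's ethnicity, and record how much of it lands in their own region. This is a hypothetical sketch, not AncestryDNA's code. It assumes each region is summarized by allele frequencies, and it uses a deliberately simple stand-in estimator, `most_likely_region`, which assigns 100% to the single best-matching region; a real mixture estimator would be passed in its place. All names and data are made up.

```python
import numpy as np


def leave_one_out(genotypes, labels, regions, estimate):
    """For each sample, return the share of its estimate assigned to its own region.

    genotypes: shape (N, J) allele counts; labels: region name for each sample.
    estimate: function(genotype_row, ref_freqs) -> proportions ordered like `regions`.
    Assumes every region still has at least one sample after one is removed.
    """
    genotypes = np.asarray(genotypes, dtype=float)
    labels = np.asarray(labels)
    scores = []
    for i in range(len(labels)):
        keep = np.arange(len(labels)) != i  # leave sample i out of the panel
        panel, panel_labels = genotypes[keep], labels[keep]
        # Rebuild each region's allele frequencies without sample i.
        ref_freqs = np.array([panel[panel_labels == r].mean(axis=0) / 2 for r in regions])
        q = estimate(genotypes[i], ref_freqs)
        scores.append(float(q[regions.index(labels[i])]))
    return scores


def most_likely_region(g, ref_freqs):
    """Toy stand-in estimator: put 100% on the region with the highest likelihood."""
    f = np.clip(ref_freqs, 1e-3, 1 - 1e-3)
    loglik = (g * np.log(f) + (2 - g) * np.log(1 - f)).sum(axis=1)
    q = np.zeros(len(f))
    q[np.argmax(loglik)] = 1.0
    return q


# Synthetic demo: two made-up regions, 20 samples each, 500 positions.
rng = np.random.default_rng(1)
regions = ["Region A", "Region B"]
true_freqs = rng.uniform(0.1, 0.9, size=(2, 500))
labels = ["Region A"] * 20 + ["Region B"] * 20
genotypes = np.vstack([rng.binomial(2, true_freqs[regions.index(r)]) for r in labels])
scores = leave_one_out(genotypes, labels, regions, most_likely_region)
print(sum(scores) / len(scores))
```

Rebuilding the reference summaries inside the loop is the key design point: the held-out person never contributes to the panel used to judge them.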

The chart below shows how much of each reference panel individual's ethnicity was correctly predicted, grouped by region.

Leave-one-out analysis of the V2 reference panel. Here we plot the results of an experiment in which each sample is removed from the reference set one at a time and its ethnicity is estimated using the remaining panel samples. Each bar represents the average proportion of correctly predicted ethnicity across all samples from a given region.
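Each bar in the chart summarizes many per-sample results. A minimal sketch of that aggregation, assuming the leave-one-out step produced a (region, share correctly predicted) pair for every sample: group the pairs by region and average them. The hypothetical helper `summarize_by_region` also reports the fraction of samples at or above an 80% threshold. The region names and values are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean


def summarize_by_region(results, threshold=0.8):
    """Group (region, share_correct) pairs and summarize each region.

    Returns {region: (mean share correct, fraction of samples >= threshold)}.
    """
    by_region = defaultdict(list)
    for region, share in results:
        by_region[region].append(share)
    return {
        region: (fmean(shares), sum(s >= threshold for s in shares) / len(shares))
        for region, shares in by_region.items()
    }


# Made-up example values, not real panel results.
results = [("Region A", 0.95), ("Region A", 0.85), ("Region A", 0.60),
           ("Region B", 0.90), ("Region B", 1.00)]
for region, (mean_share, frac_ok) in summarize_by_region(results).items():
    print(f"{region}: mean {mean_share:.2f}, {frac_ok:.0%} of samples >= 80%")
```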

This graph shows that for the majority of samples in each region, we predict at least 80% of the genetic ethnicity to come from the correct region. However, there are exceptions. In particular, our average prediction accuracy for samples from Great Britain, Western Europe, Iberia, and Mali is lower. Many factors affect the accuracy of these estimates; the most important are the number of reference samples in the panel for each region and how similar a given region is to others.
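To see why the number of reference samples matters, consider a simplified model (an assumption, not necessarily how AncestryDNA summarizes its panel) in which each region is described by allele frequencies estimated from its reference samples. Under simple random sampling of unrelated people, the standard error of such a frequency shrinks with the square root of the sample size, so the 16 Mali samples carry roughly five times the sampling error of the 432 Europe East samples (the square root of 432/16 is about 5.2). The hypothetical helper below covers only sample size, not the similarity between regions.

```python
import math


def allele_freq_standard_error(p, n_individuals):
    """Binomial standard error of an allele frequency estimated from n diploid people.

    Each person carries two copies of each position, so the estimate uses 2n copies.
    """
    return math.sqrt(p * (1 - p) / (2 * n_individuals))


# Sample counts from the reference panel table; p = 0.5 gives the largest error.
for region, n in [("Mali", 16), ("Great Britain", 111), ("Europe East", 432)]:
    print(f"{region}: n = {n}, standard error = {allele_freq_standard_error(0.5, n):.3f}")
```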

How to read box plot data

In descriptive statistics, a box and whisker chart (or "boxplot") shows the spread of a set of data points using a few key statistics:

Box = the middle half of the samples fall in this range: 25% of samples are above it, and 25% are below it

Line inside the box = marks the median: half of all samples fall above this line and half below it. This marks the "typical native" sample.

Lower whisker = the lowest 25% of samples fall in this range

Upper whisker = the highest 25% of samples fall in this range
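The statistics behind each box can be computed directly. This sketch uses Python's standard `statistics.quantiles` and assumes the whiskers span the lowest and highest 25% of samples, so they run to the minimum and maximum values (many plotting tools instead cap whiskers at 1.5 times the interquartile range and draw outliers separately). The `method="inclusive"` quartile convention and the example percentages are assumptions.

```python
import statistics


def box_plot_summary(values):
    """Five-number summary: whisker ends (min, max), box edges (Q1, Q3) and median."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values)}


# Made-up percentages of ethnicity predicted correctly for seven samples.
print(box_plot_summary([55, 70, 80, 85, 90, 95, 100]))
# {'min': 55, 'q1': 75.0, 'median': 85.0, 'q3': 92.5, 'max': 100}
```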

The table below shows the number of samples in each of the 26 regions in the AncestryDNA Reference Panel:

Region                                  Samples
Great Britain                           111
Ireland                                 138
Europe East                             432
Iberian Peninsula                       81
European Jewish                         189
Scandinavia                             232
Italy/Greece                            171
Europe West                             166
Finland/Northwest Russia                59
Africa Southeastern Bantu               18
Africa North                            26
Africa South-Central Hunter-Gatherers   35
Benin/Togo                              60
Cameroon/Congo                          115
Ivory Coast/Ghana                       99
Mali                                    16
Nigeria                                 67
Senegal                                 28
Native American                         131
Asia Central                            26
Asia East                               394
Asia South                              161
Melanesia                               28
Polynesia                               18
Caucasus                                58
Middle East                             141
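As a quick consistency check, the counts in the table sum to 3,000, matching the panel size stated earlier, and the smallest region, Mali, makes up about 0.5% of the panel. The short snippet below reproduces that arithmetic; the dictionary is simply the table transcribed.

```python
# Sample counts copied from the reference panel table.
panel_counts = {
    "Great Britain": 111, "Ireland": 138, "Europe East": 432, "Iberian Peninsula": 81,
    "European Jewish": 189, "Scandinavia": 232, "Italy/Greece": 171, "Europe West": 166,
    "Finland/Northwest Russia": 59, "Africa Southeastern Bantu": 18, "Africa North": 26,
    "Africa South-Central Hunter-Gatherers": 35, "Benin/Togo": 60, "Cameroon/Congo": 115,
    "Ivory Coast/Ghana": 99, "Mali": 16, "Nigeria": 67, "Senegal": 28,
    "Native American": 131, "Asia Central": 26, "Asia East": 394, "Asia South": 161,
    "Melanesia": 28, "Polynesia": 18, "Caucasus": 58, "Middle East": 141,
}

total = sum(panel_counts.values())
print(len(panel_counts), total)  # 26 3000

# Each region's share of the panel, smallest first.
for region, n in sorted(panel_counts.items(), key=lambda item: item[1]):
    print(f"{region}: {n} samples ({n / total:.1%})")
```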

Still curious? Great! We're glad you're as interested in genetics as we are. Check out our white paper on ethnicity prediction.