One part of your DNA Story is your ethnicity estimate. This can tell you where your ancestors came from hundreds to over 1,000 years ago. Ancestry uses a reference panel to help figure this out.

A reference panel is made up of DNA samples from people with a long family history in one place or within one group. We compare your DNA to DNA samples from the reference panel to help create your estimate. For example, if one section of your DNA looks most similar to reference panel samples from Japan, then we assign that DNA to our Japan region. Our reference panel currently has samples reflecting 43 different populations from all around the world. (Learn more about the reference panel.)

Obviously, it’s important to make sure the reference panel is made up of samples from people who are truly representative of a particular population! Otherwise we would not be able to assign the right ethnicity to your DNA.

Building the Best Reference Panel

For our latest reference panel, we started with more than 30,000 people who had strong evidence showing that their families had lived in a region for a long time. Then we analyzed their DNA in several ways to confirm that their family histories matched their DNA.

The first thing we did was to make sure no two people in the reference panel were too closely related. In cases where they were, one sample was removed. Close relatives share DNA in a way that skews the results of the reference panel.
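The relatedness filter described above can be sketched in a few lines of Python. This is only an illustration, not the production method: the `relatedness` scores, the `threshold` value of 0.1, and the rule for choosing which member of a pair to drop are all assumptions made for the example.

```python
from collections import Counter

def drop_close_relatives(sample_ids, relatedness, threshold=0.1):
    """Remove samples until no remaining pair is related above `threshold`.

    relatedness: dict mapping (id_a, id_b) -> pairwise relatedness score.
    The score scale and the threshold are hypothetical.
    """
    close_pairs = [pair for pair, value in relatedness.items() if value > threshold]
    kept = set(sample_ids)
    while True:
        # Count how many close relatives each kept sample still has in the panel.
        degree = Counter()
        for a, b in close_pairs:
            if a in kept and b in kept:
                degree[a] += 1
                degree[b] += 1
        if not degree:
            return sorted(kept)
        # Assumed tie-break: drop the sample with the most close relatives,
        # then the largest ID, so that as few samples as possible are lost.
        worst = max(degree, key=lambda s: (degree[s], s))
        kept.remove(worst)

samples = ["s1", "s2", "s3", "s4"]
scores = {("s1", "s2"): 0.25, ("s2", "s3"): 0.5, ("s3", "s4"): 0.01}
print(drop_close_relatives(samples, scores))  # ['s1', 's3', 's4']
```

Here s2 is closely related to both s1 and s3, so removing s2 alone resolves both pairs.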

Next, we ran a principal component analysis (PCA) on each candidate's DNA data. This let us remove candidates whose DNA did not match the ethnicity they reported. For example, this analysis can quickly tell if someone in the France part of the reference panel is really from another part of the world. PCA is a common technique for this type of work.

This figure is an example of a PCA plot. Each dot represents one person. As you can see, people tend to cluster together based on where their DNA came from. You can also see outliers, dots that don’t match the overall pattern. We eliminate those outliers from the reference panel.
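To make the PCA step concrete, here is a minimal sketch using NumPy. It projects samples onto their top two principal components and flags anyone who sits closer to another group's center than to the center of the group they claim. The simulated data, the function names, and the nearest-center rule are assumptions for illustration; the actual outlier criteria are not described here.

```python
import numpy as np

def pca_scores(genotypes, n_components=2):
    """Project samples (rows) onto the top principal components."""
    centered = genotypes - genotypes.mean(axis=0)
    # Rows of vt are the principal axes of the centered data.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

def flag_mismatches(scores, claimed):
    """Flag samples whose nearest group center is not their claimed group."""
    claimed = np.asarray(claimed)
    groups = np.unique(claimed)
    # Medians are used as centers so a few stray samples do not shift them.
    centers = np.array([np.median(scores[claimed == g], axis=0) for g in groups])
    dists = np.linalg.norm(scores[:, None, :] - centers[None, :, :], axis=2)
    nearest = groups[dists.argmin(axis=1)]
    return nearest != claimed

# Simulated data (not real genotypes): two well-separated groups,
# plus one sample labeled "A" whose data actually resembles group "B".
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.1, size=(20, 50))
stray = rng.normal(2.0, 0.1, size=(1, 50))
group_b = rng.normal(2.0, 0.1, size=(20, 50))
genotypes = np.vstack([group_a, stray, group_b])
claimed = ["A"] * 21 + ["B"] * 20

flags = flag_mismatches(pca_scores(genotypes), claimed)
print(np.flatnonzero(flags))  # index of the mislabeled sample: [20]
```

A nearest-center rule only catches people who look like a different group. Catching dots that match no cluster at all would need an additional distance cutoff.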

Finally, we did a "leave-one-out analysis" on the remaining candidates. For this analysis, we took a group of people out of the reference panel and then tested them against the rest of the panel. We did this 20 times, removing a different 5% of the panel each time, so everyone in the panel was tested once. Anyone who appeared to be an outlier at this stage was eliminated.
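The 20-round scheme can be sketched as a simple splitting routine. The sketch below only shows how the panel might be divided so that each person is held out exactly once. The function name, the shuffling, and the seed are assumptions, and the step that actually tests held-out people against the rest of the panel is left out.

```python
import random

def holdout_rounds(sample_ids, n_rounds=20, seed=0):
    """Yield (held_out, remaining) splits; each sample is held out exactly once."""
    shuffled = list(sample_ids)
    random.Random(seed).shuffle(shuffled)
    for r in range(n_rounds):
        # Every n_rounds-th sample, starting at offset r: about 1/n_rounds of the panel.
        held_out = shuffled[r::n_rounds]
        held = set(held_out)
        remaining = [s for s in shuffled if s not in held]
        yield held_out, remaining

panel = [f"sample_{i}" for i in range(100)]
for held_out, remaining in holdout_rounds(panel):
    # In a real analysis, each held-out person would be tested
    # against a panel built only from `remaining`.
    assert len(held_out) == 5 and len(remaining) == 95
```

With 20 rounds, each round holds out 5% of the panel, which matches the description above.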

A Matter of Percentage Points

Once we had the best candidates, we reran the "leave-one-out analysis" on them to test the quality of the panel. The results for some regions were better than others.

For example, people in the reference panel who represented our Japan region looked to be pretty close to 100% Japanese using our latest algorithm. The same was true for people in the reference panel who represented our Western & Central India, Polynesia, Finland, Philippines, Africa South-Central Hunter-Gatherers, and Cameroon, Congo, & Southern Bantu Peoples regions, as well as the European Jewish group.

Other regions in the reference panel were a little harder to predict. For example, people in the reference panel who represented the Basque region appeared to be 54% Basque on average. This happens because nearby regions are often very similar to one another: Basque and Spain, for example, or Germany and France. Overall, when it came to correctly predicting the genetic ethnicity a member of the reference panel represented, our average was 78.9%. But even when a prediction falls short, it doesn’t fall too far from the correct region. For example, some people from Spain might get assigned to nearby regions like France and Portugal.
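A figure like "54% Basque on average" can be read as the mean share of each person's estimate that lands in the region they represent. The sketch below shows that calculation on made-up toy numbers, which are not actual results. The function name and data layout are assumptions, and the text does not say whether the overall 78.9% is averaged across regions or across individuals.

```python
from collections import defaultdict

def mean_self_assignment(results):
    """Average fraction assigned to the claimed region, per region.

    results: iterable of (claimed_region, estimate), where estimate
    maps region name -> fraction of DNA assigned to that region.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for claimed, estimate in results:
        totals[claimed] += estimate.get(claimed, 0.0)
        counts[claimed] += 1
    return {region: totals[region] / counts[region] for region in totals}

# Toy values for illustration only.
toy_results = [
    ("Basque", {"Basque": 0.50, "Spain": 0.40, "France": 0.10}),
    ("Basque", {"Basque": 0.60, "Spain": 0.30, "France": 0.10}),
    ("Japan", {"Japan": 1.00}),
]
per_region = mean_self_assignment(toy_results)
```

In this toy example, the shortfall for the Basque samples goes to neighboring regions, which mirrors the pattern described above.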

As we continue to update and build upon our current algorithm and reference panel, we expect our accuracy to keep improving, which means even better results for you.

After all of this, we ended up with 16,638 individuals in the reference panel, split up by region like this:

Region Number of Samples
Northern Africa 41
Africa South-Central Hunter-Gatherers 34
Benin & Togo 224
Cameroon, Congo, & Southern Bantu Peoples 579
Ivory Coast & Ghana 124
Eastern Africa 82
Mali 169
Nigeria 111
Senegal 31
Native American—North, Central, South 146
Native American—Andean 63
Central & Northern Asia 186
Southern Asia 600
Balochistan 53
Burusho 23
China 620
Southeast Asia–Dai (Thai) 80
Western & Central India 65
Japan 592
Korea & Northern China 261
Philippines 538
Southeast Asia–Vietnam 159
England, Wales & Northwestern Europe 1,519
Baltic States 194
Basque 22
Ireland & Scotland 500
European Jewish 200
France 1,407
Germanic Europe 2,072
Greece & the Balkans 242
Italy 1,000
Norway 367
Portugal 404
Sardinia 30
Eastern Europe & Russia 1,959
Spain 270
Sweden 372
Finland 361
Middle East 271
Iran / Persia 459
Turkey & the Caucasus 101
Melanesia 49
Polynesia 58
Total 16,638

Still curious? We're glad you're as interested in genetics as we are. Check out our white paper on ethnicity prediction.