One part of your DNA Story is your ethnicity estimate. This can tell you where your ancestors came from hundreds to over 1,000 years ago. Ancestry uses a reference panel to help figure this out.
A reference panel is made up of DNA samples from people with a long family history in one place or within one group. We compare your DNA to DNA samples from the reference panel to help create your estimate. For example, if one section of your DNA looks most similar to reference panel samples from Japan, then we assign that DNA to our Japan region. Our reference panel currently has samples reflecting 43 different populations from all around the world. (Learn more about the reference panel.)
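To make the comparison step concrete, here is a toy Python sketch. It is not Ancestry's actual algorithm: the short letter strings, the `similarity` and `assign_region` helpers, and the two-region panel are all made up for illustration, and real methods use statistical models over far more genetic markers. The sketch only shows the basic idea of assigning each section of DNA to the region it most resembles and then turning those assignments into percentages.

```python
from collections import Counter

def similarity(segment, reference):
    """Fraction of marker positions where the two sequences agree."""
    return sum(a == b for a, b in zip(segment, reference)) / len(segment)

def assign_region(segment, panel):
    """Return the region whose best-matching reference sample is most similar."""
    return max(
        panel,
        key=lambda region: max(similarity(segment, ref) for ref in panel[region]),
    )

# Hypothetical reference panel: region -> reference samples for one DNA section.
panel = {
    "Japan": ["AACGT", "AACGA"],
    "Finland": ["TTCGT", "TTGGT"],
}

# Hypothetical sections of one customer's DNA.
segments = ["AACGT", "AACGA", "TTCGT", "AACCT"]

assigned = [assign_region(seg, panel) for seg in segments]
estimate = {region: 100 * n / len(segments) for region, n in Counter(assigned).items()}
```

In this toy example three of the four sections look most like the Japan samples, so the estimate comes out as 75% Japan and 25% Finland.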
Obviously, it’s important to make sure the reference panel is made up of samples from people who are truly representative of a particular population! Otherwise we would not be able to assign the right ethnicity to your DNA.
Building the Best Reference Panel
For our latest reference panel, we started with more than 30,000 people who had strong evidence showing that their families had lived in a region for a long time. Then we analyzed their DNA in several ways to confirm that their family histories matched their DNA.
The first thing we did was to make sure no two people in the reference panel were too closely related. In cases where they were, one sample was removed. Close relatives share DNA in a way that skews the results of the reference panel.
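The relatedness check can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure Ancestry used: the `kinship` values, the `0.1` threshold, and the `prune_relatives` helper are hypothetical, and the rule for choosing which relative to drop (keep whoever was seen first) is just the simplest possible choice.

```python
def prune_relatives(sample_ids, kinship, threshold=0.1):
    """Greedily keep samples that are not closely related to any kept sample.

    kinship maps frozenset({a, b}) -> estimated relatedness (hypothetical values).
    Pairs missing from the mapping are treated as unrelated.
    """
    kept = []
    for sid in sample_ids:
        too_close = any(
            kinship.get(frozenset((sid, other)), 0.0) > threshold for other in kept
        )
        if not too_close:
            kept.append(sid)
    return kept

samples = ["s1", "s2", "s3", "s4"]
kinship = {
    frozenset(("s1", "s2")): 0.25,  # assumed close relatives
    frozenset(("s3", "s4")): 0.02,  # assumed distant relatives
}
kept = prune_relatives(samples, kinship)
```

Here `s2` is dropped because it is too closely related to `s1`, while `s3` and `s4` both stay.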
Next, we ran a Principal Components Analysis (PCA) on the DNA data of each candidate. This let us remove candidates whose DNA did not match the ethnicity they reported. For example, this analysis can quickly tell if someone in the France part of the reference panel is really from another part of the world. PCA is a common technique for this type of work.
This figure is an example of a PCA plot. Each dot represents one person. As you can see, people tend to cluster together based on where their DNA came from. You can also see outliers, dots that don’t match the overall pattern. We eliminate those outliers from the reference panel.
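For readers who want to see the mechanics, the following Python sketch runs a bare-bones PCA on made-up data and flags one mislabeled sample. Everything here is an assumption for illustration: the four-marker genotypes, the group labels "A" and "B", the use of only the first principal component, and the outlier rule (a sample is flagged when it sits closer to another group's median than to its own). The source does not describe the actual outlier criteria.

```python
import math
import statistics

def first_pc_scores(genotypes, iters=100):
    """Project each sample onto the first principal component (power iteration)."""
    n, dim = len(genotypes), len(genotypes[0])
    means = [sum(row[j] for row in genotypes) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in genotypes]
    axis = [1.0] + [0.0] * (dim - 1)  # arbitrary starting direction
    for _ in range(iters):
        # One power-iteration step: axis <- X^T (X axis), then normalize.
        proj = [sum(x * a for x, a in zip(row, axis)) for row in centered]
        axis = [sum(p * row[j] for p, row in zip(proj, centered)) for j in range(dim)]
        norm = math.sqrt(sum(a * a for a in axis))
        axis = [a / norm for a in axis]
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]

def flag_outliers(ids, labels, scores):
    """Flag samples whose score is closer to another group's median than their own."""
    medians = {
        g: statistics.median([s for l, s in zip(labels, scores) if l == g])
        for g in set(labels)
    }
    flagged = []
    for sid, label, score in zip(ids, labels, scores):
        nearest = min(medians, key=lambda g: abs(score - medians[g]))
        if nearest != label:
            flagged.append(sid)
    return flagged

# Hypothetical allele counts (0, 1, or 2) at four markers.
ids = ["a1", "a2", "a3", "a4", "a5", "b1", "b2", "b3", "b4"]
labels = ["A"] * 5 + ["B"] * 4
genotypes = [
    [2, 2, 0, 0], [2, 2, 0, 1], [2, 1, 0, 0], [2, 2, 1, 0],
    [0, 0, 2, 2],  # a5: labeled "A" but looks like group "B"
    [0, 0, 2, 2], [0, 1, 2, 2], [0, 0, 2, 1], [1, 0, 2, 2],
]
outliers = flag_outliers(ids, labels, first_pc_scores(genotypes))
```

Only `a5` is flagged, because its position along the first component falls in the middle of group "B" rather than group "A". A real analysis would use many more markers and usually more than one component.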
Finally, we did a "leave-one-out analysis" on the remaining candidates. For this analysis, we took a group of people out of the reference panel and then tested them against the rest of the panel. We did this 20 times, removing a different 5% of the panel each time, so everyone in the panel got tested. Anyone who appeared to be an outlier at this stage was eliminated.
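The hold-out schedule described above (20 rounds, 5% removed each time, everyone tested once) can be sketched as follows. This is a minimal illustration, assuming a hypothetical panel of 200 sample IDs and a simple strided split; in practice the groups would typically be chosen at random, and each held-out group would then be run through the ethnicity estimate using only the remaining samples as the reference. Statisticians often call this kind of scheme 20-fold cross-validation.

```python
def holdout_rounds(sample_ids, n_rounds=20):
    """Yield (held_out, remaining) pairs so each sample is held out exactly once."""
    for r in range(n_rounds):
        held_out = sample_ids[r::n_rounds]  # every n_rounds-th sample, offset by r
        held = set(held_out)
        remaining = [s for s in sample_ids if s not in held]
        yield held_out, remaining

# Hypothetical panel of 200 samples.
panel = [f"sample_{i}" for i in range(200)]
rounds = list(holdout_rounds(panel))

# Every sample that was held out in some round.
tested = [s for held_out, _ in rounds for s in held_out]
```

With 200 samples, each round holds out 10 people (5%) and tests them against the other 190.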
A Matter of Percentage Points
Once we had the best candidates, we reran the "leave-one-out analysis" on them to test the quality of the panel. The results for some regions were better than others.
For example, people in the reference panel who represented our Japan region looked to be pretty close to 100% Japanese using our latest algorithm. The same was true for people in the reference panel who represented our Western & Central India, Polynesia, Finland, Philippines, Africa South-Central Hunter-Gatherers, and Cameroon, Congo, & Southern Bantu Peoples regions, as well as the European Jewish group.
Other regions in the reference panel were a little harder to predict. For example, people in the reference panel who represented the Basque region appeared to be 54% Basque on average. This happens because nearby regions, such as Basque and Spain or Germany and France, are often genetically very similar to one another. Overall, when it came to correctly predicting the genetic ethnicity a member of the reference panel represented, our average was 78.9%. But even when a prediction falls short, it doesn’t fall too far from the correct region. For example, some people from Spain might get assigned to nearby regions like France and Portugal.
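As a worked example of how such averages can be computed, the Python sketch below takes made-up test results and averages them per region and then across regions. The individual values are invented for illustration (only the 54% Basque average echoes the text), and averaging across regions rather than across individuals is an assumption; the source does not say which was used for the 78.9% figure.

```python
from collections import defaultdict

def region_accuracy(results):
    """Average, per region, the share of DNA assigned to a sample's own region."""
    by_region = defaultdict(list)
    for region, own_fraction in results:
        by_region[region].append(own_fraction)
    return {region: sum(vals) / len(vals) for region, vals in by_region.items()}

# Hypothetical held-out results: (region the sample represents,
# fraction of its DNA assigned back to that same region).
results = [
    ("Japan", 1.00), ("Japan", 0.98),
    ("Basque", 0.50), ("Basque", 0.58),
]

per_region = region_accuracy(results)
overall = sum(per_region.values()) / len(per_region)
```

In this toy data the Japan samples average about 99%, the Basque samples average about 54%, and the overall figure across the two regions is about 76.5%.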
As we continue to update and build upon our current algorithm and reference panel, we expect our accuracy to keep improving, which means even better results for you.
After all of this, we ended up with 16,638 individuals in the reference group split up like this:
|Region|Number of Samples|
|Africa South-Central Hunter-Gatherers|34|
|Benin & Togo|224|
|Cameroon, Congo, & Southern Bantu Peoples|579|
|Ivory Coast & Ghana|124|
|Native American—North, Central, South|146|
|Central & Northern Asia|186|
|Southeast Asia–Dai (Thai)|80|
|Western & Central India|65|
|Korea & Northern China|261|
|England, Wales & Northwestern Europe|1,519|
|Ireland & Scotland|500|
|Greece & the Balkans|242|
|Eastern Europe & Russia|1,959|
|Iran / Persia|459|
|Turkey & the Caucasus|101|
Still curious to understand more? Great! We're glad you're as interested in genetics as we are. Check out our white paper on ethnicity prediction.