I'm not sure I can explain this adequately in a message board post, and I'm not a population geneticist, but here are a couple of thoughts.

First, forget about paper records. DNA admixture analysis is completely independent of traditional research. What the scientists do is collect DNA samples from a number of people whose families have lived in a particular area for some time. Usually, the minimum requirement is that all four of their grandparents have the same ethnic background. They then use a variety of statistical tools to isolate small segments of DNA that are common to those people. The resulting groups get labels like "British Isles," "Native North American," "East Asian," etc. The labels are somewhat arbitrary, and don't necessarily refer to recent nationalities. It is my understanding that much of the ethnicity analysis is based on the 1000 Genomes Project, which focused largely on indigenous peoples of Africa, Asia and the Americas. The National Geographic Genographic Project added many more, including a number from Europe. The result is that there are now a number of reference populations that have characteristic DNA patterns. Each company and each researcher has a unique set of these populations, reflecting their product lines and research interests.

When you test, your pattern is compared with those of the reference populations and you are given a result that indicates which ones you match most closely. The results are fairly reliable on a continental basis, i.e., distinguishing European from African ancestry. But especially for places like Western Europe, where there has been a lot of mixing, the samples are not homogeneous enough to provide a precise regional breakdown. This is what 23andMe means when they refer to the need for more data.
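
For anyone who finds code easier to follow than prose, here is a toy sketch of the comparison step. It is not how any testing company actually does it. The two populations, the five markers and the allele frequencies are all made up for illustration, and the function names are my own. It simply tries every blend of the two reference populations and keeps the one that best explains a person's genotype.

```python
import math

# Hypothetical reference populations: made-up frequencies of one allele
# at five markers. Real panels use hundreds of thousands of markers.
REFERENCE = {
    "Population A": [0.90, 0.10, 0.80, 0.20, 0.70],
    "Population B": [0.10, 0.90, 0.20, 0.80, 0.30],
}


def log_likelihood(genotype, freqs):
    """Log-probability of a genotype given allele frequencies.

    genotype holds 0, 1 or 2 copies of the allele at each marker.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        prob = math.comb(2, copies) * p**copies * (1 - p) ** (2 - copies)
        total += math.log(prob)
    return total


def estimate_mix(genotype, steps=100):
    """Grid-search the share of Population A that best fits the genotype."""
    a = REFERENCE["Population A"]
    b = REFERENCE["Population B"]
    best_share, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        share = i / steps
        # Expected allele frequencies for someone with this blend of ancestry.
        blended = [share * pa + (1 - share) * pb for pa, pb in zip(a, b)]
        ll = log_likelihood(genotype, blended)
        if ll > best_ll:
            best_share, best_ll = share, ll
    return best_share


if __name__ == "__main__":
    person = [2, 1, 2, 0, 1]  # hypothetical test-taker
    share_a = estimate_mix(person)
    print(f"Population A: {share_a:.0%}, Population B: {1 - share_a:.0%}")
```

The two made-up populations here differ sharply at every marker, which is why the toy version works at all. If you make their frequencies nearly identical, many different blends fit the same genotype about equally well, and that is the Western Europe problem in miniature.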

