Nature Communications Publishes AncestryDNA Breakthrough on Genetic Communities

What if, from your DNA, you could find out that you’re not just Irish, but related to the Ulster Irish who migrated in droves to the U.S.? Or descended from a group of African Americans in Maryland who left rural areas to put down roots in cities? Or maybe from the Acadians, who brought the French language and culture to Louisiana? What if you could see the people, places and migration paths in your family story?

Genetics has long been used to understand human history and migrations. However, because of limited samples or the methods used, very few studies have shed light on more recent human history, over the last several hundred years.

After years of hard work, and a lot of rigorous statistics, we developed a novel scientific methodology that looks at how specific groups of people are connected through their DNA, what places they called home, and which migration paths they followed to get there – allowing genetics to reveal the history in a more recent time period than ever before.

Today, the science team is thrilled to announce that our work on identifying finer-grained population structure was published in Nature Communications: “Clustering of 770 thousand genomes reveals post-colonial population structure of North America.”

The new research leverages the powerful combination of family history and genetic data unique to Ancestry to surface a more concrete and detailed genetic portrait of how our recent ancestors responded together to historic forces like politics, famine, war and immigration.

Caption: Figure 3 | Distribution of ancestral birth locations in North America associated with IBD clusters. Points show pedigree birth locations that are disproportionately assigned to each cluster. Only birth locations with OR > x within indicated generations y–z are plotted, in which parameters x, y, z are chosen separately per cluster to better visualize the cluster’s historical geographic concentration; full distributions of ancestral birth locations in the US, Europe and worldwide are given in Supplementary Figs. 18–20. For each cluster, points are independently scaled by the number of pedigree annotations. See Fig. 2 and Table 1 for more details. Note that clusters are separated into two maps only for clarity. Also note that the concentration of Puerto Rican ancestors in Hawaii probably reflects their arrival there in the early 1900s (ref. 65).

 

How does the science work?

We first created a network of genetically identified relationships, based on DNA alone, among over 700,000 individuals who consented to research. Using network analysis techniques, we identified clusters of individuals in the network: groups of individuals who are slightly more related to one another than to individuals outside their cluster. In other words, from genetic data we identified novel “population structure” – subtly different groups of individuals within a larger population.

Having such a large genetic dataset allowed us to uncover these clusters, or communities, which would not otherwise have been possible.
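The clustering idea above can be illustrated with a minimal sketch. This post does not name the specific network analysis technique, so the example below swaps in a simple weighted label propagation as a stand-in; it is not the published method. The function names, the `min_shared_cm` threshold and the toy pairs are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_network(ibd_pairs, min_shared_cm=12.0):
    """Build an undirected weighted graph from (person_a, person_b, shared_cM) tuples.

    min_shared_cm is a hypothetical cutoff, not a value taken from the study.
    """
    graph = defaultdict(dict)
    for person_a, person_b, shared_cm in ibd_pairs:
        if person_a != person_b and shared_cm >= min_shared_cm:
            graph[person_a][person_b] = shared_cm
            graph[person_b][person_a] = shared_cm
    return graph


def find_clusters(graph, max_rounds=100):
    """Group nodes by weighted label propagation (a stand-in technique).

    Each node repeatedly adopts the label that carries the most total edge
    weight among its neighbors, keeping its current label on ties.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            weight_by_label = defaultdict(float)
            for neighbor, weight in graph[node].items():
                weight_by_label[labels[neighbor]] += weight
            top = max(weight_by_label.values())
            candidates = sorted(l for l, w in weight_by_label.items() if w == top)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    members = defaultdict(list)
    for node, label in labels.items():
        members[label].append(node)
    return sorted(sorted(group) for group in members.values())


# Toy data: two tightly related groups joined by one weak link.
toy_pairs = [
    ("a1", "a2", 50.0), ("a1", "a3", 50.0), ("a2", "a3", 50.0),
    ("b1", "b2", 50.0), ("b1", "b3", 50.0), ("b2", "b3", 50.0),
    ("a3", "b1", 15.0),  # weak cross-group link
    ("a1", "b3", 7.0),   # below the cutoff, so it is dropped
]
clusters = find_clusters(build_network(toy_pairs))
print(clusters)
```

On this toy input the two triangles separate into their own clusters even though a weak link connects them, which is the sense in which members are "slightly more related to one another than to individuals outside their cluster."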

We then added context to these clusters of genetic communities with family tree data to understand the origins of these groups of people, and to uncover the groups’ migration patterns and ancestries. From this we uncovered, in great detail, the historical explanations for the patterns observed in the genetics.

For example, certain groups of individuals corresponded to descendants of Scandinavian or French Canadian immigrants to North America. We even identified groups of descendants of settlers who experienced geographic or cultural isolation within the US, such as individuals with ancestry in the Appalachians and in New Mexico. The data also depicted movements and settlements across east-west and north-south gradients within the United States, and these patterns match known history remarkably well.
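The figure captions in this post plot a birth location only when its odds ratio (OR) for a cluster exceeds a threshold. A minimal sketch of that kind of enrichment test follows. It assumes a standard 2x2 odds ratio over pedigree annotations, with a 0.5 pseudocount added to each cell to avoid division by zero; the exact definition used in the paper may differ, and the function name, cluster labels and place names are hypothetical toy values.

```python
from collections import Counter


def location_odds_ratios(annotations, cluster, pseudocount=0.5):
    """Return {location: odds ratio} for one cluster.

    annotations: iterable of (cluster_id, birth_location) pairs.
    The pseudocount is an assumption made here for numerical safety.
    """
    inside = Counter()
    outside = Counter()
    for cluster_id, location in annotations:
        if cluster_id == cluster:
            inside[location] += 1
        else:
            outside[location] += 1
    n_inside = sum(inside.values())
    n_outside = sum(outside.values())

    ratios = {}
    for location in set(inside) | set(outside):
        a = inside[location] + pseudocount               # in cluster, at location
        b = n_inside - inside[location] + pseudocount    # in cluster, elsewhere
        c = outside[location] + pseudocount              # other clusters, at location
        d = n_outside - outside[location] + pseudocount  # other clusters, elsewhere
        ratios[location] = (a / b) / (c / d)
    return ratios


# Toy annotations: Town X is heavily over-represented in cluster_A.
toy_annotations = (
    [("cluster_A", "Town X")] * 8
    + [("cluster_A", "Town Y")] * 2
    + [("cluster_B", "Town X")] * 1
    + [("cluster_B", "Town Z")] * 9
)
ratios = location_odds_ratios(toy_annotations, "cluster_A")
enriched = sorted(loc for loc, ratio in ratios.items() if ratio > 10)
print(enriched)
```

With a threshold of 10, as in the Figure 4 caption, only Town X would be plotted for cluster_A in this toy example.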

Caption: Figure 4 | Genealogical data by generation trace migration of French Canadians (magenta) to the US and origins of Cajuns/Acadians in Atlantic Canada (blue). Map locations are plotted if OR > 10 within the indicated range of pedigree generations (date ranges give the 5th and 95th percentiles of birth year annotations). Points are scaled by number of pedigree annotations, separately for each of the 6 maps. Note that not all current political borders are shown. See Fig. 2 for more details.

 

What does this research mean for me?

This research has exciting implications for current and potential future customers of AncestryDNA. Recall that this research identified clusters, or genetic communities, of individuals, as well as their histories – where their ancestors may have lived, where they migrated to and from, what their last names were, and more. Conversely, that means that we can identify the genetic communities that an AncestryDNA customer belongs to. That in turn means that we can use an individual’s DNA to provide them with an extremely detailed historical portrait of the lives of some of their recent ancestors – more recent than previously possible. For example, we could tell someone where some of their ancestors might have lived and moved throughout their lives during the last several hundred years, as well as potential historical reasons for those migrations.

This work was made possible by the contributions of thousands of customers who have researched their family trees, taken the DNA test, and agreed to participate in scientific research. In the coming months, we’re excited to share these findings with each of you in a personalized experience.