Tech Roots » Science, http://blogs.ancestry.com/techroots, Ancestry.com Tech Roots Blogs, Wed, 19 Nov 2014 23:53:37 +0000

2 Talks and 4 Posters in 4 Days at the ASHG Annual Meeting
Julie Granka | Wed, 15 Oct 2014 20:11:47 +0000
http://blogs.ancestry.com/techroots/2-talks-and-4-posters-in-4-days-at-the-ashg-annual-meeting/

For the AncestryDNA science team, October brings more than fall foliage and pumpkins.  It also brings us the yearly meeting of the American Society of Human Genetics (ASHG), the main conference of the year in our field.

On Saturday, we’ll arrive in San Diego to join thousands of other scientists for a four-day conference to discuss topics in genetics, exchange ideas with colleagues, listen to talks and presentations – and, importantly, to give some presentations of our own.

We’re always on the lookout for ways that we can translate the latest scientific findings into future features for AncestryDNA customers.  The ASHG Annual Meeting is a chance for all of us to soak up the newest advancements in human genetics.

This year, the number and variety of presentations that we are giving at ASHG attests to the fact that AncestryDNA, too, plays a role in these advancements.

This year, we’re proud to be giving two platform presentations – only 8% of applications for platform presentations at ASHG were accepted. Keith Noto will be giving a platform talk entitled “Underdog: A Fully-Supervised Phasing Algorithm that Learns from Hundreds of Thousands of Samples and Phases in Minutes,” discussing the workings behind an impressive algorithm we’ve developed to phase genotype data extremely quickly and accurately. Yong Wang’s platform talk will reveal a few fascinating discoveries about U.S. population history from studying patterns of ethnicity and identity-by-descent among AncestryDNA customers.

We’ll also be giving a number of poster presentations.  Mathew Barber will be presenting the method behind another algorithm that we’ve developed to better identify true identical-by-descent DNA matches.  I’ll be presenting a method we’ve developed to reconstruct the genomes of ancestors from genotype data of their descendants.  Jake Byrnes will be presenting a poster with a collaborator from Stanford University about inferring sub-continental local genomic ancestry. Finally, Eunjung Han and Peter Carbonetto will each present results from previous research they conducted at the University of California, Los Angeles and the University of Chicago, respectively.

We’re looking forward to engaging in insightful dialogue about our work with the scientific community. Even if we won’t see much fall foliage in San Diego.

The post 2 Talks and 4 Posters in 4 Days at the ASHG Annual Meeting appeared first on Tech Roots.

The DNA matching research and development life cycle
Julie Granka | Tue, 19 Aug 2014 20:24:30 +0000
http://blogs.ancestry.com/techroots/the-dna-matching-research-and-development-life-cycle/

Research into matching patterns of over a half-million AncestryDNA members translates into new DNA matching discoveries 

Among over 500,000 AncestryDNA customers, more than 35 million 4th cousin relationships have been identified – a number that continues to grow at an exponential rate.  While that means millions of opportunities for personal discoveries by AncestryDNA members, it also means a lot of data that the AncestryDNA science team can put back into research and development for DNA matching.

At the Institute for Genetic Genealogy Annual Conference in Washington, D.C. this past weekend, I spoke about some of the AncestryDNA science team’s latest exciting discoveries – made by carefully studying patterns of DNA matches in a 500,000-member database.

 

Graph showing growth in the number of 4th cousin matches between pairs of AncestryDNA customers over time

DNA matching means identifying pairs of individuals whose genetics suggest that they are related through a recent common ancestor. But DNA matching is an evolving science.  By analyzing the results from our current method for DNA matching, we have learned how we might be able to improve upon it for the future.

 

Life cycle of AncestryDNA matching research and development

The science team targeted our research of the DNA matching data so that we could obtain insight into two specific steps of the DNA matching procedure.

Remember that a person gets half of their DNA from each of their parents – one full copy from their mother and one from their father.  The problem is that a person’s genetic data doesn’t tell us which parts of their DNA were inherited from the same parent.  The first step of DNA matching, called phasing, determines the strings of DNA letters that a person inherited from each of their parents.  In other words, phasing distinguishes the two separate copies of a person’s genome.

 

Observed genetic data only reveals the pairs of letters that a person has at a particular genetic marker. Phasing determines which strings of letters of DNA were inherited as a unit from each of their parents.

If we had DNA from everyone’s parents, phasing someone’s DNA would be easy.  But unfortunately, we don’t.  So instead, phasing someone’s DNA is often based on a “reference” dataset of people in the world who are already phased.  Typically, those reference sets are rather small (around one thousand people).
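To see why parental DNA would make phasing easy, here is a minimal sketch in Python. It is only an illustration, not the AncestryDNA method: the function name `phase_with_parents` and the toy genotypes are made up for this example, and it assumes error-free genotypes given as two-letter strings per marker.

```python
def phase_with_parents(child, mother, father):
    """Phase a child's genotype using both parents' unphased genotypes.

    Each genotype is a list of two-letter strings, one per marker (e.g. "AG").
    Returns a list with one entry per marker: a (maternal, paternal) tuple,
    or None where the parents do not resolve the phase.
    """
    phased = []
    for c, m, f in zip(child, mother, father):
        a, b = c[0], c[1]
        if a == b:
            # Homozygous marker: both copies carry the same letter.
            phased.append((a, b))
            continue
        # Try both orderings and keep those consistent with the parents.
        options = [(x, y) for x, y in ((a, b), (b, a)) if x in m and y in f]
        phased.append(options[0] if len(options) == 1 else None)
    return phased


# Hypothetical toy data: four markers for a child and both parents.
child = ["AG", "CC", "TC", "AG"]
mother = ["AA", "CT", "TT", "AG"]
father = ["GG", "CC", "TC", "AG"]

print(phase_with_parents(child, mother, father))
```

The last marker stays unresolved because all three people carry both letters there. Cases like that, and the far more common case of having no parental DNA at all, are where a phased reference dataset comes in.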

Studies of customer data led us to find that we could incorporate data from hundreds of thousands of existing customers into our reference dataset.  The result?  Phasing that is more accurate, and faster.  Applying this new approach would mean a better setup for the next steps of DNA matching.

The second step in DNA matching is to look for pieces of DNA that are identical between individuals.  For genealogy research, we’re interested in DNA that’s identical because two people are related from a recent common ancestor.  This is called DNA that is identical by descent, or IBD.  IBD DNA is what leads to meaningful genealogical discoveries: allowing members to connect with cousins, find new ancestors, and collaborate on research.

But there are other reasons why two people’s DNA could be identical. After all, the genomes of any two humans are 99.9% identical. Pieces of DNA could be identical between two people because they are both human, because they are of the same ethnicity, or because they share some other, more ancient history.  We call these pieces of DNA only identical by state (IBS), because the DNA could be identical for a reason other than a recent common ancestor.
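The idea of looking for identical pieces of DNA can be sketched as a search for long runs of matching letters between two phased haplotypes. This is a toy illustration under simplifying assumptions (no genotyping errors, length measured in markers rather than genetic distance), and the name `shared_segments` and its `min_markers` threshold are hypothetical. Finding a run says nothing by itself about whether it is IBD or IBS.

```python
def shared_segments(hap_a, hap_b, min_markers=3):
    """Find runs where two phased haplotypes carry identical letters.

    Returns (start, end) index pairs, end exclusive, for every run of at
    least min_markers consecutive matching markers.
    """
    segments = []
    start = None
    n = min(len(hap_a), len(hap_b))
    for i in range(n):
        if hap_a[i] == hap_b[i]:
            if start is None:
                start = i  # a new run of matching markers begins here
        else:
            if start is not None and i - start >= min_markers:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the haplotypes.
    if start is not None and n - start >= min_markers:
        segments.append((start, n))
    return segments


# Hypothetical ten-marker haplotypes from two people.
hap_a = "AGCTTAGGCA"
hap_b = "AGCTAAGTCA"

print(shared_segments(hap_a, hap_b, min_markers=3))
print(shared_segments(hap_a, hap_b, min_markers=2))
```

Raising the minimum length discards the short runs, which are the ones most likely to be identical only by chance or by distant shared history.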

We sought to understand the causes of identical pieces of DNA between more than half a million AncestryDNA members.  Our in-depth study of these matches led us to find that in certain places of the genome, thousands of people were being estimated to have DNA that was identical to one another.

What we found is that thousands of people all having matching DNA isn’t a signal of all of them being closely related to one another.  Instead, it’s likely a hallmark of a more ancient shared history between those thousands of individuals – or IBS.

 

Finding places in the genome where thousands of people all have identical DNA is likely a hallmark of IBS, but not IBD.

In other words, our analysis revealed that in a few cases where we thought people’s DNA was identical by descent, it was actually identical by state.  These striking matching patterns were only apparent after viewing the massive amount of matching data that we did.

So while the data suggested that our algorithms had room for improvement, that same data gave us the solution.  After exploring a large number of potential fixes and alternative algorithms, we discovered that the best way to address the problem was to use the observed DNA matches to determine which were meaningful for genealogy (IBD) – and distinguish them from those due to more ancient shared history.  In other words, the matching data itself has the power to help us tease apart the matches that we want to keep from those that we want to throw away.
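One way to picture using the matching data itself is sketched below. The post does not describe the actual algorithm, so everything here is an assumption made for illustration: the genome is cut into fixed-size windows, windows where more than `max_people` distinct people have a match are flagged as likely IBS, and a match is dropped only if it lies entirely inside flagged windows. The function names, window size, and threshold are all hypothetical.

```python
def windows_of(start, end, window_size):
    """Indices of the fixed-size windows that a segment [start, end) overlaps."""
    return range(start // window_size, (end - 1) // window_size + 1)


def flag_pileup_windows(matches, window_size, max_people):
    """Return the windows in which more than max_people distinct people match."""
    people_per_window = {}
    for person_a, person_b, start, end in matches:
        for w in windows_of(start, end, window_size):
            people_per_window.setdefault(w, set()).update((person_a, person_b))
    return {w for w, people in people_per_window.items() if len(people) > max_people}


def drop_pileup_matches(matches, window_size, max_people):
    """Keep only matches that extend beyond the flagged windows."""
    flagged = flag_pileup_windows(matches, window_size, max_people)
    return [
        m for m in matches
        if not all(w in flagged for w in windows_of(m[2], m[3], window_size))
    ]


# Hypothetical matches: (person, person, segment start, segment end).
matches = [
    ("p1", "p2", 0, 250),    # long segment spanning windows 0, 1 and 2
    ("p3", "p4", 100, 180),  # short segment inside window 1
    ("p5", "p6", 120, 190),  # short segment inside window 1
    ("p7", "p8", 110, 170),  # short segment inside window 1
]

print(flag_pileup_windows(matches, window_size=100, max_people=4))
print(drop_pileup_matches(matches, window_size=100, max_people=4))
```

In this toy data, window 1 is shared by eight people and gets flagged, so the three short matches confined to it are discarded while the long match survives.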

The AncestryDNA science team’s efforts – poring through mounds and mounds of DNA matches – have paid off.  From preliminary testing, it appears that these latest discoveries relating to both steps of DNA matching may lead to dramatic DNA matching improvements. In the future, this may translate to a higher-quality list of matches for each AncestryDNA member: fewer false matches, and a few new matches too.

In addition to the hard work of the AncestryDNA science team, the huge amount of DNA matching data from over a half-million AncestryDNA members is what has enabled these new discoveries.  Carefully studying the results from our existing matching algorithms has now allowed us to complete the research and development “life cycle” of DNA matching: translating real data into future advancements in the AncestryDNA experience.

Ancestry.com to Present Jermline on DNA Day at the Global Big Data Conference
Jeremy Pollack | Wed, 09 Apr 2014 22:57:40 +0000
http://blogs.ancestry.com/techroots/jeremy-pollack-to-present-jermline-at-the-big-data-innovation-summit-on-april-10th/

Interested in genealogy?  Curious about DNA?  Fascinated by the world of big data?  If so, come check out my talk at the Global Big Data Conference on DNA Day this Friday, April 25 at 4 p.m. PT in the Santa Clara Convention Center!  I’ll cover Jermline, our massively scalable DNA matching application.  I’ll talk about our business, give a run-through of the matching algorithm, and even throw in a few Game of Thrones jokes.  It’ll be fun!  Hope to see you there.

 

Update: Thanks to everyone who attended my presentation! You can find the slides on the Ancestry.com Slideshare account for your reference.

Match list

AncestryDNA Regions by the Numbers
Julie Granka | Tue, 25 Mar 2014 22:37:20 +0000
http://blogs.ancestry.com/techroots/ancestrydna-did-you-know/

Since May of 2012, when we first released AncestryDNA, we’ve returned results to over a quarter of a million customers.

Based on feedback that we have received, those 300,000 customers have learned a great deal about their family history – their deep ancestral origins and their genetic relatives.

As it turns out, AncestryDNA has also learned a great deal from our customers.  We’ve uncovered some interesting statistics about ethnicity estimates that may help you to learn a bit more about your own family history – and we’ll share them with you in this blog post.

At AncestryDNA, we estimate a customer’s genetic ethnicity as a set of percentages in 26 regions around the world. See a map of these regions below.

[Image: map of all AncestryDNA ethnicity regions]

We estimate the amount of DNA that a customer likely inherited from each of these regions by comparing a customer’s DNA with a reference set of DNA samples – with corresponding documented family trees – from each of these regions. For a deeper dive into the science of ethnicity estimation, take a look at my previous blog post on the subject.

Below is an example of an AncestryDNA ethnicity estimate.  In this post, we’ll explore what AncestryDNA ethnicity estimates look like across all of our customers – specifically, how many of these 26 regions show up in someone’s estimate?

[Image: example of an AncestryDNA ethnicity estimate]

Based on the percentages estimated for a customer, we place each region into one of three categories.  Main Regions are the primary regions from which you likely inherited DNA (the regions, pictured above, that you see when you first view your ethnicity estimate); Trace Regions have less evidence of being part of your genetic ethnicity (and are viewed by clicking on the “+” button); Other Regions Tested have even less or no evidence, and do not show up as part of your ethnicity estimate.

In exploring the aggregated genetic ethnicity results of customers who opted in to scientific research, here are a few fun facts we’ve found about the diversity of regions found in customers’ estimates:

  • Ethnicity at a continental level – First, it’s interesting to view a person’s ethnicity estimate by continent. Our 26 regions can be broken into six different continental regions – such as Africa, Europe, and West Asia (see the estimate above). On average, we see that customers can trace their DNA back to 2.3 different continents.  While half of our customers have 2 continents or more as part of their ethnicity estimate, some have only one continent — and others have all six!
  • Main Regions in an ethnicity estimate – According to U.S. Census data on census.gov, “the overwhelming majority (97 percent) of the total U.S. population reported only one race in 2010. This group totaled 299.7 million. Of these, the largest group reported white alone (223.6 million), accounting for 72 percent of all people living in the United States.”  This is thought-provoking because while most Americans self-identify with only one ethnicity, our database shows that some customers can be linked to as many as 11 main regions (or ethnicities), and the average is nearly four regions!  See a graph representing the number of main ethnicity regions per customer, here.  A person’s ethnicity is likely far more nuanced than they may report on a census.
  • Expanding to include Trace Regions – While main regions are those with strong evidence that they are part of someone’s genetic ethnicity, trace regions are those that have a smaller amount of evidence (and that you must click on the “+” sign to view). When we count up regions in both of these categories, customers can be traced back, on average, to 8.5 different regional ethnicities.  This really affirms that our customers hail from a variety of cultures and regions across the world.  Some customers even have 24 out of the possible 26 regions as part of their estimate!
  • African regions – We recently made an exciting new finding: African Americans have, on average, more than three African regions in their estimates. This shows that African Americans, too, are a melting pot of many unique African ethnicities.
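The averages in the list above are simple counts over each customer's Main and Trace Regions. As a small sketch of that arithmetic, the snippet below uses four invented customers with placeholder region labels ("A" through "H"); the data and the function name `average_region_counts` are hypothetical and are not AncestryDNA results.

```python
def average_region_counts(customers):
    """Return (mean number of main regions, mean number of main + trace regions)."""
    main_counts = [len(c["main"]) for c in customers]
    total_counts = [len(c["main"]) + len(c["trace"]) for c in customers]
    return (
        sum(main_counts) / len(customers),
        sum(total_counts) / len(customers),
    )


# Invented example data: region labels are placeholders, not real estimates.
customers = [
    {"main": ["A", "B", "C"], "trace": ["D", "E"]},
    {"main": ["A"], "trace": []},
    {"main": ["B", "F", "G", "H"], "trace": ["A", "C", "D"]},
    {"main": ["A", "B", "C", "F"], "trace": ["G"]},
]

mean_main, mean_all = average_region_counts(customers)
print(f"Average main regions: {mean_main}")
print(f"Average main + trace regions: {mean_all}")
```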

These statistics and averages demonstrate the diversity of regions often found in an AncestryDNA customer’s ethnicity estimate – and show that Americans are truly a mix of cultures and influences from across the globe.

Advances in science and DNA research are just now beginning to make a significant impact on how we understand ourselves and society at large. While DNA testing often confirms the expected, it can also reveal the completely unexpected. How do your AncestryDNA results compare to our findings?

DNA and the Masses: The Science and Technology Behind Discovering Who You Really Are
Melissa Garrett | Wed, 12 Mar 2014 19:02:58 +0000
http://blogs.ancestry.com/techroots/dna-and-the-masses-the-science-and-technology-behind-discovering-who-you-really-are/

Originally published on Wired Innovation Insights, 3-12-14.

There is a growing interest among mainstream consumers in learning more about who they are and where they came from. The good news is that DNA tests are no longer reserved for large medical research teams or plot lines in CSI. Now, the popularity of direct-to-consumer (DTC) DNA tests is making self-discovery a reality, and is leading individuals to learn more about their genetic ethnicity and family history. My personal journey has led to discoveries about my family history outside of the United States. On a census questionnaire I am White or maybe Hispanic. My genetics, however, show I am Southern European, Middle Eastern, Native American, Northern African, and West African. And who knew that DNA would connect me with several cousins who have family living within 20 miles of where my mom was born in central Cuba?

Major strides have been made in recent years to better understand and more efficiently analyze DNA. Where are we today?

  • Easier: DNA testing once required a blood draw. Now, you can spit in a tube in the comfort (and privacy) of your own home.
  • Cheaper: In 2000, it took about 15 years and $3 billion to sequence the genome of one person. Today you could get your genome sequenced for a few thousand dollars. To put that into context, if a tank of gas could get you from New York to Boston in 2000, and fuel efficiency had improved at the same pace as DNA sequencing, today you could travel to Mars (the planet) and back on the same tank of gas.
  • Faster: Companies of all kinds are quickly innovating to keep up with demand and to make DNA testing more readily available and affordable. Illumina recently announced a whole-genome sequencing machine that could sequence 20,000 entire genomes per year.
  • More information: We can now tell you things about your ethnicity, find distant cousins, tell you whether a drug is likely to benefit or harm you, and tell your risk of diseases like breast and colon cancer.

It isn’t all roses. There is a joke among the genetic community that you can get your DNA sequenced for $1,000, but it will cost $1,000,000 to interpret it. DNA is complex. Each of us contains six billion nucleotides that are arranged like letters in a book that tell a unique story. And while scientists have deciphered the alphabet that makes up the billions of letters of our genome, we know woefully little about its vocabulary, grammar and syntax. The problem is that if you want to learn how to read, you need books, lots of them, and up until recently we had very few books to learn from.

To illustrate how complex it can be, let’s look at how to determine a person’s genetic ethnic background. Say you are given three books written in English, Chinese and Arabic. Even if you don’t speak the languages, you can use the letters in those books to determine what percent of a fourth book is written in each of the respective languages, since those three languages are so distinct. But that is like determining whether someone is African, White or Asian, which doesn’t require a genetic test. What if the three books were written in English, French and German, which use a similar alphabet? That is like telling someone who is White that they are a mix of various ethnic groups. That is a much harder problem and one that usually requires a genetic test.

So how do we distinguish the different ethnicities using DNA? Since we don’t have a genetic dictionary that tells us what we are looking for, scientists use the genetic signatures of people who have a long history in a specific region, religion, language, or otherwise practiced a single culture as a dictionary. Once enough of those genetic sequences are gathered, teams of geneticists and statisticians use the dictionary to define what part of your genome came from similar regions.

How does big data play into all of this science?

DNA was “big data” before the term became popular. The real question should not be about how much data you have, but what you do with the data. Big data allows companies like Ancestry.com to compare 700,000 DNA letters for a single individual against the 700,000 DNA letters of several hundred thousand other test takers to find genetic cousins. That’s a lot of computational power, and the problem grows quickly: the number of pairs to compare rises with the square of the number of test takers. To make all of this possible, big data and statistical analytics tools, such as Hadoop and HBase, are used to reduce the time associated with processing DNA results.
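For a rough sense of scale: among n test takers there are n(n-1)/2 distinct pairs. The sketch below multiplies that by the 700,000 letters mentioned above, assuming a naive approach in which every pair is compared at every letter. That assumption is for illustration only; a production pipeline may well avoid much of this work.

```python
LETTERS_PER_PERSON = 700_000  # DNA letters compared per individual, from the text


def pair_count(n):
    """Number of distinct pairs among n test takers."""
    return n * (n - 1) // 2


for n in (1_000, 100_000, 500_000):
    pairs = pair_count(n)
    # Naive cost: one letter-by-letter comparison for every pair.
    comparisons = pairs * LETTERS_PER_PERSON
    print(f"{n:>7,} test takers: {pairs:>15,} pairs, {comparisons:,} letter comparisons")
```

Growing the database by a factor of 500 (from 1,000 to 500,000 people) multiplies the number of pairs by roughly 250,000.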

Given how far we have come in such a short time, what should we expect for the future of consumer DNA? The technology is moving so fast that it is almost worthless to predict. But what is clear is that we won’t come out of this genetic revolution the same. We are going to live better, healthier lives, and we are going to learn things about our species and ourselves we never dreamed of. And importantly, putting genetic ethnicity and family connection in the hands of individuals is going to tear down our notion of race and show how we are all family – literally. Maybe we’ll even treat each other a little better.

Ken Chahine is Senior Vice President and General Manager for Ancestry.com DNA.

 

Imagine Future Technology for Family History Simulations
Lincoln Cannon | Tue, 19 Nov 2013 19:14:22 +0000
http://blogs.ancestry.com/techroots/imagine-future-technology-for-family-history-simulations/

Ancestry.com is a technology company that knows family history – not just a family history company, and not even a family history company that just happens to use technology. Technology, and particularly computing, is essential to our mission to help everyone discover, preserve and share family history. Without it, we could still tell family stories to our children, but we certainly couldn’t substantiate those stories with 12 billion historical records, organized into 55 million family trees through the work of 2.7 million subscribers, as Ancestry.com does today across all its websites.

In the 1960s, Intel co-founder Gordon Moore observed that the ratio of computing capacity to cost was doubling predictably, every couple of years or faster. In other words, with a doubling every year, a computer built in 1969 had twice as much capacity as a computer built at the same cost in 1968, and over a hundred times as much capacity as a computer built at the same cost in 1962; a computer built in 1969 would also reliably have half the capacity of a computer built at the same cost in 1970, and less than a hundredth the capacity of a computer built at the same cost in 1976.
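The figures in the paragraph above correspond to one doubling per year. A short sketch of that arithmetic follows; the function name `capacity_ratio` and its `doubling_years` parameter are just illustrative choices.

```python
def capacity_ratio(year_from, year_to, doubling_years=1.0):
    """Capacity at equal cost in year_to relative to year_from.

    Assumes capacity per dollar doubles once every doubling_years years.
    """
    return 2 ** ((year_to - year_from) / doubling_years)


print(capacity_ratio(1968, 1969))  # 1969 machine vs. 1968 machine
print(capacity_ratio(1962, 1969))  # 1969 machine vs. 1962 machine
print(capacity_ratio(1976, 1969))  # 1969 machine vs. 1976 machine

# With a slower doubling every two years, the 1962-to-1969 gain is far smaller.
print(capacity_ratio(1962, 1969, doubling_years=2.0))
```

Seven annual doublings give a factor of 2^7 = 128, which is the "over a hundred times" in the text; running the same seven years backwards gives 1/128, or "less than a hundredth."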

Moore's Law

By Courtesy of Ray Kurzweil and Kurzweil Technologies, Inc. (en:Image:PPTMooresLawai.jpg) [CC-BY-1.0 (http://creativecommons.org/licenses/by/1.0)], via Wikimedia Commons

That trend, known as Moore’s Law, has continued to the present. Today, a $150 smartphone can store about a million times more data and process that data about a thousand times faster than the $150K Apollo Guidance Computer that took astronauts to the moon in 1969. The smartphone also has wireless access to extended computing capacity on the Internet, including systems like Ancestry.com, which stores over 10 petabytes of data, and processes over 40 million searches daily.

Suppose Moore’s Law continues. Within decades, whatever replaces smartphones would have millions, billions and then trillions of times the overall computing capacity at the same cost. Within a century, $150 could purchase more computing capacity than that of all human brains combined. If that were to happen, what might the intersection of family history and technology look like? What might Ancestry.com look like? Of course we don’t really know, but let’s imagine.

Moore's Law Projected

By Courtesy of Ray Kurzweil and Kurzweil Technologies, Inc. (en:PPTExponentialGrowthof_Computing.jpg) [CC-BY-1.0 (http://creativecommons.org/licenses/by/1.0)], via Wikimedia Commons

One of the things we might do is tell stories about our family and ancestors at a much more massive scale and at a far deeper level, by computing highly detailed family history simulations. Maybe they would be something like a mix of Google Earth enhanced with a full history of maps derived from geological and astronomical research; Oculus Rift enhanced with brain-computer interfacing for an immersive tactile experience; and Second Life enhanced with avatars generated from family trees, photos, journals, and DNA, and abstracted to sub-neuronal degrees of detail to enable artificial intelligence. In deeper, more meaningful ways, we could understand and even feel our family history, as the characters, settings, plots and conflicts unfold before us – as our stories come to life, and we walk in our ancestors’ shoes (literally?).

As it turns out, if ever we compute such family history simulations, detailed to the point of enabling the characters with fully immersive consciousness, there would be a rather shocking philosophical ramification – more on that next time I post.

Unraveling the Science Behind Ethnicity Estimation
Julie Granka | Thu, 24 Oct 2013 15:54:02 +0000
http://blogs.ancestry.com/techroots/unraveling-the-science-behind-ethnicity-estimation/

A small tube of your saliva can reveal a lot about your family history hundreds and even thousands of years ago.  At AncestryDNA, we study the DNA in that saliva – using sophisticated science – to reveal your ethnic origins.  We recently announced an update to our ethnicity results which provides customers with a more in-depth look at where their ancestors once lived.

How does the DNA in your saliva record your family history in the first place?

To understand how, we’ll turn to language, since there are quite a few parallels with genetics.

Language and Geography

You “inherit” your dialect, using similar phrases, sayings, and words as your parents and the people around you.  For example, there are a number of words people use to describe a “sweetened, carbonated beverage.”  The colors in the map below show how often people living in the U.S. use each of three particular words.

[Image: U.S. map of how often people say “pop,” “soda,” or “coke”]

You can see some clear geographic patterns. Based on their term for soda (I’m a Northeasterner), coke-drinkers from the South cluster together, as do pop-drinking Midwesterners.

So if we met a person who called the sugary drink in their hand a “coke,” we could feel confident in guessing they were from the South.  If they used the word “pop,” they would probably be from somewhere in the Midwest.

Back to DNA

When AncestryDNA estimates your genetic ethnicity, we use a similar approach – but instead of comparing your language patterns to those of other people, we’re comparing your DNA.

Just like certain regions of the U.S. appear different based on dialect, human groups can often be distinguished based on lots and lots of genetic data.  By finding the clusters of human groups to which you are similar, based on your DNA (rather than your dialect), we can estimate your genetic ethnicity.

Both DNA and language can help to trace someone’s origin, since both DNA and language are inherited.

But unlike language, which you can “inherit” from people around you, you only inherit DNA from your parents, who inherited their DNA from their parents, and so forth. Thus, our DNA is a mosaic of the DNA of our ancestors.  That DNA tells us about where our ancestors came from.

This is because the variation in our DNA reflects ancient and modern migrations of humans as we populated the globe.  As humans moved from Africa to Europe, Asia, and the Americas, settling new areas, groups split apart, taking their DNA with them.  By chance, the DNA of groups settling one area could differ from the DNA of those that settled in another.

Over time, individuals from a group of people usually had children with people from the same group.  In so doing, they passed their DNA to their children – generation after generation.  And if a group of people remained relatively isolated from other groups, there wouldn’t be much new DNA entering that group from others.  In this process, the DNA of human populations becomes slightly differentiated.

Going back to our analogy, southerners may have started to say “coke,” and in passing the word to their neighbors and kids, have continued to do so generation after generation.  Similarly, chance movements of humans across the world allow us to see DNA evidence of this history.

At AncestryDNA, we leverage the fact that the DNA of individuals from across the globe shows evidence of human population history.

We examine DNA samples of thousands of people from all over the world who have deep ancestry in a specific global location – for example, individuals whose grandparents were all born in Spain.  We then cluster their DNA into 26 overlapping worldwide regions based on DNA patterns observed between and within the regions.

More simply, we construct a DNA map, similar to a soda/pop/coke dialect map. Some DNA samples represent the Great Britain region, some represent East Asia, and others represent North Africa.  

Ethnicity

Then, we compare your DNA to the DNA of these individuals to identify from which of the 26 regions you are likely to have ancestry.  When you have DNA that is similar to the DNA of people with deep ancestry in a specific location, you very likely also had ancestors from that same place.  It works much like the linguistic map: if we hear you say “pop,” we have a good idea of where you might be from.
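As a rough illustration of what comparing DNA to a reference group can mean, the Python sketch below scores one person's genotype against allele frequencies from three hypothetical regions and picks the best match. This is a textbook-style simplification and not AncestryDNA's actual algorithm. The region names, the four markers, and the frequencies are all invented, and a real estimate reports a mix of regions rather than a single best one.

```python
import math

# Hypothetical reference panel: frequency of one allele at each of four
# DNA markers, for three made-up regions.
REFERENCE_FREQS = {
    "Region A": [0.9, 0.8, 0.2, 0.1],
    "Region B": [0.5, 0.5, 0.5, 0.5],
    "Region C": [0.1, 0.2, 0.8, 0.9],
}


def log_likelihood(genotype, freqs):
    """Log-probability of a genotype under one region's allele frequencies.

    genotype holds 0, 1, or 2 copies of the allele at each marker.
    Assumes markers are independent and each copy is drawn at random
    with the region's frequency (a binomial model with two draws).
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        probability = math.comb(2, copies) * p**copies * (1 - p) ** (2 - copies)
        total += math.log(probability)
    return total


def most_similar_region(genotype, reference=REFERENCE_FREQS):
    """Region whose allele frequencies make the genotype most probable."""
    return max(reference, key=lambda region: log_likelihood(genotype, reference[region]))


print(most_similar_region([2, 2, 0, 0]))  # two copies at the first two markers
print(most_similar_region([1, 1, 1, 1]))  # one copy at every marker
```

The first genotype matches the region where the allele is common at the first two markers and rare at the last two. The second, with one copy everywhere, fits best where every frequency is near one half.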

In the most recent update to AncestryDNA ethnicity results, we have increased the number of individuals to whom we compare you as well as the amount of your DNA used in the comparison – allowing us to get even more specific in certain regions.  This gives us a highly refined estimate of your genetic ethnicity.

It’s important to note that DNA differences between human groups are subtle: the DNA sequences of two random people are on average 99.9% identical.  But, that still means that two random individuals differ at about 3 million DNA positions.  This makes for an often difficult, but exciting challenge in determining ethnicity.
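The arithmetic behind that figure is short enough to check in a few lines of Python. The genome length used here, roughly 3 billion DNA positions, is a commonly quoted round number and an assumption of this sketch rather than a figure from the post.

```python
# Assumption: about 3 billion DNA positions in one copy of the human genome.
GENOME_LENGTH = 3_000_000_000
FRACTION_IDENTICAL = 0.999  # "99.9% identical"


def expected_differences(genome_length, fraction_identical):
    """Number of positions at which two genomes are expected to differ."""
    return round(genome_length * (1 - fraction_identical))


print(expected_differences(GENOME_LENGTH, FRACTION_IDENTICAL))
```

A difference of one position in a thousand sounds tiny, but across a genome of this size it adds up to about 3 million positions.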

Interpreting your genetic ethnicity

There are a few other important parallels and differences between the linguistics example and a genetic ethnicity estimate.

Let’s say you currently live in the Midwest, but since your parents grew up in the Northeast, you use the word “soda.”  While you identify as a Midwesterner, your dialect might indicate that you’re a Northeasterner instead – like your parents.

Similarly, your genetic ethnicity estimate tells you about your historical origins, not about where you live today.  AncestryDNA estimates go back hundreds to a thousand years, when “populations” and their boundaries were very different than those we know today. This might cause you to have a different genetic ethnicity estimate than you might expect.

But while an individual’s dialect may change when he or she moves to a new location, an individual’s DNA doesn’t.  This also affects your genetic ethnicity.  For instance, if the ancestors of your Italian ancestors migrated from Eastern Europe hundreds of years ago, you might show up as having Eastern European ethnicity instead of Italian.

Pop soda coke

Take one final look at the linguistic map and notice that there are areas that appear to be a mix of others.  For instance, in Oklahoma, people use a combination of “pop” and “coke,” influenced by the regions around them.  This means that it would be difficult to identify someone specifically as an “Oklahoman.”

The genetics of human populations can be similarly affected by migrations between neighboring groups.  This makes it harder to disentangle genetic ethnicity from some regions, like Western Europe, where people and borders have moved quite a bit in the past thousand years.

All of this – estimating someone’s ethnicity from genetics – involves cutting edge science.  By looking at more data, developing novel methodologies, and discovering new patterns in our DNA, we continue to advance AncestryDNA.

That means that the AncestryDNA science team will be up late, drinking pop, soda, coke, and, according to the British scientist on our team, fizzy drink.


]]>
http://blogs.ancestry.com/techroots/unraveling-the-science-behind-ethnicity-estimation/feed/ 97
AncestryDNA: part of the scientific communityhttp://blogs.ancestry.com/techroots/ancestrydna-part-of-the-scientific-community/ http://blogs.ancestry.com/techroots/ancestrydna-part-of-the-scientific-community/#comments Wed, 16 Oct 2013 21:33:19 +0000 Julie Granka http://blogs.ancestry.com/techroots/?p=1356 Next week, the AncestryDNA science team will be flying across the country with a tube full of posters. Scientific posters, that is.  We’ll be presenting them at the annual American Society of Human Genetics conference (ASHG) in Boston.  This will mark AncestryDNA’s second year presenting our latest research at the largest worldwide conference in human… Read more

The post AncestryDNA: part of the scientific community appeared first on Tech Roots.

]]>
Next week, the AncestryDNA science team will be flying across the country with a tube full of posters.

Scientific posters, that is.  We’ll be presenting them at the annual American Society of Human Genetics conference (ASHG) in Boston.  This will mark AncestryDNA’s second year presenting our latest research at the largest worldwide conference in human genetics.

Over 6,000 researchers are projected to be at the conference – from academia and industry alike.  Over five days, the science team will be listening to scientific talks, discussing our research with other scientists, and staying abreast of the newest and coolest topics in the field.

At AncestryDNA, we strongly believe in being highly involved in the scientific community.

Discussions with other scientists can lead to eureka moments and plant the seeds for novel research ideas and possibilities.  By engaging with other scientists, we can get feedback on our current endeavors at AncestryDNA – ensuring that we are incorporating the latest developments in population genetics into our own research.

But it’s a two-way street.  Good science requires give and take and the exchange of ideas and criticism.  We too will share our experiences and knowledge.  Just as AncestryDNA learns from other scientists, much of our research can inform the future research of other human geneticists. In some cases, we’ll be collaborating with other scientists to do the research together.

Most importantly, we will maintain an ongoing rapport and relationships with other scientists from academia and industry. As a community, we can together continue to advance our knowledge about human genetics and how it relates to family history.

The ASHG conference is just one of many opportunities for these important interactions with the scientific community.  Throughout the year, through other conferences, guest lectures at nearby universities, discussions with our scientific advisory board, and research collaborations, we’re keeping AncestryDNA’s science fresh and of the highest caliber.

We’re excited for a week of genetics!


]]>
http://blogs.ancestry.com/techroots/ancestrydna-part-of-the-scientific-community/feed/ 0
Ancestry.com Employee Honored with Women Tech Awardhttp://blogs.ancestry.com/techroots/ancestry-com-employee-honored-with-women-tech-award/ http://blogs.ancestry.com/techroots/ancestry-com-employee-honored-with-women-tech-award/#comments Thu, 03 Oct 2013 22:44:21 +0000 Melissa Garrett http://blogs.ancestry.com/techroots/?p=1266 Recently, Catherine Ball, VP of Genomics and Bioinformatics for AncestryDNA was announced as a winner for the Women Tech Awards, presented by the Women Tech Council, under the Trailblazer category. The award recognizes technology-focused women who are driving innovation, influencing technology companies, and are passionate about the community.  Other award winners came from companies such… Read more

The post Ancestry.com Employee Honored with Women Tech Award appeared first on Tech Roots.

]]>
Recently, Catherine Ball, VP of Genomics and Bioinformatics for AncestryDNA, was announced as a winner of the Women Tech Awards, presented by the Women Tech Council, in the Trailblazer category. The award recognizes technology-focused women who are driving innovation, influencing technology companies, and are passionate about the community.  Other award winners came from companies such as ATK Aerospace Group, Domo, and eBay.

Cathy is a truly remarkable woman. For almost two decades, she has worked as a genomic scientist to help physicians, citizens and other scientists get the most out of genome data. From analyzing data related to large-scale biomedical experiments – well before the current “Big Data” trend – to collaborating on the annotation of the first eukaryotic genome (brewer’s yeast), Cathy has positioned herself and her team on the cutting edge of technology and science. Over the course of her career, Cathy has authored scores of scientific publications, organized several scientific conferences, given dozens of guest lectures, and reviewed hundreds of federal grant proposals and scientific manuscripts that have been key to shedding further light on diverse research topics.

Cathy’s most recent efforts include leading a team of population geneticists, statisticians, and computer scientists to create the analytical approaches behind the AncestryDNA direct-to-consumer genotyping services.

She was born and raised in a small beachside town on the island of Oahu in Hawaii.  Exploring tide pools, streams and rain forests helped spark Cathy’s interest in biology, and living in such a multicultural community provided insights into the ways a person’s life can be affected by family history.

Cathy applies the scientific method to everything she does, which is why this award and the Women Tech Council truly embody Cathy’s desire for each of us to stay curious, be honest, have a sense of humor and search out ways to help those around us grow.

Thanks to the Women Tech Council for providing recognition to women who are driving innovation and influencing technology companies. The Women Tech Council’s mission to provide leadership, resources and mentoring for women, while maintaining a strong bond with the business community, has been pivotal to developing top technology talent.



]]>
http://blogs.ancestry.com/techroots/ancestry-com-employee-honored-with-women-tech-award/feed/ 0
AncestryDNA Makes Scientific Breakthrough in West African Ethnicityhttp://blogs.ancestry.com/techroots/ancestrydna-makes-scientific-breakthrough-in-west-african-ethnicity/ http://blogs.ancestry.com/techroots/ancestrydna-makes-scientific-breakthrough-in-west-african-ethnicity/#comments Thu, 12 Sep 2013 16:54:27 +0000 Julie Granka http://blogs.ancestry.com/techroots/?p=1130 The AncestryDNA science team presented the results of their latest research today at the Smithsonian Institute’s symposium on The African Diaspora in Washington D.C. Using unique proprietary DNA samples and a variety of statistical approaches, our science team has been able to separate West Africa into six separate population groups based on genetic data.  This… Read more

The post AncestryDNA Makes Scientific Breakthrough in West African Ethnicity appeared first on Tech Roots.

]]>
The AncestryDNA science team presented the results of their latest research today at the Smithsonian Institution’s symposium on The African Diaspora in Washington, D.C. Using unique proprietary DNA samples and a variety of statistical approaches, our science team has been able to separate West Africa into six separate population groups based on genetic data.  This advancement will provide a finer-resolution genetic ethnicity estimate for individuals with West African ancestry.

West African ethnicity

AncestryDNA’s six new ethnicity regions of West Africa include Senegal, Mali, Ivory Coast/Ghana, Benin/Togo, Nigeria, and Cameroon/Congo, each of which has a distinct set of tribal affiliations.  The division of West Africa into these groups marks the first time that West African genetic ethnicity estimates can achieve this level of detail, bringing AncestryDNA’s total number of reported genetic ethnicity regions in Africa to ten.

The new genetic ethnicity regions were announced at The African Diaspora event earlier today by Dr. Jake Byrnes, population genomics senior analyst on the AncestryDNA science team.  Although these new ethnicity updates will not be made available to all AncestryDNA users for a few more months, we wanted to give the inside scoop on Jake’s Smithsonian presentation detailing the West African ethnicity update as well as additional research findings on the genetics of African Americans.

It can be extremely difficult to research one’s African ancestry using historical records alone, as most African American individuals in the U.S. are unable to find detailed records of their ancestors before the 1870s. Our AncestryDNA test can help family historians use genetics to pick up where the paper trail ends.

AncestryDNA leverages a unique proprietary collection of DNA samples from individuals with well-documented family trees to conduct innovative research in population genetics, human evolution, and migration. The science behind AncestryDNA is continually evolving and improving. During this ongoing process, the science team demonstrated that genetic data reliably shows population structure in Western Africa.  What this means is that the DNA of individuals from Western Africa clusters into a number of distinct groups. As a result, AncestryDNA can now more finely define genetic ethnicity regions in Western Africa. (See the visual representations below.)

 Caption: The graph on the left depicts the distinct genetic clusters of individuals from West Africa. Each point is an individual with deep ancestry in West Africa from our proprietary sample database. The color of each point corresponds to the country (shown in the map on the right) where a majority of that individual’s ancestors lived. The x and y axes indicate two primary axes of genetic differentiation (called principal components, or PCs) as inferred from sample DNA. Points closer together on the plot are more similar genetically. Comparison of the graph on the left and the map on the right reveals the similarity of the genetic and geographic structure.
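For readers curious how an axis of genetic differentiation can be computed, here is a minimal pure-Python sketch of finding the first principal component by power iteration. The six genotypes are a tiny invented data set, not real samples, and the sketch computes only the first of the two axes described in the caption. Real analyses use many thousands of markers and a dedicated linear algebra library.

```python
# Hypothetical genotypes: rows are individuals, columns are DNA markers,
# values are 0, 1, or 2 copies of an allele. The first three rows stand
# in for one group and the last three for another.
GENOTYPES = [
    [2, 2, 2, 0, 0, 0],
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 1, 0],
    [0, 0, 0, 2, 2, 2],
    [0, 0, 1, 2, 2, 2],
    [0, 1, 0, 2, 1, 2],
]


def center(rows):
    """Subtract each marker's mean so every column averages to zero."""
    means = [sum(column) / len(rows) for column in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def project(centered, axis):
    """Score of each individual along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]


def first_principal_axis(centered, iterations=200):
    """Direction of greatest variance, found by power iteration."""
    n_markers = len(centered[0])
    axis = [1.0 / (j + 1) for j in range(n_markers)]  # arbitrary non-zero start
    for _ in range(iterations):
        scores = project(centered, axis)
        # Multiply the scores back through the data (X^T times scores).
        new_axis = [
            sum(score * row[j] for score, row in zip(scores, centered))
            for j in range(n_markers)
        ]
        norm = sum(value * value for value in new_axis) ** 0.5
        axis = [value / norm for value in new_axis]
    return axis


centered = center(GENOTYPES)
pc1_scores = project(centered, first_principal_axis(centered))
for score in pc1_scores:
    print(round(score, 2))
```

In this toy data, the first three individuals land on one side of zero and the last three on the other, which is the kind of separation the plot shows between clusters.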



Population structure such as this is not new, and even exists in the U.S. today.  Here’s an example from the 2010 census data.  Each point is an individual, colored by their self-reported ethnicity.

West African ethnicity 3

You’ll notice that people of similar backgrounds tend to stay and live in the same general geographic areas.  Imagine now if we could roll this map back in time to see where an individual’s ancestors immigrated to the U.S.!

The AncestryDNA science team is looking toward a future where we could reveal, in the absence of a family tree, the most probable locations where one’s ancestors lived – both in the U.S. and abroad.  To do this, the science team hopes to harness the power of collectively analyzing family trees of individuals with similar genetic profiles.

Though this project is still in its infancy, the science team has made some progress. First, we looked at the birth locations of individuals in the trees of all African Americans. Then, we looked for locations where, relative to all African Americans, there appeared to be an over-representation of birth locations in trees of individuals with a particular West African ancestry.  For individuals with Senegalese genetic ethnicity, we found what seems to be an over-representation of birth locations in South Carolina and Georgia in the 1700s and 1800s.
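One simple way to express over-representation is as a ratio: a location's share of birth records within the subgroup, divided by its share among everyone. The Python sketch below shows that calculation on invented counts. The function name, the locations listed, and the numbers are all hypothetical, and the science team's actual analysis may well differ.

```python
from collections import Counter


def enrichment(subgroup_locations, all_locations):
    """Ratio of each location's share in the subgroup to its share overall.

    A value above 1 means the location is over-represented in the subgroup.
    Assumes every subgroup record is also counted in all_locations.
    """
    subgroup_counts = Counter(subgroup_locations)
    overall_counts = Counter(all_locations)
    n_subgroup = len(subgroup_locations)
    n_overall = len(all_locations)
    return {
        location: (subgroup_counts[location] / n_subgroup)
        / (overall_counts[location] / n_overall)
        for location in subgroup_counts
    }


# Made-up birth locations from family trees, for illustration only.
everyone = (
    ["South Carolina"] * 20 + ["Georgia"] * 20
    + ["Virginia"] * 40 + ["North Carolina"] * 20
)
subgroup = ["South Carolina"] * 5 + ["Georgia"] * 3 + ["Virginia"] * 2

for location, ratio in enrichment(subgroup, everyone).items():
    print(location, round(ratio, 2))
```

In this made-up example, South Carolina accounts for half of the subgroup's records but only a fifth of all records, so its ratio is well above 1. A real analysis would also need to check whether such a ratio could arise by chance.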

This might be an example where the genetics matches up with history.  In the 18th century, plantation owners in South Carolina and Georgia knew little about rice cultivation and preferred to import slaves from Sierra Leone, Gambia, and Senegal (the Windward Coast), where rice is a commonly grown crop. It is thought by some scholars that the Gullah people, who today live in coastal Georgia and South Carolina, descend from slaves imported from the Windward Coast to work specifically on rice plantations.

Providing more detailed ethnicity estimates for West African populations is crucial for American family historians.  Approximately 85-90% of today’s African Americans are descendants of enslaved Africans brought to America between 150 and 450 years ago – leaving many African Americans without a known family history prior to this time. AncestryDNA’s new West African ethnicity update will help to link African American individuals to specific locations in West Africa. In the future, more detailed analyses of genetic data and family trees have the potential to reveal important historical stories.

West African ethnicity 4

Thanks to the science team’s findings of genetic structure in West Africa, the new African ethnicity regions will be a breakthrough for many African Americans and may even help reconnect disrupted families with their origins.  But more is to come, as we are only scratching the surface of what is possible.


]]>
http://blogs.ancestry.com/techroots/ancestrydna-makes-scientific-breakthrough-in-west-african-ethnicity/feed/ 24