Posted by Julie Granka on June 8, 2015 in DNA Tech, Science

When someone takes an AncestryDNA test, we compare their DNA to the DNA of the hundreds of thousands of other test-takers in the AncestryDNA database.  We’re looking for “DNA matches” — people who share DNA with one another, and so might be relatives.

The main idea behind identifying a DNA match is to look for pieces of DNA that two people both have because they each inherited it from a recent common ancestor.  In a previous blog post about last year’s update to DNA matching, we detailed the steps we take to turn one’s genetic data, and that of others in the AncestryDNA database, into these suggested DNA matches.

One of those steps is to identify pieces, or segments, of DNA that are likely to be identical between pairs of people.  But if two people have identical DNA, it doesn’t necessarily mean that they inherited it from a recent shared ancestor.  Pieces of DNA could be identical between two people because they are of the same ethnicity or population — meaning that they (and many others from that same population) share DNA that they inherited from a distant ancestor who lived much longer ago.

So in order to find DNA matches that are due to recent ancestors, we need a filtering step.  At AncestryDNA, we use an algorithm developed by the science team called Timber. Its basic idea is that if two people appear to have identical DNA at a particular place in the genome, but they also appear to have identical DNA with thousands of other people at that particular place, then the shared DNA between the two people was probably inherited from a more distant ancestor (or a distant set of ancestors).  In other words, that segment is probably not relevant to a common ancestor within the last 6 or 10 generations.
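To make that idea concrete, here is a toy sketch in Python. It is not AncestryDNA's actual Timber implementation: the fixed-width genome windows, the hard cutoff, and the names `match_counts_per_window` and `filtered_length` are all assumptions made for illustration. It counts how many matches each person has in each window, then drops the windows of a shared segment where either person matches an unusually large number of people.

```python
from collections import Counter

def match_counts_per_window(segments):
    """Count how many matches cover each genome window for one test-taker.

    segments: iterable of (match_id, first_window, last_window), inclusive.
    Windows are hypothetical fixed-width slices of the genome.
    """
    counts = Counter()
    for _match_id, first, last in segments:
        for w in range(first, last + 1):
            counts[w] += 1
    return counts

def filtered_length(segment, counts_a, counts_b, threshold):
    """Number of windows of a shared segment that survive the filter.

    A window is dropped when either person matches more than `threshold`
    people there (an assumed hard cutoff, chosen only for illustration).
    """
    _match_id, first, last = segment
    return sum(
        1
        for w in range(first, last + 1)
        if counts_a[w] <= threshold and counts_b[w] <= threshold
    )

# Toy data: A shares windows 10-19 with B, and windows 15-19 with five others.
segments_a = [("B", 10, 19)] + [(f"other{i}", 15, 19) for i in range(5)]
segments_b = [("A", 10, 19)]

counts_a = match_counts_per_window(segments_a)
counts_b = match_counts_per_window(segments_b)

# With a toy threshold of 3, only windows 10-14 of the A-B segment remain.
kept = filtered_length(("B", 10, 19), counts_a, counts_b, threshold=3)
print(kept)  # 5
```

In this toy version a segment simply gets shorter when part of it falls in a crowded region, and a segment lying entirely in such a region would be removed altogether.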

So in cases like this, Timber might filter out those identical pieces of DNA entirely – and not consider them when deciding whether two people are related. Looking at results from over 300,000 people, we’ve found that while Timber filters out a majority of small identical segments (< 8 cM), it often also filters out larger segments (even those over 15 cM).  See the chart below.

After finding what appear to be identical segments of DNA between pairs of people, Timber will filter out those that do not appear to have been inherited from a recent common ancestor.  For identical segments of the indicated sizes, bars show the proportion filtered out by Timber in a set of over 300,000 individuals. (Note: CentiMorgans (cM) are a unit of distance in the human genome.)
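For readers who want to see how a chart like this could be tabulated, here is a minimal Python sketch. The segment lengths and filter outcomes below are invented toy values, not AncestryDNA data, and the function name `proportion_filtered_by_size` and the bin edges are assumptions chosen for illustration.

```python
from collections import defaultdict

def proportion_filtered_by_size(segments, bin_edges):
    """Fraction of segments filtered out within each size bin.

    segments: iterable of (length_cm, was_filtered).
    bin_edges: ascending lower edges in cM, e.g. [0, 8, 15]; the smallest
    edge must not exceed the shortest segment.
    Returns {lower_edge: fraction filtered} for the non-empty bins.
    """
    totals = defaultdict(int)
    filtered = defaultdict(int)
    for length_cm, was_filtered in segments:
        # Assign the segment to the largest lower edge it reaches.
        edge = max(e for e in bin_edges if e <= length_cm)
        totals[edge] += 1
        filtered[edge] += int(was_filtered)
    return {edge: filtered[edge] / totals[edge] for edge in sorted(totals)}

# Invented toy segments: (length in cM, whether the filter removed it).
toy_segments = [
    (5.0, True), (6.5, True), (7.0, False),
    (10.0, True), (12.0, False),
    (18.9, False), (22.0, True),
]

result = proportion_filtered_by_size(toy_segments, bin_edges=[0, 8, 15])
print(result)
```

Each key in the result is the lower edge of a size bin, and each value would correspond to the height of one bar.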

What this chart clearly shows is that when trying to identify DNA matches who share DNA from a recent common ancestor, there is much more to it than the sizes of the identical DNA segments. Longer identical segments don’t necessarily prove that two people have a recent common ancestor.  DNA matching among hundreds of thousands of people has shown that even long identical segments can indicate shared ancestry, shared population history, or a more distant shared ancestor.

The good news is that by using a filter like Timber, we can find shared DNA that is more likely to be due to recent common ancestors.  It’s important to keep in mind that identifying identical segments and filtering them are both statistical decisions; so, it is difficult to base conclusions on any particular shared segment of DNA alone.  But in aggregate, a large number of experiments on data from real families as well as computationally engineered ones have shown that Timber is exceptionally powerful at removing DNA matches that are due to more distant relationships.

Since Timber is powered by DNA matching results among hundreds of thousands of people, Timber is a personalized filter that is uniquely possible with AncestryDNA’s enormous database.  And as a result, AncestryDNA test-takers can receive DNA matches that are more reliable for genealogy research.

 

Julie Granka

Julie has been a population geneticist at AncestryDNA since May 2013. Before that, Julie received her Ph.D. in Biology and M.S. in Statistics from Stanford University, where she studied genetic data from human populations and developed computational tools to answer questions about population history and evolution. She also spent time collecting and studying DNA using spit-collection tubes like the ones in an AncestryDNA kit. Julie likes to spend her non-computer time enjoying the outdoors – hiking, biking, running, swimming, camping, and picnicking. But if she’s inside, she’s baking, drawing, and painting.

Comments

  1. Annette Kapple

    Timber eliminated my mother’s 3rd cousin. I’m not crazy about it. I’m not finding as many cousin matches as I did before Timber. I realize I was getting some false positive results, but some good matches were lost. Would like to see an improved filter.

    • Susan Gates

      I have a new-found half brother who also submitted DNA to ancestry. I was hoping that my test would confirm our relationship, but it hasn’t. It appears that Timber could be the culprit. Can this be checked by other methods? How infallible is Ancestry’s DNA testing?

  2. Shannon Christmas

    Define the terms “recent common ancestor” and “more distant common ancestor.” Without an in-depth discussion and review of the underlying data, this post touting the value of AncestryDNA’s Timber algorithm remains unpersuasive. As discussed in the first comment, the overly aggressive Timber has eliminated and miscategorized many AncestryDNA customers’ matches, generating false negatives and making more closely related matches much more difficult to locate on our match lists. Timber appears to struggle with determining which segments indicate a traceable common ancestor and which are simply artifacts of shared biogeographical ancestry. While customers complain about the abundance of false negatives and the hit-or-miss relationship predictions, AncestryDNA continues to take a self-congratulatory tone. Customers can appreciate AncestryDNA’s attempt to minimize false positive matches (a phenomenon that should never have existed in the first place), but the effort has clearly overshot its goal. Meanwhile, AncestryDNA has yet to equip their genetic genealogy product with essential analytic tools (matching DNA segment data and a chromosome browser) or deliver the level of transparency required to conduct autosomal DNA genealogy. This is unacceptable, unethical, and unconscionable. Given the state of AncestryDNA, the company’s leadership and science staff ought to expend less energy and fewer resources applauding themselves and, instead, produce a product worthy of the current price. The status quo fails to impress.

  3. Barbara B

    I’m glad to see a discussion of Timber. Maybe you can throw some light on my experience. I had two matches at 18.9 cM, who also matched each other 18.9 cM. After Timbering, only one remained, a “Good” match. They had virtually the same start and stop points – if anything, the match that disappeared was a tiny bit bigger.

    Seems like pretty selective thinning.

  4. David Negus

    It would be nice if customers could opt to receive the larger segments filtered by Timber – say, those over 15 cM.

  5. As you mentioned that one reason for the existence of commonly matched regions is common deep ancestry, is AncestryDNA going to exploit these kinds of matches for an improved “ethnicity” product? And, in testing Timber, did you subset your 300,000 test population by dominant ethnicity groups, to see if there are biases when comparing two tests of mostly similar ancestry, versus two tests of disparate ancestry?

  6. If I understand you correctly, Timber removes the ancient segments from the calculation, but not from the raw data. So they would still show up in Ancestry’s chromosome brows…. oh wait.

  7. Timber may need some tweaking, as some of the smaller matches I had, which had good supporting evidence via the paper trail, disappeared in the great cleansing. That said, I’m glad that a majority of the spurious matches were dealt with.

    Some of my most interesting matches are statistical anomalies, such that what appears to be a third or fourth cousin has well-vetted documentation as a sixth or seventh (always assuming no NPEs). Still within the ranges that Blaine Bettinger (e.g.) pointed out in a recent article.

    Of course, I can do the comparison only on those relatives who have uploaded their data to GEDmatch. It’s time for AncestryDNA to bite the bullet and give us the actual match data. I know that management thinks this is a “privacy” issue — but if you sign up for AncestryHealth, you yourself give away all your data to whomever Ancestry wants to give it to — your health insurer, a pharmaceutical company, a potential employer. If I want to share a match with another DNA program participant, I already have that privilege; but Ancestry makes it very, very hard to do so. The imbalance is just plain wrong.

  8. Ann Turner

    I’m hearing anecdotal reports of a quite high percentage of cases (30% plus) where a match in a child is not found in either parent. This has to be a false positive in the child or a false negative in a parent. I’m wondering if allowing the larger matches to pass through (as suggested by David Negus above) would resolve some of those discrepancies.

  9. I have a very thorough tree, researched by my parents the old fashioned way in churches, city halls and graveyards. I was researching some of the non-direct line cousins which I had a good paper trail for when the

  10. I have a very extensive family tree done the old fashioned way and I was in the process of tying some of my DNA matches into some of the further back ancestors. I had a good paper trail for the tree and usually for the ancestor I was tying in when the DNA matches disappeared. If you are arbitrarily going to make these decisions on my behalf at least prove it to me by letting me analyze the chromosomes. This attitude on your part that the customers and thousands of genetic genealogists don’t know anything is abhorrent to me.
    I have lost key ancestors on some of the family finding searches I have been on, ones that tied together several DNA cousins. Worse yet is that you think that the customer is stupid enough to blindly follow your so-called DNA Circles and Predicted Ancestors. They might be someone’s ancestors, but I assure you that someone who went to Utah from Maine and whose family is still documented in Maine is a laughable Predicted Ancestor for me. All DNA Circles gets me is more cousins further out. I have little interest in adding 3rd cousins, 3 times removed, for example, to my personal family tree. Please stop making these unsupported proclamations on your customers. As far as I am concerned, your very, very wrong profile of one of my third great grandfathers is disastrous. My family worked very hard to get accurate information on my ancestors and now his story will be wrong for as long as people have copied it to their trees. Shame on you for this action that goes against all the rules and principles of established genealogy.

  11. Jason Lee

    “…AncestryDNA test-takers can receive DNA matches that are more reliable for genealogy research.”

    Customers who are seriously interested in DNA match results that are more reliable for genealogy research need to download their raw data file from AncestryDNA and transfer their file to one (or both) of the two companies that accept AncestryDNA files.

    Although I trust that it is doing a fine job of conserving computational resources, Timber does not appear to be the best system for culling out useless matches and keeping the helpful ones.

    I think you’re on the right track, Ann Turner (see comment above).

    Speaking for myself, I’d be more confident in a system that eliminates at least 99% of the 5-6 cM segments and keeps at least 99% of the segments that are 20 cM or larger.

    I now have well over 8,000 matches at AncestryDNA. Having examined my best matches very carefully, I doubt that the majority of the segments represented by those 8,000 matches are relevant to a common ancestor within the last 6 or 10 generations.

    I’d gladly trade most of my 8,000 matches for some matching segment details and all of the 20+ cM segments that have been filtered out.

    Take it from the people who’ve put your product through its paces, Julie Granka, there’s plenty of room for improvement.

  12. Lou Sherburne

    Re: “At AncestryDNA, we use an algorithm developed by the science team called Timber. Its basic idea is that if two people appear to have identical DNA at a particular place in the genome, but they also appear to have identical DNA with thousands of other people at that particular place, then the shared DNA between the two people was probably inherited from a more distant ancestor (or a distant set of ancestors). In other words, that segment is probably not relevant to a common ancestor within the last 6 or 10 generations.”

    Timber has eliminated or downgraded the confidence of many of my segment-triangulation-confirmed matches. Yes, my cousin EG (for example) descends from my 4th great grandparents four different ways and many people share each of our four segments with us BUT a large number of those people have documented ancestry back to those ancestors! Yes, there are many who share the segments but that is likely due to fecundity/endogamy, rather than shared segments that are merely identical by state.

  13. David Hamill

    Population bias in Timber? … even if Timber was valid and useful for the population represented by most of the Ancestry DNA participants .. it would not be for participants with a different ethnic history, or otherwise descended from a different population group (think recent immigrants from a country without a long history of immigration to the US). Matching segments common in the dominant population group and representing distant ancestors, could easily be of genealogical significance in a different population. This ethnic or population bias due to the use of Timber suggests Ancestry DNA should be avoided by those who are not members of the dominant population.

  14. Abbie

    Got a question … If I share 1,717 centimorgans over 62 segments … What would that make us .. Extremely high and match in close family .. So confusing .

  15. Joe Espindola

    I found a cousin that has this: 595 centimorgans shared across 32 DNA segments. In plain English, what does that mean??

  16. LeaMarie Robertson

    I’m trying to find more distant relatives – relatives of ancestors who still live in Wales. Is there any way to check for relatives matches from a specific place like those living in the UK or Wales?

  17. ces

    When matching my DNA to ancestor DNA; did Ancestry actually obtain DNA samples from the “ancestors,” or from the family members in the DnaCircle? How did Ancestry obtain dead person DNA?

  18. Mary Ann Schaefer

    I’m directing this comment to the author of this article. In this paragraph, “What this chart clearly shows is that when trying to identify DNA matches who share DNA from a recent common ancestor, there is much more to it than the sizes of the identical DNA segments. Longer identical segments don’t necessarily prove that two people have a recent common ancestor. DNA matching among hundreds of thousands of people has shown that even long identical segments can indicate shared ancestry, shared population history, or a more distant shared ancestor.” In the part that says “even the long identical segments”, did you mean “even the short identical segments”??? I think that’s what you were trying to say???

  19. Patricia Ann Kellner

    Accuracy is off. My known, provable cousin relationships are inaccurately reported. A double-check on gedmatch confirms AncestryDNA has shaved off a lot of cM’s on these matches. The relationship level on several of my matches is now under-reported. For example…a confirmed 3rd cousin is now listed as a 4th. This same scenario repeats on other matches, too.
    Please supply a chromosome browser. Having to rely on Ancestry’s ever changing algorithm curved results is not good science and doesn’t help me do reliable genetic genealogy.

    In genealogy, every fact should be scrutinized for accuracy. In DNA…a chromosome browser is essential for accurate analysis. You are telling us to accept your results at face value. That is not good practice.

    Many of us are working around you by encouraging our matches onto gedmatch. But, we shouldn’t have to do that. And most people don’t put their results on gedmatch, so we are missing out on chances of triangulating true matches because you won’t give us a chromosome browser.

  20. Lydia Bishop

    I am not thrilled with how Timber is dealing with my mid-range matches. I have a “mystery cousin” match who, according to AncestryDNA, shares about 22 cM on one chromosome. Which one? I dunno. AncestryDNA won’t say. Her father just tested and he’s NOT a match of mine! He however does match one of my known third cousins. My known third cousin’s mother also tested at AncestryDNA – she matches me, but NOT “mystery cousin” and her father. But over at the “competition” – GEDMatch has us all nicely matched and lined up on chromosome 12. Your system’s lack of transparency and not telling your members on what chromosomes we share DNA with our matches has got to be changed. We want transparency and a chromosome browser.
