The Science Behind a More Precise DNA Matching Algorithm

Posted by Ancestry Team on May 3, 2016 in Analytics, DNA Tech, Science

Today we announced that the matching portions of the AncestryDNA test results have been updated. The purpose of this post is to give you a little more detail around the science behind these improvements.

Our previous DNA matching algorithms were based on the AncestryDNA database when it was populated by about half a million people. This latest update is based on three times that many people: 1.5 million. Our larger database has allowed us to make improvements to the matching algorithms. What are those improvements, and how are they going to help the overall matching experience?

Let’s take a look at the current version of the matching pipeline. Below is an image taken from the white paper that highlights this process.

I am not going to take you through each step; instead, I want to focus on where the advances have been made. There are four areas (highlighted by the four stars):

Phasing—adding more duo- and trio-phased samples
Matching algorithm—more precise estimates of where each shared segment begins and ends
Consolidate matching segments—improvements to relationship estimating with a larger database
New Match results—updated for all existing DNA customers

Individual results may vary, but we believe that the improvements to these four areas will improve the overall matching process for our customers.

Phasing

Phasing refers to the process of computationally determining the assignment of allele copies to chromosomes. In other words, phasing estimates the string of letters of DNA inherited as a unit from each parent. We have now added more known DNA-tested parent-and-child pairs (duo sets) and known DNA-tested sets of both parents and a child (trio sets) to the process to make our phasing even more accurate. You can learn more about the studies we used to validate these methods in section 2.3 of the white paper.
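To make the duo and trio idea concrete, here is a minimal Python sketch of phasing a single site when one or both parents have been genotyped. It only illustrates the general Mendelian principle and is not the AncestryDNA pipeline; the function name phase_site and the tuple representation of genotypes are assumptions made for this example.

```python
from typing import Optional, Tuple

Genotype = Tuple[str, str]  # unordered pair of alleles at one site, e.g. ("A", "G")


def phase_site(
    child: Genotype,
    mother: Optional[Genotype] = None,
    father: Optional[Genotype] = None,
) -> Optional[Tuple[str, str]]:
    """Return (maternal_allele, paternal_allele) for the child at one site.

    Returns None when the parental genotypes do not settle the answer
    (for example, everyone is heterozygous) or when they are inconsistent
    with the child's genotype. Pass only one parent for a duo.
    """
    a, b = child
    candidates = set()
    # Try both ways of assigning the child's two alleles to the two parents.
    for mat, pat in ((a, b), (b, a)):
        mat_ok = mother is None or mat in mother
        pat_ok = father is None or pat in father
        if mat_ok and pat_ok:
            candidates.add((mat, pat))
    # Exactly one consistent assignment means the site is phased.
    return candidates.pop() if len(candidates) == 1 else None


# Trio: the mother can only have passed on "A", so "G" came from the father.
trio_result = phase_site(("A", "G"), mother=("A", "A"), father=("A", "G"))

# Duo: one homozygous parent is enough to resolve a heterozygous child.
duo_result = phase_site(("A", "G"), mother=("A", "A"))

# Ambiguous: all three are heterozygous, so this site alone cannot be phased.
ambiguous = phase_site(("A", "G"), mother=("A", "G"), father=("A", "G"))
```

Sites that stay ambiguous in a sketch like this are the ones where a statistical phasing method has to fall back on patterns learned from other samples, which is why more duo and trio data helps.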

Matching Algorithm

Until now, we were limited to looking at narrow windows across the genome, as we broke it up into small segments. With this update, we don’t need to use the window-based approach anymore. We now use SNPs (single nucleotide polymorphisms) to determine where each match segment starts and ends, which lets us measure those points with more precision.
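The difference between window-level and SNP-level boundaries can be shown with a toy Python sketch. This is not the AncestryDNA matching algorithm: it simply compares two phased haplotypes, written as strings with one allele per SNP, and treats runs of identical alleles as "matches". The function names, the example haplotypes, and the window size of 4 are all assumptions made for illustration.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # half-open SNP index range [start, end)


def snp_level_segments(h1: str, h2: str, min_snps: int = 1) -> List[Segment]:
    """Maximal runs of identical alleles, with boundaries at individual SNPs."""
    n = min(len(h1), len(h2))
    segments: List[Segment] = []
    start = None
    for i in range(n):
        if h1[i] == h2[i]:
            if start is None:
                start = i  # a new run begins at this SNP
        else:
            if start is not None and i - start >= min_snps:
                segments.append((start, i))
            start = None
    if start is not None and n - start >= min_snps:
        segments.append((start, n))  # run extends to the last SNP
    return segments


def window_segments(h1: str, h2: str, window: int) -> List[Segment]:
    """Runs of fully identical fixed-size windows; boundaries snap to window edges."""
    n = min(len(h1), len(h2))
    segments: List[Segment] = []
    start = None
    for w_start in range(0, n, window):
        w_end = min(w_start + window, n)
        if h1[w_start:w_end] == h2[w_start:w_end]:
            if start is None:
                start = w_start
        else:
            if start is not None:
                segments.append((start, w_start))
            start = None
    if start is not None:
        segments.append((start, n))
    return segments


hap_a = "ACGTACGTACGT"
hap_b = "CCGTACGTACAA"  # differs from hap_a at SNP 0 and at SNPs 10 and 11

by_snp = snp_level_segments(hap_a, hap_b)      # [(1, 10)]
by_window = window_segments(hap_a, hap_b, 4)   # [(4, 8)]
```

In this toy case the window-based version reports a shorter segment because a single mismatch disqualifies a whole window, while the SNP-level version places the boundaries at the exact SNPs where the two haplotypes stop agreeing.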

Consolidate Matching Results

The improved phasing of the genome along with the increased precision in identifying shared segments means our science team can estimate the relationship between DNA matches with more precision. Our findings and validation process have led to new evidence about how much shared DNA people are likely to have across all relationships. The chart below shows how we determine each level of relationship based on how many centimorgans matches share.

New Match Results

The number of centimorgans you share with a match can also help you understand your relationship to them. For example, you’ll usually share about 120 centimorgans with a 3rd cousin, but it’s possible to share as few as 90 or as many as 200. Be aware that the precise amount of shared DNA can vary beyond the ranges shown in the table above.
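As a small worked example, the 3rd cousin figures quoted above (about 120 centimorgans, with 90 to 200 possible) can be written as a range check in Python. This is only a sketch: the class and function names are hypothetical, and the only numbers used are the ones stated in this post. No other relationship ranges are filled in because the chart is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedRange:
    """A relationship label with its typical and quoted min/max shared cM."""
    label: str
    typical_cm: float
    low_cm: float
    high_cm: float


# Values taken from the text above; everything else about this class is assumed.
THIRD_COUSIN = SharedRange("3rd cousin", typical_cm=120, low_cm=90, high_cm=200)


def compare_to_range(shared_cm: float, expected: SharedRange) -> str:
    """Say whether a shared-cM value falls inside the quoted range.

    A value outside the range does not rule the relationship out, since
    the amount of shared DNA can vary beyond the quoted ranges.
    """
    if expected.low_cm <= shared_cm <= expected.high_cm:
        return f"{shared_cm} cM is within the quoted range for a {expected.label}"
    return f"{shared_cm} cM is outside the quoted range for a {expected.label}"


inside = compare_to_range(117, THIRD_COUSIN)
outside = compare_to_range(68, THIRD_COUSIN)
```

The function returns a description rather than a yes or no verdict on purpose, because shared centimorgans narrow down the likely relationships without settling them.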

New match results have been provided for everyone. Whether you tested with AncestryDNA last week or two years ago, your results have been recalculated based on these advancements. Because of this, some DNA matches that were previously predicted to be more closely related to you, or predicted at a higher confidence, may drop down on your list or no longer appear. You will also have new DNA matches that you haven’t seen before. If you have taken notes on or “starred” a DNA match that no longer appears on your new list, you can download information about that previous match from the DNA test settings page. This will be available for a limited time, so you should download any such information as soon as you can.

If you are waiting for your results or are in the process of getting a test, your matching results will go through this new procedure as well.

Want to learn more? Check out the matching white paper or the FAQs.


Comments

jimbartlett1

May 4, 2016 at 1:36 am

All I want for Christmas is the shared segment Chr number, start location and cM. This will let me determine which of several shared Ancestors is indicated by the DNA; as well as which ancestral lines are responsible for each area of DNA. Thank you for considering this.

BOBZEHYPSILANTI

May 4, 2016 at 11:48 am

Today I seem to have lost a DNA match that I actually know is my cousin because I know where he exists in my life and we have communicated extensively. I don’t mind changes, but when they are this far off it makes me wonder who is writing algorithms.

john tetlow

May 4, 2016 at 10:19 pm

Are the results sorted in any order within groups? Like more shared DNA sort first?

CulsenGenealogy

May 5, 2016 at 2:38 am

The matches will continue to evolve – just as Ancestry has always stated they would.

As far as ‘matches’ that disappeared, try looking deeper within your pages (I now have 82 pages of matches) for those you know / confirmed. Plus, for a limited time, you can download the previous ‘matches’ – so do this as well.

Patti Easton

May 7, 2016 at 12:23 pm

Thank you for all you are doing to improve DNA analysis. Please add chromosome matching and segment length data as well.

SP

May 7, 2016 at 10:02 pm

Please provide a chromosome browser. While having shared cMs is great- it is much easier to identify relationships with a chromosome browser, especially when one doesn’t have parental DNA to work with.
This request has been made by countless users for years now, and ignored. Please stop ignoring the request and provide the chromosome browser.

Lauren McGuire

May 9, 2016 at 8:27 am

The cM designated as a parent/child or identical twin is 3,475. Does this mean if we want to convert a shared cM to a percentage, would the denominator be 6,950?

Milton davison

May 10, 2016 at 3:55 am

Where does ancestry get the DNA from.

- Milton davison
  
  May 10, 2016 at 3:57 am
  
  Where does ancestry get their DNA information from
  
Anne Reeves

May 11, 2016 at 3:22 pm

A documented and known third cousin, previously relegated to 5-8th cousin status has now been eliminated altogether, whilst a documented and known 4th cousin (both cousins on my paternal side, though each linked to a different grand-paternal great-grandparent) has been relegated to even further reaches of the distant past. I realise that the “problem” lies in recombination. I do think, however, that Ancestry needs to be more explicit about how, in fact, one might well not share any DNA at all with a third cousin or a fourth. That does not stop them being, legitimately, third or fourth cousins. Given how recombination works we can share – in reality – a recent common ancestor but not any of their genetic material (or at least none that has been included in the comparison tests).

Carol Reese

June 28, 2016 at 4:03 pm

U must pay for DNA kit before getting it?? I thought payment would b sent with sample. How much saliva is necessary??

Albertus Fuller

July 30, 2016 at 4:11 pm

Thank you so much for finally sending your kit to Europe! I had tested at two other companies, but only Ancestry gave me the very very long-awaited breakthroughs! I am adopted. Since 1998 I have known who my mother is; however, she died in 1991. I have met most of my maternal family, including six maternal half-brothers. At Ancestry I have 60 maternal matches (thanks to the family tree matching, I can see this at a glance). I use my maternal matches to determine which of my other matches are probably paternal.

Recently I had a couple of very meaningful breakthroughs in autosomal DNA matching. At an Adoption forum I read about a way to find one’s father’s (or mother’s) family by inserting into a file the family trees of all one’s paternal (or maternal) DNA matches: where those trees overlap, those are also my own ancestors. I now know that I am definitely descended from CONRAD JUNG born 1786 in Siefersheim, Germany, died in 1862 in Wayne Co. NY, and his wife Catharina Steinmetz, and from JOHN B. WHITTAKER born in 1799 in Manchester, Lancashire, England, died bef 1860 in NY, and his wife Hannah Berry. I have found a marriage between their descendants Frank Casimir Young and Nellie Anna Whittaker on 27 Mar 1901 in Concord, Jackson, Michigan. I believe that this couple may very likely be my grandparents.

However, there is a PROBLEM of AMOUNT of DNA shared with me by my DNA MATCHES. There is no problem with Frank Young being my grandfather, as the three JUNG/YOUNG descendants who match me (Kparcero46, PFM, Marthajaneut) share 20, 48, 68 cM with me, making them 4th, 4th, 3rd Cousins to me, respectively. Frank Young’s being my grandfather would make Marthajaneut my 3rd cousin, which corresponds to the shared amount of DNA: 68 cM, as she and I would be descended from the same son of CONRAD JUNG, CASIMIR JUNG, whereas the other two are descended from another son.

The problem is with FRANK YOUNG’s wife NELLIE ANNA WHITTAKER. Nellie is a daughter of James Barry Whittaker, son of JOHN B. WHITTAKER. I only share 33 cM with Alpine25, his descendant. Nellie Whittaker’s being my grandmother would make “Alpine25” my 2nd Cousin Once Removed. Isn’t that too little DNA in common for us to be 2nd Cousins 1xR? On the other hand, I share 117, 103 cM with WW and Rhagni, descendants of John David Whittaker, another son of John B. Whittaker. (I also share 40 cM with a descendant of William Whitaker, who seems to be a brother of John B. Whittaker, the common ancestor of me, Alpine25, WW and Rhagni.) The much larger amount of cM that I share with the descendants of John David Whittaker would seem to mean that he is my direct forefather, rather than his brother James Barry Whittaker. But I can find no convincing marriage amongst John David’s descendants as candidate for my “paternal grandparents”.

My question is: can a DNA match with 33 cM be in fact a 2nd Cousin 1x Removed? Can DNA matches with 117 and 103 cM truly be only 3rd Cousins Once Removed? Those would in fact be our relationships if Nellie Anna Whittaker and Frank Casimir Young truly are my paternal grandparents, as I strongly suspect. Please help me to understand. Thank you in advance, Albertus

Michele Mandrioli

August 23, 2016 at 4:34 pm

What are the average and standard deviation in the number of centimorgans of shared DNA between full siblings?

