Every week, AncestryDNA analyzes thousands of people’s DNA, decoding their family origins and finding their long-lost relatives. To that end, we used GERMLINE, an algorithm for finding hidden family relationships within a pool of DNA. However, the reference implementation of GERMLINE didn’t scale, and we were running up against its limitations. This Thursday, I’ll be giving a talk at HBaseCon about how we solved this problem. Find out how AncestryDNA leveraged Hadoop and HBase to build a scalable clean-room implementation of the GERMLINE algorithm, resulting in a 1700% performance improvement.
Thursday, June 13, 2013, 5:20pm – 5:40pm
Presented by Jeremy Pollack (Ancestry.com)
At the San Francisco Marriott Marquis, in the Yerba Buena 13-15 room
*Update: View the video of my presentation here.
About Jeremy Pollack
Jeremy is a senior engineer at Ancestry.com, where his team supports a group of scientists and makes their discoveries scale. In the past, he’s written code that withstood the traffic from a Super Bowl ad, created the content management system for one of the web’s most popular parenting sites, and looked after the technology needs of a well-known online magazine. When he’s not coding, he enjoys reading, playing the darbuka, and throwing awesome dinner parties.