Posted by Jeremy Pollack on June 12, 2013 in Big Data, Technology Conferences

Every week, AncestryDNA analyzes thousands of people’s DNA, decoding their family origins and finding their long-lost relatives. To that end, we used GERMLINE, an algorithm for finding hidden family relationships within a pool of DNA. However, the reference implementation of GERMLINE didn’t scale, and we were running up against its limitations. This Thursday, I’ll be giving a talk at HBaseCon about how we solved this problem. Find out how AncestryDNA leveraged Hadoop and HBase to build a scalable cleanroom implementation of the GERMLINE algorithm, resulting in a 1700% performance improvement.


Info:
Thursday, June 13, 2013, 5:20pm – 5:40pm
Presented by Jeremy Pollack
At the San Francisco Marriott Marquis, in the Yerba Buena 13-15 room

*Update: View the video of my presentation here.


Jeremy Pollack

Jeremy is a senior engineer at, where his team supports a team of scientists and makes their discoveries scale. In the past, he’s written code that withstood the traffic from a Super Bowl ad, created the content management system for one of the web’s most popular parenting sites, and looked after the technology needs of a well-known online magazine. When he’s not coding, he enjoys reading, playing the darbuka, and throwing awesome dinner parties.
