Posted on October 28, 2013 in Big Data, DNA, Technology Conferences

At Ancestry.com we face all sorts of exciting technology challenges as we grow our business and support the company’s mission of helping everyone discover, preserve, and share their family history. This week, I’m excited to share some of the lessons we’ve learned in using our data to help users make more meaningful discoveries about themselves and their past.

I have the pleasure of speaking at Data Driven NYC and Strata Conference + Hadoop World in New York City this coming week to discuss how Ancestry.com is leveraging Big Data capabilities by using Hadoop – specifically related to analytics and creation of products such as AncestryDNA.

We did not get started with Hadoop by looking for ways to use it. We always start with a problem that needs to be solved and then ask ourselves: what is the right tool to solve it? One of the most interesting problems that required Hadoop, and one I will touch on in my presentations, is our DNA processing pipeline.

Discovering information about yourself through your own DNA is more accessible – and more affordable – than ever before. In 2012, Ancestry.com made the secrets of users’ DNA come to life in a meaningful way through a new autosomal DNA test, AncestryDNA. Derrick Harris of GigaOm described the experience of our users this way: “Spit in a tube, pay $99, learn your past.” By measuring a user’s DNA at 700,000 marker locations (SNPs), we are able to identify a user’s ethnicity and provide distant cousin matches.

Scaling DNA data matching is an N-squared problem: in the naive approach, each sample has to be compared against every other sample, so the work grows with the square of the number of users. Combined with the existing 10 petabytes of family history data in the company’s archive, the new flood of DNA data presents significant Big Data challenges and opportunities, many of which are of interest to other companies whose business models depend on similarly massive data troves. For example, how do we analyze and use the massive amounts of genome data to improve the overall customer experience and help our customers find new relatives or make new family history discoveries?
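To make the N-squared point concrete, here is a minimal Python sketch of a naive all-pairs matcher. It is an illustration only, not the AncestryDNA pipeline: the function names `pair_count` and `naive_matches`, the string encoding of markers, and the `min_shared` threshold are all assumptions made up for this example.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of distinct unordered pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def naive_matches(samples: dict[str, str], min_shared: int) -> list[tuple[str, str, int]]:
    """Compare every sample with every other one (hypothetical toy matcher).

    `samples` maps a sample id to a string of marker values; two samples
    "match" when at least `min_shared` positions hold the same value.
    """
    matches = []
    for (id_a, markers_a), (id_b, markers_b) in combinations(samples.items(), 2):
        # Count the positions where both samples carry the same marker value.
        shared = sum(1 for a, b in zip(markers_a, markers_b) if a == b)
        if shared >= min_shared:
            matches.append((id_a, id_b, shared))
    return matches


if __name__ == "__main__":
    # The number of comparisons grows quadratically with the number of samples.
    for n in (1_000, 10_000, 100_000):
        print(f"{n:>7} samples -> {pair_count(n):>13,} pairwise comparisons")
```

Ten times as many samples means roughly a hundred times as many comparisons, which is why a single-machine loop like this stops being practical and a distributed tool such as Hadoop becomes attractive.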

Data Driven NYC Session Info:

Monday, October 28, 2013, 5:45PM ET

Strata Conference + Hadoop World Session Info:

Wednesday, October 30, 2013, 1:45PM ET

Sutton Center – Sutton South

I hope you’ll join me at these two events to hear about the lessons we’ve learned along the way.

About Scott Sorensen

Scott Sorensen has served as Chief Technology Officer of Ancestry.com since April 2013. Scott has been at the company since June 2002 and has held multiple positions, including Senior Vice President of Engineering, Vice President of Search, and Vice President of Commerce. Prior to joining Ancestry.com, Scott was co-founder and Vice President of Engineering and then President at Coresoft Technologies. Before that, he was an engineering manager at WordPerfect / Novell and a software engineer at IBM. He holds a B.S. in Computer Science from Brigham Young University.

