Posted by Ancestry Team on October 28, 2013 in Big Data, DNA Tech, Technology Conferences

At Ancestry, we face all sorts of exciting technology challenges as we grow our business and support the company’s mission of helping everyone discover, preserve, and share their family history. This week, I’m excited to share some of the lessons we’ve learned along the way in using our data to help users make more meaningful discoveries about themselves and their past.

I have the pleasure of speaking at Data Driven NYC and Strata Conference + Hadoop World in New York City this coming week to discuss how Ancestry is applying Big Data capabilities with Hadoop, specifically for analytics and for building products such as AncestryDNA.

We did not get started with Hadoop by looking for ways to use it. We always start with a problem that needs to be solved and then ask ourselves: what is the right tool to solve it? One of the most interesting problems that required Hadoop, and one I will touch on in my presentations, is our DNA processing pipeline.

Discovering information about yourself through your own DNA is more accessible, and more affordable, than ever before. In 2012, Ancestry made the secrets of users’ DNA come to life in a meaningful way through a new autosomal DNA test, AncestryDNA. Derrick Harris of GigaOm described the experience of our users this way: “Spit in a tube, pay $99, learn your past.” By measuring a user’s DNA at 700,000 marker locations (SNPs), we are able to identify a user’s ethnicity and provide distant cousin matches.

Scaling DNA matching is an N-squared problem: in principle, each sample has to be compared against every other sample, so the number of comparisons grows with the square of the number of users. Combined with the existing 10 petabytes of family history data in the company’s archive, the new flood of DNA data presents significant Big Data challenges and opportunities, many of which are of interest to other companies whose business models are predicated on similar massive data troves. For example, how do we analyze and use the massive amounts of genome data to improve the overall customer experience and help our customers find new relatives or uncover new family history discoveries?
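To make the N-squared growth concrete, here is a minimal Python sketch of naive all-pairs matching. It is a toy illustration only, not the AncestryDNA pipeline: the function names (`pair_count`, `shared_fraction`, `all_pairs_matches`), the string encoding of genotypes, and the similarity threshold are all assumptions made up for this example.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Number of unordered pairs among n samples: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def shared_fraction(a: str, b: str) -> float:
    """Toy similarity (an assumption): fraction of marker positions that agree."""
    if len(a) != len(b):
        raise ValueError("genotype strings must cover the same markers")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def all_pairs_matches(samples: dict[str, str], threshold: float) -> list[tuple[str, str, float]]:
    """Compare every sample with every other sample (quadratic in len(samples))."""
    matches = []
    for (id_a, geno_a), (id_b, geno_b) in combinations(samples.items(), 2):
        score = shared_fraction(geno_a, geno_b)
        if score >= threshold:
            matches.append((id_a, id_b, score))
    return matches


# Hypothetical data: three users, five markers each.
samples = {"u1": "AACGT", "u2": "AACGA", "u3": "TTGCA"}
print(all_pairs_matches(samples, threshold=0.8))

# Doubling the number of samples roughly quadruples the work.
print(pair_count(1000), pair_count(2000))
```

With 1,000 samples the loop runs 499,500 comparisons, and with 2,000 samples it runs 1,999,000. Each comparison would also span hundreds of thousands of markers rather than five, which is why a single-machine loop like this stops being practical and a distributed framework becomes attractive.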

Data Driven NYC Session Info:

Monday, October 28, 2013, 5:45PM ET

Strata Conference + Hadoop World Session Info:

Wednesday, October 30, 2013, 1:45PM ET

Sutton Center – Sutton South

I hope you’ll join me at these two events to hear what we’ve learned along the way.
