At Ancestry.com we face all sorts of exciting technology challenges as we grow our business and support the company’s mission of helping everyone discover, preserve, and share their family history. This week, I’m excited to share some of what we have learned along the way about using our data to help users make more meaningful discoveries about themselves and their past.
I have the pleasure of speaking at Data Driven NYC and Strata Conference + Hadoop World in New York City this coming week to discuss how Ancestry.com is using Hadoop for Big Data work, specifically for analytics and for creating products such as AncestryDNA.
We did not get started with Hadoop by looking for ways to use it. We always start with a problem that needs to be solved and then ask ourselves: what is the right tool to solve the problem? One of the most interesting problems that required Hadoop to solve, and one I will touch on in my presentations, is our DNA processing pipeline.
Discovering information about yourself through your own DNA is more accessible – and more affordable – than ever before. In 2012, Ancestry.com made the secrets of users’ DNA come to life in a meaningful way through a new autosomal DNA test, AncestryDNA. Derrick Harris of GigaOm described the experience of our users this way: “Spit in a tube, pay $99, learn your past.” By measuring a user’s DNA at 700,000 marker locations (SNPs), we are able to identify a user’s ethnicity and provide distant cousin matches.
Scaling DNA matching is an N-squared problem: finding cousin matches means comparing each user’s DNA against every other user’s, so the work grows with the square of the number of users. Combined with the existing 10 petabytes of family history data in the company’s archive, the new flood of DNA data presents significant Big Data challenges and opportunities, many of which are of interest to other companies whose business models are predicated on similarly massive data troves. For example, how do we analyze and use the massive amounts of genome data to improve the overall customer experience and help our customers find new relatives or uncover new family history discoveries?
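To give a feel for what "N-squared" means here, the short Python sketch below counts how many pairwise comparisons are needed if every DNA sample is compared against every other sample. This is only an illustration of the growth rate under that simplifying assumption, not a description of our actual pipeline; the function names `pairwise_comparisons` and `added_comparisons` are made up for this example.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of distinct pairs among n samples: n * (n - 1) / 2."""
    if n < 0:
        raise ValueError("n must be non-negative")
    return n * (n - 1) // 2


def added_comparisons(existing: int, new: int) -> int:
    """Extra comparisons needed when `new` samples join `existing` ones.

    Assumes (for illustration only) that every new sample is compared
    against every existing sample and against every other new sample.
    """
    return pairwise_comparisons(existing + new) - pairwise_comparisons(existing)


if __name__ == "__main__":
    # Growing the number of samples 10x grows the number of pairs ~100x.
    for n in (1_000, 10_000, 100_000, 1_000_000):
        print(f"{n:>9,} samples -> {pairwise_comparisons(n):>15,} comparisons")
```

The point of the sketch is the shape of the curve: ten times as many samples means roughly one hundred times as many comparisons, which is why a problem like this quickly outgrows a single machine.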
Data Driven NYC Session Info:
Monday, October 28, 2013, 5:45PM ET
Strata Conference + Hadoop World Session Info:
Wednesday, October 30, 2013, 1:45PM ET
Sutton Center – Sutton South
I hope you’ll join me at these two events to hear about the lessons we have learned along the way.