Interest in direct-to-consumer DNA testing has grown dramatically in the past few years. When you’re measuring more than 700,000 DNA markers for each individual, how do you analyze all that data across a rapidly growing database while providing actionable results for your customers? At the Hadoop Summit next week, I will discuss how we use Hadoop and other open source projects (HBase, Azkaban, etc.) to handle, at scale, the ethnicity predictions and the processing behind the AncestryDNA genetic cousin matching algorithm. The talk will also cover how the development team grew and matured as we worked with Hadoop. As our DNA pool continues to grow and we continue to improve our algorithms, we face new scientific and technical problems that need to be overcome.
Please join me as I walk through the science behind processing several hundred thousand DNA samples and how we leveraged both science and technology to solve a business problem. This is a unique application that shows how versatile Hadoop can be. We’ll dig in and show how the science and development teams collaborated on this project, through various updates, to deliver a significantly improved user experience, and we’ll share what’s next for our team.
We hope to see you next week!
Thursday, June 5, 2014 (3rd day)
11:00 am PT
Hadoop Driven Business Track
San Jose Convention Center