Inferring Familial Relationships From Historical Data Features (Part 1)

Posted by Laryn Brown on December 13, 2013 in Big Data

In the recent revelations of NSA activity by Edward Snowden, we see that the relationships between people are among the most valuable information that can be inferred from big data. The knowledge of who a person knows, who they have contacted, and who they are related to is apparently critical information for… Read more

First steps to building a scalable high volume messaging system

Posted by Xuyen On on November 16, 2013 in Big Data

At Ancestry.com we are becoming more data-driven. That means we want to capture more data about our systems, including how our users are interacting with them. Part of that strategy is to capture the log files from our application servers and put them into our Hadoop cluster. We have tried using MSMQ and RabbitMQ… Read more
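The teaser above stops before any detail, but the core idea of a log-shipping pipeline is decoupling producers from consumers through a queue. Here is a minimal sketch using Python's standard-library `queue` as a stand-in for a broker like RabbitMQ; all names and the batch size are illustrative, not Ancestry's actual design:

```python
import queue
import threading

# A stand-in for a message broker such as RabbitMQ: producers enqueue
# log lines, a consumer drains them in batches (as one might before
# writing a file to a Hadoop cluster).
log_queue = queue.Queue()
SENTINEL = object()  # signals the consumer to stop


def producer(lines):
    """Simulates an application server shipping its log lines."""
    for line in lines:
        log_queue.put(line)
    log_queue.put(SENTINEL)


def consumer(batch_size=2):
    """Drains the queue into fixed-size batches, as a collector would."""
    batches, current = [], []
    while True:
        item = log_queue.get()
        if item is SENTINEL:
            break
        current.append(item)
        if len(current) == batch_size:
            batches.append(current)
            current = []
    if current:  # flush the final partial batch
        batches.append(current)
    return batches


t = threading.Thread(target=producer, args=(["a", "b", "c"],))
t.start()
result = consumer()
t.join()
print(result)  # [['a', 'b'], ['c']]
```

The design choice being illustrated is that producers never block on the consumer's pace; a real broker adds durability and network transport on top of this same shape.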

Adventures in Big Data: Presented Scaling AncestryDNA using Hadoop and HBase at QCon San Francisco

Posted by Bill Yetman on November 14, 2013 in Big Data, DNA, Technology Conferences

Jeremy Pollack, an engineer on the DNA Pipeline Team, and I presented together at QCon San Francisco this week. It was a real tag-team effort from two different points of view – the “Manager” and the “Developer” view of the same project. Having both of us on stage was a first, but it seemed to work really… Read more

Ancestry.com to Present at QConSF

Posted by Melissa Garrett on November 9, 2013 in Agile, Big Data, Technology Conferences

Like many organizations, Ancestry.com is constantly accumulating more high volume, high velocity data of all kinds. We apply innovation at scale to handle 10 petabytes of highly dynamic family history data, and a flood of new data derived from our autosomal DNA test, AncestryDNA. How do we do it? Join our dynamic duo Bill Yetman… Read more

Ancestry.com to Present at Upcoming Tech Conferences in New York City

Posted by Scott Sorensen on October 28, 2013 in Big Data, DNA, Technology Conferences

At Ancestry.com we face all sorts of exciting technology challenges to grow our business and support the company’s mission of helping everyone discover, preserve, and share their family history. This week, I’m excited to share some of the lessons we’ve learned along the way in leveraging our data to help users make more meaningful discoveries… Read more

Adventures in Big Data: How AncestryDNA Uses Hadoop and HBase

Posted by Bill Yetman on September 26, 2013 in Agile, Big Data

I recently had the opportunity to present the story of the DNA pipeline project at the Utah Big Mountain Conference put on by Utah Geek Events. It really is a great story.

Hourglass-shaped Data Processing

Posted by Laryn Brown on September 3, 2013 in Big Data

One of the insights that I brought to my current job from the localization industry many years ago is the idea that you should create once and export or publish many times. In that industry we had the concept of an asset – say an instruction manual on how to use your microwave – that… Read more
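The create-once, publish-many idea above can be sketched as a single canonical record rendered through several exporters. The asset, formats, and exporter functions below are my own toy illustrations, not anything from the localization system described:

```python
import json

# One canonical asset, many exports: "create once, publish many"
# applied to a toy record standing in for, say, a microwave manual.
asset = {"title": "Microwave Manual", "lang": "en", "pages": 12}


def export_json(record):
    """Publish the asset as JSON for machine consumers."""
    return json.dumps(record, sort_keys=True)


def export_text(record):
    """Publish the asset as a human-readable summary line."""
    return f"{record['title']} ({record['pages']} pages)"


# Adding a new output format means adding one exporter, never
# touching (or duplicating) the canonical asset itself.
exporters = {"json": export_json, "text": export_text}
published = {fmt: export(asset) for fmt, export in exporters.items()}
print(published["text"])  # Microwave Manual (12 pages)
```

The point of the pattern is that the asset is authored exactly once, and every downstream format is a pure function of it.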

How Ancestry.com Practices Agile to Solve Challenges with Consumer DNA Testing

Posted by Aaron Ling on August 29, 2013 in Agile, Big Data, Science, Technical Management

A typical web application starts with a blank page. Then in further sprints, you can add features to it. (I sound like one of your Agile coaches, don’t I?) But in reality, the business needs you to deliver more value than a blank page. So, how can you quantify the minimum value you are delivering… Read more

Adventures in Big Data: Not writing much Java MapReduce code? You’re probably on the right track

Posted by Bill Yetman on August 10, 2013 in Big Data

It is interesting to reflect on how we thought we would work with Big Data and compare it to our day-to-day processes. We anticipated writing MapReduce jobs in Java that process our data, transform it, and produce aggregate results. Reality is somewhat different.
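The map-transform-aggregate shape the teaser describes is roughly what higher-level tools like Hive and Pig generate so you rarely write raw Java jobs. A toy in-memory version of that shape, in Python rather than anything from Ancestry's pipeline, looks like this:

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """A toy in-memory MapReduce: map each record to (key, value)
    pairs, shuffle (group) by key, then reduce each key's values."""
    shuffled = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            shuffled[key].append(value)
    return {key: reducer(key, values) for key, values in shuffled.items()}


# Word count, the canonical example: the kind of aggregate that a
# one-line Hive query expresses instead of a hand-written Java job.
lines = ["big data", "big mountain"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda key, values: sum(values),
)
print(counts)  # {'big': 2, 'data': 1, 'mountain': 1}
```

Seeing how little of this is problem-specific explains the post's title: the framework shape is boilerplate, which is exactly what the higher-level tools absorb.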

A Quick and Efficient Way to Update Hive Tables Using Partitions

Posted by Xuyen On on August 7, 2013 in Big Data

In my previous post, I outlined a strategy to update mutable data in Hadoop by using Hive on top of HBase. In this post, I will outline another strategy to update data in Hive. Instead of using a backend system to update data like… Read more
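The partition-based update strategy the teaser points to can be mimicked in plain Python: keep rows keyed by partition, and "update" by overwriting only the affected partition rather than rewriting the whole table. The table layout and row values below are my own toy model, not the post's actual schema:

```python
# A toy model of a partitioned Hive table: each partition key (here a
# date) maps to its rows. Overwriting one partition wholesale is the
# idiom behind HiveQL's INSERT OVERWRITE ... PARTITION (...) pattern.
table = {
    "2013-08-01": [("user1", "active")],
    "2013-08-02": [("user2", "inactive")],
}


def overwrite_partition(table, partition, rows):
    """Rewrite a single partition, leaving all others untouched."""
    table[partition] = rows
    return table


# "Update" user2's status by rewriting only its partition.
overwrite_partition(table, "2013-08-02", [("user2", "active")])
print(table["2013-08-02"])  # [('user2', 'active')]
```

The efficiency argument is that the cost of an update scales with the size of one partition, not the size of the whole table.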