Handling Dynamic JSON Schemas

Posted by Xuyen On on February 5, 2014 in Big Data

In my last post, I introduced our first steps in creating a scalable, high-volume messaging system, and I would like to provide an update on our progress. We have built out a Kafka 0.7.2 cluster to start ingesting data from our servers. The cluster consists of the following: 5 x Kafka nodes • Dual 6… Read more

IT Transformation as a “Business” Discipline

Posted by Deal Daly on January 22, 2014 in Big Data, IT

This series of essays will explore IT transformation (“futurization”) as a functional discipline of the business. IT transformation has clear business purposes. The transformational activity works to: increase speed to market for the business’s products and services, provide new and fulfilling career paths to IT engineers, increase reliability, availability and performance of systems and… Read more

Visualizing Family Trees

Posted by Leonid Zhukov on January 17, 2014 in Big Data

A company’s data set is a unique asset, and knowing what one of its most valuable assets looks like helps a company make product and business decisions. That is where data scientists come in: we like to study data. At Ancestry.com, we have a large and unique set of data, which… Read more

On Track to Data-Driven

Posted by Aaron Ling on December 25, 2013 in Big Data, Distributed Computing, Technical Management

Ancestry.com is becoming more and more aware of the value of the data our website generates every single day. We have a lot of customers coming to the website to discover, preserve and share their family history. They come from different parts of the world and are looking for information that helps them tell the story… Read more

Inferring Familiar Relationships From Historical Data Features (Part 1)

Posted by Laryn Brown on December 13, 2013 in Big Data

In the recent uncovering of NSA activity revealed by Edward Snowden, we see that the relationships between people can be some of the most valuable data that can be inferred from big data. The knowledge of who a person knows, who they have contacted, and who they are related to is apparently critical information for… Read more

First steps to building a scalable high volume messaging system

Posted by Xuyen On on November 16, 2013 in Big Data

At Ancestry.com we are becoming more data driven. That means we want to capture more data about our systems, including how our users are interacting with them. Part of that strategy is to capture the log files from our application servers and put them into our Hadoop cluster. We have tried using MSMQ and RabbitMQ… Read more

Adventures in Big Data: Presented Scaling AncestryDNA using Hadoop and HBase at QCon San Francisco

Posted by Bill Yetman on November 14, 2013 in Big Data, DNA, Technology Conferences

Jeremy Pollack, an engineer on the DNA Pipeline Team, and I presented together at QCon San Francisco this week. It was a real tag team effort from two different points-of-view – the “Manager” and the “Developer” view of the same project. Having both of us on stage was a first, but it seemed to work really… Read more

Ancestry.com to Present at QConSF

Posted by Melissa Garrett on November 9, 2013 in Agile, Big Data, Technology Conferences

Like many organizations, Ancestry.com is constantly accumulating more high volume, high velocity data of all kinds. We apply innovation at scale to handle 10 petabytes of highly dynamic family history data, and a flood of new data derived from our autosomal DNA test, AncestryDNA. How do we do it? Join our dynamic duo Bill Yetman… Read more

Ancestry.com to Present at Upcoming Tech Conferences in New York City

Posted by Scott Sorensen on October 28, 2013 in Big Data, DNA, Technology Conferences

At Ancestry.com we face all sorts of exciting technology challenges to grow our business and support the company’s mission of helping everyone discover, preserve, and share their family history. This week, I’m excited to share some of the learnings we’ve had along the way in leveraging our data to help users make more meaningful discoveries… Read more

Adventures in Big Data: How AncestryDNA Uses Hadoop and HBase

Posted by Bill Yetman on September 26, 2013 in Agile, Big Data

I recently had the opportunity to present the story of the Ancestry.com DNA pipeline project at the Utah Big Mountain Conference put on by Utah Geek Events. It really is a great story: