Xuyen On

Xuyen On is a Senior Software Engineer on the Data Services team at Ancestry.com, where he is building out a new infrastructure to collect Big Data and make it available to the company.

Lessons Learned Building a Messaging Framework

Posted by Xuyen On on July 1, 2014 in Big Data

We have built out an initial logging framework with Kafka 0.7.2, a messaging system developed at LinkedIn. This blog post will go over some of the lessons we’ve learned by building out the framework here at Ancestry.com. Most of our application servers are Windows-based and we want to capture IIS logs from these servers. However, Read More

Handling Dynamic JSON Schemas

Posted by Xuyen On on February 5, 2014 in Big Data

In my last post, I introduced our first steps in creating a scalable, high-volume messaging system. Here I would like to provide an update on our progress. We have built out a Kafka 0.7.2 cluster to start ingesting data from our servers. The cluster consists of the following: 5 x Kafka nodes • Dual 6 Read More

First steps to building a scalable high volume messaging system

Posted by Xuyen On on November 16, 2013 in Big Data

At Ancestry.com we are becoming more data-driven. That means we want to capture more data about our systems, including how our users are interacting with them. Part of that strategy is to capture the log files from our application servers and put them into our Hadoop cluster. We have tried using MSMQ and RabbitMQ Read More

Using Hive and HBase to Query and Maintain Mutable Data

Posted by Xuyen On on May 23, 2013 in Big Data

Hive is good at querying immutable data like log files. These are files that do not change after they are written. But what if you want to query data that can change? For example, users of our site frequently make modifications to their family trees. Some of this data sits in very large and frequently Read More