Lesson Learned: Sharing Code With Git Submodule

Posted by Seng Lin Shee on February 26, 2015 in Development, Operations

You are probably aware of Git Submodules. If you aren’t, you may want to read about them from Atlassian and Git itself. In summary, Git provides a way to embed a reference to a separate project within a main project, while treating both projects as separate entities (versioning, commits, etc.). This article applies to any…

Big Data for Developers at Ancestry

Posted by Seng Lin Shee on September 25, 2014 in Development, Operations

Big Data has been all the rage. Business, marketing and project managers like it because they can plot out trends to make decisions. To us developers, Big Data is just a bunch of logs. In this blog post, I would like to point out that Big Data (or logs with context) can be leveraged by…

The Importance of Context in Resolving Ambiguous Place Data

Posted by Laryn Brown on July 10, 2014 in Operations

When interpreting historical documents with the intent of researching your ancestors, you are often presented with less-than-perfect data. Many of the records that are the backbone of family history research are bureaucratic scraps of paper filled out decades ago in some government building. We should hardly be surprised when the data entered is…

Lessons Learned Building a Messaging Framework

Posted by Xuyen On on July 1, 2014 in Operations

We have built out an initial logging framework with Kafka 0.7.2, a messaging system developed at LinkedIn. This blog post will go over some of the lessons we’ve learned by building out the framework here at Ancestry.com. Most of our application servers are Windows-based and we want to capture IIS logs from these servers. However,…

Controlling Costs in a Cloudy Environment

Posted by Daniel Sands on June 24, 2014 in Development, Operations

From an engineering and development standpoint, one of the most important aspects of cloud infrastructure is the concept of unlimited resources. The idea of being able to get a new server to experiment with, or being able to spin up more servers on the fly to handle a traffic spike, is a foundational benefit of…

Adventures in Big Data: Commodity Hardware Blues

Posted by Bill Yetman on June 20, 2014 in Operations

One of the real advantages of a system like Hadoop is that it runs on commodity hardware. This will keep your hardware costs low. But when that hardware fails at an unusually high rate, it can really throw a wrench into your plans. This was the case recently when we set up a new cluster…

Dealing with Your Team’s Bell Curve

Posted by Daniel Sands on June 6, 2014 in Development, Operations

I recently came across this article on the INTUIT QuickBase blog and was intrigued by the premise. It asserts that inside any team or organization, you will have a bell curve of talent and intelligence – which most would agree with. It’s not a bad thing; it just happens. Regardless of how well staffed you…

Migrating From TFS to Git-based Repositories (Part I)

Posted by Seng Lin Shee on April 29, 2014 in Development, Operations

Git, a distributed revision control and source code management system, has been making waves for years, and many software houses have been slowly adopting it not only as their source code repository, but also as the way they manage software development projects. There is much debate about using either a centralized or distributed revision…