The Importance of Context in Resolving Ambiguous Place Data

Posted by Laryn Brown on July 10, 2014 in Big Data, Uncategorized

When interpreting historical documents with the intent of researching your ancestors, you are often presented with less-than-perfect data. Many of the records that are the backbone of family history research are bureaucratic scraps of paper filled out decades ago in some government building. We should hardly be surprised when the data entered is… Read more

Lessons Learned Building a Messaging Framework

Posted by Xuyen On on July 1, 2014 in Big Data

We have built out an initial logging framework with Kafka 0.7.2, a messaging system developed at LinkedIn. This blog post will go over some of the lessons we’ve learned by building out the framework here at Ancestry.com. Most of our application servers are Windows-based and we want to capture IIS logs from these servers. However,… Read more

Adventures in Big Data: Commodity Hardware Blues

Posted by Bill Yetman on June 20, 2014 in Big Data

One of the real advantages of a system like Hadoop is that it runs on commodity hardware, which keeps your hardware costs low. But when that hardware fails at an unusually high rate, it can really throw a wrench into your plans. This was the case recently when we set up a new cluster… Read more

Ancestry.com to Present Jermline on DNA Day at the Global Big Data Conference

Posted by Jeremy Pollack on April 9, 2014 in Big Data, Data Science, Development, DNA, Science

Interested in genealogy? Curious about DNA? Fascinated by the world of big data? If so, come check out my talk at the Global Big Data Conference on DNA Day this Friday, April 25, at 4 pm PT in the Santa Clara Convention Center! I’ll cover Jermline, our massively scalable DNA matching application. I’ll talk about our business, give a run-through… Read more

Using Mappers to Read and Partition Large Amounts of Data from Kafka into Hadoop

Posted by Xuyen On on April 8, 2014 in Big Data

In my previous posts, I outlined how to import data into Hive tables using Hive scripts and dynamic partitioning. However, we’ve found that this only works for small batch sizes and it is not scalable for larger jobs. Instead, we found that it is faster and more efficient to partition the data as they are… Read more

DNA and the Masses: The Science and Technology Behind Discovering Who You Really Are

Posted by Melissa Garrett on March 12, 2014 in Analytics, Big Data, DNA, Science

Originally published on Wired Innovation Insights, 3-12-14. There is growing interest among mainstream consumers in learning more about who they are and where they came from. The good news is that DNA tests are no longer reserved for large medical research teams or plot lines in CSI. Now, the popularity of direct-to-consumer (DTC) DNA tests… Read more

Ancestry.com to Lead Core Conversation at SXSW

Posted by Melissa Garrett on March 6, 2014 in Big Data, Technology Conferences

Headed to SXSW Interactive? Join Eric Shoup, EVP of Product, and Francois Ajenstat, Senior Director of Product at Tableau, for an engaging Core Conversation about how big data can be used to tell personalized stories. Big Data is a game changer for storytelling. Too often, the data we pull is cold, factual, and dehumanized. Technologies can now… Read more

Inferring Familial Relationships From Historical Data Features (Part 2)

Posted by Laryn Brown on February 28, 2014 in Big Data

In my previous post, I outlined some of the problems and strategies we use at Ancestry.com to determine if two people who appear in the same household are related. As promised, I want to focus this time on how to resolve ambiguous results. In my early days of doing family history research, I made an… Read more

Video Q&A with Lead Engineer at Ancestry.com

Posted by Melissa Garrett on February 21, 2014 in Big Data, CSS/HTML/JavaScript, DNA, Technology Conferences

Jeremy Pollack, a lead engineer at Ancestry.com, answers questions on the technical backend of AncestryDNA in a video interview with InfoQ. The interview took place after his presentation with Bill Yetman on scaling AncestryDNA using Hadoop and HBase at QConSF in 2013. Check it out!

Adventures in Big Data: Join the Community, Share, and Give Back

Posted by Bill Yetman on February 19, 2014 in Big Data, Development

Ancestry.com to Host HBase Meetup on March 12th at our SF office

If you are thinking about starting a Big Data initiative, you may want to consider its effect across the organization. At Ancestry.com, we have been a very traditional Microsoft .NET and SQL Server shop for a long time. Several initiatives, two of which involve… Read more