Jeremy Pollack, an engineer on the DNA Pipeline Team, and I presented together at QCon San Francisco this week. It was a real tag-team effort from two different points of view – the “Manager” and the “Developer” view of the same project. Having both of us on stage was a first, but it seemed to work really…
I recently had the opportunity to present the story of the Ancestry.com DNA pipeline project at the Utah Big Mountain Conference put on by Utah Geek Events. It really is a great story:
I’m taking a break from writing about “Adventures in Big Data” to focus on another passion – how to inspire collaboration and innovation within your organization. At Ancestry.com, we hold a FedEx Day twice a year, and try to make a big deal out of the event. The idea to run our own FedEx Day…
It is interesting to reflect on how we thought we would work with Big Data and compare it to our day-to-day processes. We anticipated writing MapReduce jobs in Java that process our data, transform it, and produce aggregate results. Reality is somewhat different.
I decided to write this blog post to help people who are working with Big Data and Hadoop and would benefit from my experience. I always learn more from mistakes, and I have lots of scars to prove that point. Even so, this post is a bit painful to write.
A little over 8 months ago, I was asked to build a data mining cluster at Ancestry using Hadoop. Even though Ancestry has been using Hadoop for nearly 3 years, this was my first exposure to the technology and the company’s initial attempt to collect everything. Honestly, I did not know where or how to start.…
This blog is focused on the technology used behind the scenes at Ancestry.com. It’s a place to learn about the experiences we have, the challenges we face, and the solutions we use on our engineering and tech teams to create the Ancestry.com experience.