One of the real advantages of a system like Hadoop is that it runs on commodity hardware, which keeps your hardware costs low. But when that hardware fails at an unusually high rate, it can really throw a wrench into your plans. This was the case recently when we set up a new cluster
Bill Yetman has served as VP of Engineering at Ancestry.com since January 2014. Bill has held multiple positions with Ancestry.com since August 2002, including Senior Director of Engineering, Director of Sites, Mobile and APIs, Director of Ad Operations and Ad Sales, Senior Software Manager of eCommerce and Senior Software Developer. Prior to joining Ancestry.com, he held several developer and programmer roles with Coresoft Technologies, Inc., Novell/WordPerfect, Fujitsu Systems of America and NCR. Mr. Yetman holds a B.S. in Computer Science and a B.A. in Psychology from San Diego State University.
Interest in direct-to-consumer DNA testing has grown dramatically in the past few years. When you’re measuring more than 700,000 DNA markers for each individual, how do you analyze all that data across a rapidly growing database, while providing actionable results for your customers? At the Hadoop Summit next week,
Ancestry.com to Host HBase Meetup on March 12th at our SF office
If you are thinking about starting a Big Data initiative, you may want to consider its effect across the organization. At Ancestry.com, we have been a very traditional Microsoft .NET and SQL Server shop for a long time. Several initiatives, two of which involve
Jeremy Pollack, an engineer on the DNA Pipeline Team, and I presented together at QCon San Francisco this week. It was a real tag-team effort from two different points of view – the “Manager” and the “Developer” view of the same project. Having both of us on stage was a first, but it seemed to work really
I recently had the opportunity to present the story of the Ancestry.com DNA pipeline project at the Utah Big Mountain Conference put on by Utah Geek Events. It really is a great story:
I’m taking a break from writing about “Adventures in Big Data” to focus on another passion – how to inspire collaboration and innovation within your organization. At Ancestry.com, we hold a FedEx Day twice a year, and try to make a big deal out of the event. The idea to run our own FedEx Day
It is interesting to reflect on how we thought we would work with Big Data and compare it to our day-to-day processes. We anticipated writing MapReduce jobs in Java that process our data, transform it, and produce aggregate results. Reality is somewhat different.
I decided to write this blog post to help people who are working with Big Data and Hadoop and would benefit from my experience. I always learn more from mistakes. I have lots of scars to prove that point. Even so, this blog is a bit painful to write.
A little over 8 months ago, I was asked to build a data mining cluster at Ancestry using Hadoop. Even though Ancestry has been using Hadoop for nearly 3 years, this was my first exposure to the technology and the company’s initial attempt to collect everything. Honestly, I did not know where or how to start.