Adventures in Big Data: Join the Community, Share, and Give Back

Posted by Ancestry Team on February 19, 2014 in Big Data, Development

Ancestry.com to Host HBase Meetup on March 12th at our SF office

If you are thinking about starting a Big Data initiative, you may want to consider its effect across the organization. At Ancestry.com, we have been a very traditional Microsoft .NET and SQL Server shop for a long time. Several initiatives, two of which involve Big Data, are starting to change our culture. The open source Apache projects in the Hadoop ecosystem involve a level of collaboration, community involvement, and sharing that you need to embrace. We have become actively involved with the Bay Area HBase and Apache Kafka communities.

Our AncestryDNA developers started attending HBase meetups. They met developers from other companies willing to share their experiences with HBase. One individual sat down with two team members for lunch and patiently explained how HBase works and how it could be used in our DNA pipeline. We bought that individual and two team members lunch, an easily justified $45 expense. That personal interaction gave us the information and the confidence to experiment with HBase. The next step was to attend HBaseCon. Once again, our team was able to learn from other developers and build relationships. Once we successfully delivered the DNA project using HBase, it was our turn to share what we had learned with that community. At the next HBaseCon, Jeremy Pollack presented how we used HBase to scale our DNA matching. Why share what we learned? Ancestry.com is part of this community. Its members supported us, and it was time to give back. To further show our appreciation and support, Ancestry.com will host an HBase Meetup at our San Francisco office on Wednesday, March 12. There is already a waiting list for the event, but we know that not everyone who signs up will show up, so plan to attend.

Our second large initiative involves collecting, processing, and analyzing our clickstream data and logs. After looking at several possible options, the team selected the Apache Kafka project as the mechanism to collect this data. We learned about Kafka by attending the Yahoo! Big Data meetup. We met Richard Park from LinkedIn and learned why they developed Kafka and how they use it. Richard then came to our SF office and gave a brown bag presentation. He shared LinkedIn’s journey with Hadoop, Kafka, and Big Data. His talk was a huge influence on our Big Data team. We are now looking at implementing Samza for near real-time stream processing on top of Kafka.

As a development organization, we’re changing, opening up, and embracing the open source community and the Silicon Valley culture, a culture that puts a premium on open and honest collaboration between developers working at different companies. Get ready: embracing that culture will change your development team and start changing your company’s culture as well.
