Posted by Tyler Jensen on June 21, 2013 in Distributed Computing, Image Processing and Analysis

Ancestry.com, like any other site with millions of subscribers, experiences predictable load patterns throughout the day. To maximize site performance and customer satisfaction, we make every effort to schedule maintenance during off-peak intervals.

Content processing, on the other hand, is a constant, ongoing effort, especially across our repository of hundreds of millions of images, and in some cases it must be done on live content that is being served to our customers. One example occurs when we roll out an improved set of images for a given collection, such as the 1921 Census of Canada, to the live site. Many of these images may have different dimensions than the originally published images. To be sure we get it right, we double-check every image in the collection.

Until now, this work was done with a desktop tool that was effective but could take days to finish on very large collections. To speed this up, the Enterprise Media Team’s distributed computing initiative created a new service that spreads the work across five servers with a total of 64 logical processors. The service uses DuoVia.MpiVisor, a lightweight, open source distributed computing framework whose development this author leads outside of his regular Ancestry.com responsibilities.

Distributing the work across 64 logical processors was enormously successful, verifying the dimensions of up to 50,000 images every minute. The challenge was that if we allowed content management to use this very powerful tool at any time of day, there was a distinct possibility that it would affect the performance of our live site, something we wanted very much to avoid.

To throttle the new image dimension populating (IDP) service, we defined three time windows that correspond to high, medium and low traffic periods during the day. During high traffic periods, only one third of the processing agents are given work. During medium traffic periods, only one half of the available processing agents are used. During off-peak periods, all available agents are utilized.
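The throttling rule can be sketched in a few lines of Python. This is only an illustration, not the IDP service's actual code: the window boundaries, the name `agents_allowed`, the choice to round down, and the figure of 64 agents (one per logical processor) are all assumptions made for the example. Only the fractions (one third, one half, all) come from the description above.

```python
import math
from fractions import Fraction

# (start_hour, end_hour, fraction of agents that may be given work).
# The hours are hypothetical; the post does not state the real boundaries.
TRAFFIC_WINDOWS = [
    (0, 6, Fraction(1, 1)),    # low traffic: all agents
    (6, 17, Fraction(1, 2)),   # medium traffic: one half
    (17, 22, Fraction(1, 3)),  # high traffic: one third
    (22, 24, Fraction(1, 1)),  # low traffic: all agents
]


def agents_allowed(total_agents: int, hour: int) -> int:
    """Return how many agents may be given work at the given hour (0-23)."""
    if not 0 <= hour < 24:
        raise ValueError("hour must be in the range 0-23")
    for start, end, fraction in TRAFFIC_WINDOWS:
        if start <= hour < end:
            # Round down (an assumption), but always keep one agent busy.
            return max(1, math.floor(total_agents * fraction))
    raise ValueError("no traffic window covers this hour")


if __name__ == "__main__":
    for hour in (2, 9, 19):
        print(hour, agents_allowed(64, hour))
```

Under these assumed windows, a dispatcher would call `agents_allowed` before handing out each batch of work and simply leave the remaining agents idle until the next window begins.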

In the weeks since the IDP service launched, it has processed over 130 million images in just over 6,700 run-time minutes. That is a throttled average of about 19,000 images per minute of processing time, far below its current maximum potential of 50,000 per minute.
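The average can be checked with a quick calculation. The inputs below are the rounded figures quoted above ("over 130 million" and "just over 6,700"), so the result is approximate; the variable names are chosen only for this example.

```python
# Rounded figures from the post, so the results are approximate.
images_processed = 130_000_000
run_time_minutes = 6_700
peak_rate = 50_000  # images per minute when unthrottled

average_rate = images_processed / run_time_minutes
share_of_peak = average_rate / peak_rate

print(f"{average_rate:,.0f} images per minute")
print(f"{share_of_peak:.0%} of peak throughput")
```

With these inputs the average works out to roughly 19,400 images per minute, a little under 40% of the unthrottled peak.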

By throttling the work, the IDP service remains responsive during peak traffic times without degrading the customer experience, so content teams can keep working to deliver the best images to our customers as soon as possible.

About Tyler Jensen

Tyler Jensen is a senior software engineer in R&D at Ancestry.com. He has worked in the software industry since 1992. He loves to solve difficult technical challenges. When he's not working or writing or reading, he enjoys spending time with his wife and four children.

1 Comment

Sharon Thurgood 

Like the idea that you love technical challenges! I have been with tech support, I think that is where I was, with both ancestry.com and amazon.com. I cannot access ancestry.com as I have ever since it started with exception of 1 year. I am running OS X 10.4.11 on an iBook G4. When I access the ancestry.com site it redirects to a bunch of letters, numbers cloudfront.net and I get a gray screen. Ancestry.com says that my operating system is too old; difficult for me to accept for I know there must still be some of your customers running this. It will cost me over $100 to upgrade to OS 10.5 and then how do I know it will then access ancestry.com. I even trekked up the hill to Symantec in Culver City and found they only deal with businesses there. Their tech support number was of no help. I have even run Norton antivirus that goes with my OS and still can’t get into ancestry.com. I have left a message with AWS at Amazon and heard nothing back. I am limited to what OS that I can go to on this computer and do not have the money on a retirement income to buy a newer, used Mac. Please help me. Thanks and God Bless, Sharon Thurgood sthurgood@earthlink.net

PS my first modem was 300 baud, $600 with an Apple II in about 1980

September 27, 2013 at 10:11 pm
