Automated Entity Extraction Making German Historical Records Searchable

Posted by Laryn Brown on May 6, 2016 in Big Data, Image Processing and Analysis

For several years now, Ancestry has been publishing collections of records from the U.S. that have been “transcribed” using a method we call Entity Extraction. One example is the U.S. City Directory collection. A precursor to modern telephone books, city directories listed all of the inhabitants of a city, along with their address, occupation, and …

Competition as Collaboration – Ancestry.com Handwriting Recognition Competition

Posted by Michael Murdock on March 14, 2014 in Image Processing and Analysis

We are excited to announce that the Ancestry.com handwriting recognition competition proposal was accepted as one of seven official International Conference on Frontiers in Handwriting Recognition (ICFHR-2014) competitions. As part of our competition on word recognition from segmented historical documents, we are announcing the availability of a new image database, ANWRESH-1, which contains segmented and labeled …

Document Analysis and Recognition – What is Document Analysis?

Posted by Michael Murdock on October 5, 2013 in Image Processing and Analysis

I recently attended the three-day biennial International Conference on Document Analysis and Recognition (ICDAR-2013) in Washington, DC. ICDAR is sponsored by the International Association for Pattern Recognition and is the premier event for those working in the field of Document Analysis (DA). The attendees are primarily from corporate and university labs; they are professors, graduate students, and technologists involved …

Image Processing at Ancestry.com – Part 6: Auto-Sharpening

Posted by Michael Murdock on July 17, 2013 in Image Processing and Analysis

This post is the sixth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. The core functionality of the IPP is illustrated in the following diagram. In this post I continue with the material from …

Throttling Image Processing

Posted by Tyler Jensen on June 21, 2013 in Distributed Computing, Image Processing and Analysis

Ancestry.com, like any other site with millions of subscribers, experiences predictable load patterns throughout the day. To maximize site performance and customer satisfaction, we make every effort to schedule maintenance during off-peak intervals. Content processing, on the other hand, especially for our repository of hundreds of millions of images, is a constant, ongoing effort, and in …