For several years now Ancestry has been publishing collections of records from the U.S. that have been “transcribed” using a method we call Entity Extraction. One example is the U.S. City Directory collection. A precursor to modern telephone books, city directories listed all of the inhabitants of a city, along with their address, occupation, and Read More
We are excited to announce that the Ancestry.com handwriting recognition competition proposal was accepted as one of seven official International Conference on Frontiers in Handwriting Recognition (ICFHR-2014) competitions. As part of our competition on word recognition from segmented historical documents, we are announcing the availability of a new image database, ANWRESH-1, which contains segmented and labeled Read More
I recently attended the three-day biennial International Conference on Document Analysis and Recognition (ICDAR-2013) in Washington, DC. ICDAR is sponsored by the International Association for Pattern Recognition and is the premier event for those working in the field of Document Analysis (DA). Primarily the attendees are from corporate and university labs; they are professors, graduate students and technologists involved Read More
This post is the sixth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. The core functionality of the IPP is illustrated in the following diagram. In this post I continue with the material from Read More
Ancestry.com, like any other site with millions of subscribers, experiences predictable load patterns throughout the day. To maximize site performance and customer satisfaction, we make every effort to schedule maintenance during off-peak intervals. Content processing, on the other hand, especially for our repository of hundreds of millions of images, is a constant, ongoing effort, and in Read More
This post is the fifth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. In this post we finally get to the good part – the part of the pipeline in which we process the Read More
This post is the fourth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. In this post I will present a bit of information about our microfilm scanning process. Read More
This post is the third in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. In part 1 of this series, The Good, the Bad, and the Ugly, I gave Read More
About 450 years ago John Heywood wrote, “many hands make light work.” The same can be said of image and data processing. Distributed parallel computing (DPC) makes it possible for us to do the work described by Michael Murdock in his series on the image processing pipeline. If you haven’t already, take a moment to Read More
Last week I began this series of blog posts about the Ancestry.com Image Processing Pipeline (IPP) by briefly describing how the IPP is the part of the Ancestry.com Content Pipeline that is responsible for digitizing and processing the millions of images we publish to our site. With this blog post I would like to Read More