Posted on April 10, 2013 in Image Processing and Analysis

 

Advertisement in the Piqua Leader-Dispatch, August 15, 1912

Images of original historical records play a key role in the way Ancestry.com presents family history information to the user. An image of a historical record is much more than evidentiary support for some family history assertion. An image can become the anchor for an engaging and compelling historical narrative. A properly captured and rendered image can be beautiful and even exciting. Something I love about Ancestry.com is getting to work with people who are passionate about images and share the drive to create world-class image processing technology.

 

My father’s name in the 1940 U.S. Census. This collection is free to everyone.

 

This blog post is the first in a series of articles about the image processing technologies we have developed here at Ancestry.com. As the software development manager over the Imaging Development team, I manage a group of software engineers who create the technology and software applications that support our Image Processing Pipeline (IPP).

Before I start, let me explain a little about image processing at Ancestry.com. Our Content Acquisition team works continually with partners around the world (from the National Archives and Records Administration to a small church in Italy) to procure new record collections, which we call content, to add to the website. These records may arrive on paper, on microfilm, in digital form, or in other formats. Whatever form they arrive in, we put them through our Content Pipeline to get them online and searchable. This is where the Image Processing team comes in.

A very high-level view of the Content Pipeline is shown in the following diagram:

 

The Ancestry.com Content Pipeline

 

As you can see, image processing is a major part of the Content Pipeline. This year we will use the IPP to scan and process around 200 million images, which requires some pretty cool technology that I hope to describe in future blog posts.

 

A very high-level view of the Image Processing Pipeline is shown in the following diagram:

 

A very high-level view of the Image Processing Pipeline

 

To give you a taste of the images, or content, that the Imaging Development team deals with, I’d like to show you a few examples.

The Good – Examples of Beautiful Images

Properly capturing, preserving and rendering images of historical records can be an extraordinarily difficult challenge. Beautiful historical images, like those shown below, don’t just happen naturally. In fact, with everything that can go wrong, it’s amazing they happen at all.

I have included a few of my favorite images here, and I encourage you to share links to your own favorites with us.

 

This image is from the following collection: Gretna Green, Scotland, Marriage Registers, 1794-1895.
(An Ancestry.com subscription is required to follow the links to the image and/or collection)

 

 

Italian Baptism (1776) and Burial (1598) Records

 

 

 

Italian Land Record (1682)


 

This image of a passport application came from the U.S. Passport Applications, 1795-1925 Collection. (An Ancestry.com subscription is required to follow the links to the image and/or collection)

 

In the next section I jump to the other extreme and show some bad and ugly images.

The Bad and the Ugly – Examples of Really Bad Images

Historical documents should be captured in focus, properly oriented, and in adequate lighting, then digitized at an appropriate resolution. However, many things can go wrong as a paper-based historical record is converted to microfilm and then to a digital image, so it’s not surprising that we encounter a large number of images that are barely legible, much less beautiful and engaging. To illustrate, I pulled together snippets of images from a variety of collections that show the kinds of problems we frequently encounter.

Collage of some really bad images we encounter in the Image Processing Pipeline

As this collage illustrates, there are many ways for images to have problems that make them difficult to read or even to use for family history research. They may be partially destroyed by fire, copied so lightly that the text is barely readable, scanned off-center, or photographed at an angle that skews the pages, among many other unfortunate defects. Fortunately for everyone involved, these problem images are the exception rather than the rule. Unfortunately, they are not so rare that they can be ignored or handled one at a time as they are discovered.
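
Problems that are too common to handle one at a time call for some kind of automated screening. As a rough illustration only, and not a description of how the Ancestry.com pipeline actually works, the sketch below flags a grayscale image as faint or blurry using two simple measurements: the spread of pixel intensities and the variance of a Laplacian filter response. The function names, the threshold values, and the representation of an image as a list of rows of 0-255 integers are all assumptions made for this example.

```python
from statistics import pstdev, pvariance

# Hypothetical thresholds, chosen only for this illustration.
MIN_CONTRAST = 20.0   # standard deviation of pixel intensities (0-255 scale)
MIN_SHARPNESS = 50.0  # variance of the Laplacian response


def contrast(image):
    """Population standard deviation of all pixel intensities."""
    return pstdev([pixel for row in image for pixel in row])


def sharpness(image):
    """Variance of a 4-neighbor Laplacian over the interior pixels.

    The image must be at least 3x3 so that interior pixels exist.
    """
    responses = []
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            responses.append(
                image[y - 1][x] + image[y + 1][x]
                + image[y][x - 1] + image[y][x + 1]
                - 4 * image[y][x]
            )
    return pvariance(responses)


def quality_flags(image):
    """Return a list of suspected problems; an empty list means none found."""
    flags = []
    if contrast(image) < MIN_CONTRAST:
        flags.append("low_contrast")
    if sharpness(image) < MIN_SHARPNESS:
        flags.append("blurry")
    return flags


# A smooth left-to-right gradient has plenty of contrast but no sharp edges.
gradient = [[30 * x for x in range(8)] for _ in range(8)]
print(quality_flags(gradient))  # prints ['blurry']
```

Checks like these are cheap enough to run on every image, so only the flagged ones need a closer look. Real documents would need thresholds tuned per collection, and defects such as skew or fire damage need different measurements entirely.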

In the next several blog posts in this series I will attempt to describe how we deal with problem images and how we use a variety of technologies to try to preserve and create beautiful images that can serve as the centerpiece for your historical narratives.

Additionally, future blog posts in this series will cover the various components shown in the Image Processing Pipeline diagram.

About Michael Murdock

Michael Murdock is a senior software development manager at Ancestry.com where he has worked for the last 9 years. He holds 8 patents in the areas of image and signal processing, and loves drinking 7-Up while thinking about the cool products he has helped create at the 6 companies he's worked for since graduating a long time ago from the University of Utah. He occasionally runs a 5K with one of his 4 children, recently finishing 3rd in his age group. He loves to read and found time to finish 2 books recently. He loves to travel with his wife, the 1 and only love of his life.

