Images of original historical records play a key role in the way Ancestry.com presents family history information to the user. An image of a historical record is much more than evidentiary support for some family history assertion. An image can become the anchor for an engaging and compelling historical narrative. A properly captured and rendered image can be beautiful and even exciting. Something I love about Ancestry.com is getting to work with people who are passionate about images and share the drive to create world-class image processing technology.
This blog post is the first in a series of articles about the image processing technologies we have developed here at Ancestry.com. As a software development manager over the Imaging Development team, I manage a group of software engineers who create the technology and software applications that support our Image Processing Pipeline (IPP).
Before I start, let me explain a little about image processing at Ancestry.com. Our Content Acquisition team works on an ongoing basis with partners around the world (from the National Archives and Records Administration to a small church in Italy) to procure new record collections, or what we call content, to be added to the website. Once we get our hands on these records, which could be on paper, on microfilm, in digital form or in some other format, we put them through our Content Pipeline so that they are ready to be online and searchable no matter what form they arrived in. This is where the Image Processing team comes in.
The following diagram shows a very high-level view of the Content Pipeline:
As you can see, Image Processing is a major part of the Content Pipeline. This year we will use the IPP to scan and process around 200 million images, which requires some pretty cool technology that I hope to describe in future blog posts.
The following diagram shows a very high-level view of the Image Processing Pipeline:
Just to give you a taste of the images, or content, that the Imaging Development team deals with, I’d like to quickly show you some examples.
The Good – Examples of Beautiful Images
Properly capturing, preserving and rendering images of historical records can be an extraordinarily difficult challenge. Beautiful historical images, like those shown below, don’t just happen naturally. In fact, with everything that can go wrong, it’s amazing they happen at all.
I have included a few of my favorite images here, and I encourage you to share links to your own favorites with us.
In the next section I jump to the other extreme and show some bad and ugly images.
The Bad and the Ugly – Examples of Really Bad Images
Historical documents should be captured in focus, properly oriented and in adequate lighting, and digitized into an image at an appropriate resolution. However, given the sheer number of things that can go wrong as a paper-based historical record is converted to microfilm and then to a digital image, it’s not surprising that we encounter a large number of images that are barely legible, much less beautiful and engaging. To illustrate, I pulled together some snippets of images from a variety of collections with the kinds of problems we frequently encounter.
As illustrated in this collage, there are many, many ways for images to have problems that make them difficult to read or even to use for family history research. They could be partially destroyed by fire, copied so lightly that the text is barely readable, scanned off-center, photographed at an angle that skews the pages, or damaged in many other unfortunate ways. Fortunately for everyone involved, these problem images are the exception rather than the rule. Unfortunately, the problems are not so rare that they can be ignored or treated individually as they are discovered.
In the next several blog posts in this series I will attempt to describe how we deal with problem images and how we use a variety of technologies to try to preserve and create beautiful images that can serve as the centerpiece for your historical narratives.
Additionally, in future blog posts in this series we will deal with the various components shown in the Image Processing Pipeline diagram, such as:
- The Paper-to-Image process and what happens along the way to degrade image quality
- The automated and manual components in the Image Processing Pipeline
- The scale of our Image Processing Pipeline
- The high-level principles and guidelines that drive our technology development
- The image processing operations in our pipeline (leveling, sharpening, binarization, contrast enhancement, etc.)
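To give a flavor of one of the operations named in the last bullet, here is a minimal sketch of binarization using Otsu's method, a standard textbook technique for picking a single global threshold. It is only an illustration under stated assumptions: this post does not describe how the IPP actually implements binarization, and the function names `otsu_threshold` and `binarize`, along with the tiny sample image, are hypothetical.

```python
from typing import List, Optional


def otsu_threshold(pixels: List[int]) -> int:
    """Return the gray level t (0-255) that maximizes between-class variance.

    Pixels with value <= t are treated as "dark" (ink), the rest as "light" (paper).
    """
    # Build a 256-bin histogram of gray levels.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class so far
    sum0 = 0  # sum of gray levels in the dark class so far
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break     # every pixel is already in the dark class
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        var_between = w0 * w1 * (mean0 - mean1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(image: List[List[int]], threshold: Optional[int] = None) -> List[List[int]]:
    """Map an 8-bit grayscale image (rows of ints) to pure black (0) and white (255)."""
    flat = [p for row in image for p in row]
    t = otsu_threshold(flat) if threshold is None else threshold
    return [[0 if p <= t else 255 for p in row] for row in image]


# Hypothetical 2x3 "scan": dark ink values next to light paper values.
sample = [[30, 40, 200],
          [35, 210, 220]]
print(binarize(sample))
```

A single global threshold like this works on evenly lit pages. Faded or unevenly exposed scans, like the problem images described earlier, generally call for something more adaptive.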
About Michael Murdock
Michael Murdock is a senior software development manager at Ancestry.com where he has worked for the last 9 years. He holds 8 patents in the areas of image and signal processing, and loves drinking 7-Up while thinking about the cool products he has helped create at the 6 companies he's worked for since graduating a long time ago from the University of Utah. He occasionally runs a 5K with one of his 4 children, recently finishing 3rd in his age group. He loves to read and found time to finish 2 books recently. He loves to travel with his wife, the 1 and only love of his life.