This post is the fourth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site. In this post I will present a bit of information about our microfilm scanning process.
A high-level depiction of the IPP is shown in the following diagram. Scanning, shown in the dark blue box, is the first step in the pipeline and is the process by which we convert media (microfilm, microfiche, paper) into digital images.
The following photo panel shows a Mekel Mach V microfilm scanner on the left and on the right a strip of the microfilm as it streams past the camera’s CCD sensor. Although we more typically process 35 mm film, in this photo we are scanning 16 mm film.
The following composite photo shows a roll of 35 mm microfilm in the left panel and, in the right panel, a close-up of the first four frames on the film.
In the following photo I have zoomed in to the third frame on the film, with an inset panel showing the film next to a U.S. quarter, just for context.
The following screenshot shows this microfilm frame as it appears on our web site. This image is part of the 1900 U.S. Federal Census and can be seen here with an Ancestry.com subscription.
We use Mekel scanners to digitize rolls of microfilm, which can contain anywhere from 300 to 25,000 frames but typically average about 1000 frames. A 1000-foot roll of film is scanned in about twelve minutes. We might choose to go slower if the operator needs more time to review the images, and we might be forced to go slower if our internal network is congested, since we scan directly to network-attached storage devices. The Mekel scans produce images with a resolution of between 300 and 600 dpi, depending on the requirements of the particular project. This level of image resolution is possible because the scanner contains an 8,192-pixel CCD array that can scan between 80 and 160 megapixels per second. The internal pixel representation is 12-bit grayscale, which allows a great deal of flexibility in adjusting the dynamic range for the conditions on the film.
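To get a feel for how these numbers relate, here is a rough back-of-the-envelope sketch in Python. It is not taken from the scanner software. It assumes that the dpi figure applies along the length of the film (one scan line per dot) and that every scan line uses all 8,192 CCD pixels. The function names are made up for illustration.

```python
# Back-of-the-envelope scan-time estimate.
# Assumptions (not from the scanner documentation): dpi is measured along
# the length of the film, and every scan line uses the full CCD width.

CCD_PIXELS_PER_LINE = 8192


def scan_lines_per_second(pixels_per_second, pixels_per_line=CCD_PIXELS_PER_LINE):
    """Number of full-width scan lines the CCD can deliver each second."""
    return pixels_per_second / pixels_per_line


def scan_minutes(film_length_feet, dpi, pixels_per_second):
    """Estimated minutes needed to scan a roll of the given length."""
    total_lines = film_length_feet * 12 * dpi  # 12 inches per foot
    seconds = total_lines / scan_lines_per_second(pixels_per_second)
    return seconds / 60


if __name__ == "__main__":
    for dpi in (300, 600):
        for rate in (80e6, 160e6):
            minutes = scan_minutes(1000, dpi, rate)
            print(f"{dpi} dpi at {rate / 1e6:.0f} MP/s: {minutes:.1f} minutes")
```

Under these assumptions, a 1000-foot roll at 600 dpi and 80 megapixels per second works out to a little over twelve minutes, which is in the same range as the figure quoted above. Lower resolutions or higher pixel rates shorten the estimate proportionally.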
The most interesting point here is that this process creates fixed-size image strips. In the past, the scanners we used would segment the film into frames as they scanned, and you were pretty much stuck with the segmentation the scanner gave you. With strip scanning, the scanner produces fixed-size strips and defers the segmentation to a subsequent framing step that is much more accurate in the way it identifies frames. More importantly, by deferring the segmentation we can involve a human reviewer who can be much more deliberate, and thus more accurate, in determining how the content on the film should be framed.
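To make the idea of a separate framing step a little more concrete, here is a deliberately simple sketch in Python. It is not the algorithm the IPP uses. It assumes we already have one average brightness value per scan line for the whole roll, and that the gaps between frames are darker than the frames themselves (on some film the polarity is reversed). The function name, the threshold, and the minimum frame height are all hypothetical.

```python
# Toy frame finder: NOT the IPP's framing algorithm, just an illustration.
# Assumption: rows brighter than gap_threshold belong to a frame, and
# darker rows belong to the gap between frames.

def find_frames(row_means, gap_threshold, min_frame_rows=1):
    """Return (start, end) row ranges, end exclusive, of candidate frames."""
    frames = []
    start = None  # first row of the frame currently being tracked
    for i, value in enumerate(row_means):
        in_frame = value > gap_threshold
        if in_frame and start is None:
            start = i  # a new frame begins here
        elif not in_frame and start is not None:
            if i - start >= min_frame_rows:  # ignore specks and scratches
                frames.append((start, i))
            start = None
    # Close a frame that runs to the very end of the roll.
    if start is not None and len(row_means) - start >= min_frame_rows:
        frames.append((start, len(row_means)))
    return frames


if __name__ == "__main__":
    profile = [0, 0, 9, 9, 9, 0, 9, 0, 0, 9, 9]
    print(find_frames(profile, gap_threshold=5, min_frame_rows=2))
```

Because the strips are kept intact, candidate boundaries like these are only a starting point. A reviewer can move, merge, or split them before any frame is actually cut.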
The relationship between strips and frames is shown in the following diagram. On the left of the diagram are the strips produced by the Mekel scanner. On the right of the diagram are the frames created from these strips.
In this example, a roll of microfilm was scanned into 1367 strips, each 4096 pixels high. After an operator reviewed and fine-tuned the scanner-supplied segmentation, 1837 image frames were extracted by stitching together the appropriate strips.
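The bookkeeping behind stitching can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not our server-farm code. It assumes each frame is described by a start and end row counted from the top of the roll (end exclusive) and that each strip is simply a list of rows. The names strip_slices and stitch_frame are hypothetical.

```python
# Illustrative strip-to-frame stitching, not the production implementation.
# Assumption: frame boundaries are row offsets from the start of the roll.

STRIP_HEIGHT = 4096  # rows per strip, as in the example above


def strip_slices(frame_start, frame_end, strip_height=STRIP_HEIGHT):
    """Return (strip_index, top, bottom) pieces that cover one frame."""
    if not 0 <= frame_start < frame_end:
        raise ValueError("frame must satisfy 0 <= start < end")
    slices = []
    row = frame_start
    while row < frame_end:
        strip_index = row // strip_height
        top = row % strip_height
        # Take rows up to the end of this strip or the end of the frame.
        bottom = min(strip_height, top + (frame_end - row))
        slices.append((strip_index, top, bottom))
        row += bottom - top
    return slices


def stitch_frame(strips, frame_start, frame_end, strip_height=STRIP_HEIGHT):
    """Concatenate the rows of one frame from the strips it spans."""
    rows = []
    for index, top, bottom in strip_slices(frame_start, frame_end, strip_height):
        rows.extend(strips[index][top:bottom])
    return rows


if __name__ == "__main__":
    # A frame that begins near the bottom of strip 0 and ends in strip 2.
    print(strip_slices(4000, 9000))
    # Total rows available in a 1367-strip roll.
    print(1367 * STRIP_HEIGHT)
```

A frame that straddles strip boundaries is assembled from the tail of one strip, any whole strips in between, and the head of the last strip. This is why the number of frames (1837) does not have to match the number of strips (1367).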
You have probably never even once wished you knew more about microfilm scanning technology. Creating 35 mm rolls of microfilm is a nearly 80-year-old technology, and microfilm scanners have been around for decades. But if you care (deeply) about producing high-quality images, getting this part of the process right is absolutely critical. Strip scanning is a fairly recent development, and the work we have done over the last few years to stitch strips into frames on our server farm has been something of a minor breakthrough, enabling the IPP to produce images at both higher volume and higher quality.
About Michael Murdock
Michael Murdock is a senior software development manager at Ancestry.com where he has worked for the last 9 years. He holds 8 patents in the areas of image and signal processing, and loves drinking 7-Up while thinking about the cool products he has helped create at the 6 companies he's worked for since graduating a long time ago from the University of Utah. He occasionally runs a 5K with one of his 4 children, recently finishing 3rd in his age group. He loves to read and found time to finish 2 books recently. He loves to travel with his wife, the 1 and only love of his life.