Tech Roots » Image Processing and Analysis
Ancestry.com Tech Roots Blog – http://blogs.ancestry.com/techroots

Competition as Collaboration – Ancestry.com Handwriting Recognition Competition
By Michael Murdock | March 14, 2014

We are excited to announce that the Ancestry.com handwriting recognition competition proposal was accepted as one of seven official International Conference on Frontiers in Handwriting Recognition (ICFHR-2014) competitions. As part of our competition on word recognition from segmented historical documents, we are announcing the availability of a new image database [1], ANWRESH-1, which contains segmented and labeled documents for use by researchers in the document analysis community.

We invite you to visit our competition website to learn more about what the competition entails, prizes offered, and to register if you are interested. A few key dates to note:

  • Competition Registration Deadline: March 24, 2014
  • Submission Deadline: April 1, 2014
  • Benchmark Database Availability: April 2, 2014
  • Results Announced: September 4, 2014

Read on to learn about the ICFHR conference, the Ancestry.com competition and database, and why we are so excited to be sponsoring this competition.

Since 1990 the document analysis research community has been meeting every two years for a series of conferences called ICFHR, the International Conference on Frontiers in Handwriting Recognition.

[Image: ICFHR 2014 logo]

Quoting from the ICFHR home page:

ICFHR is the premier international forum for researchers and practitioners in the document analysis community for identifying, encouraging and exchanging ideas on the state-of-the-art technology in document analysis, understanding, retrieval, and performance evaluation. The term document in the context of ICFHR encompasses a broad range of documents from historical forms such as palm leaves and papyrus to traditional documents and modern multimedia documents. … The ICFHR provides a forum for researchers in the areas of on-line and off-line handwriting recognition, pen-based interface systems, form processing, handwritten-based digital libraries, and web document access and retrieval.

The format of the conference is fairly typical with a variety of pre-conference tutorials and the conference proper consisting of multiple parallel tracks of oral and poster presentations. A fairly modern innovation for these kinds of conferences is the inclusion of sponsored competitions that take place in the months leading up to the conference with the results announced and discussed (and in some cases, debated) in sessions on the last day of the conference.

The ANWRESH-1 Database

An important part of our competition is the new database, ANWRESH-1, that we are making available to the document analysis research community. We expect many in the research community will find it interesting and helpful in their work. It consists of about 800,000 “image snippets” of handwritten text drawn from about 4,000 images from the 1920 and 1930 U.S. Censuses. Specifically, we have located (segmented) on each image the Name, Relation, Age, Marital Condition, and Place of Birth fields and labeled them with their ground truth values. An example image is shown below in Figure 1. Note that I have shown in this figure one row (called a record), with each of the fields we are using in this competition labeled with its field type and highlighted in yellow.


Figure 1. Example document with one row emphasized and the fields of interest highlighted in yellow.

 

The challenge in this competition is to use the ANWRESH-1 database to create field-specific recognizers that take segmented image snippets of handwritten text and automatically transcribe them (or assist a human with the transcription) into the corresponding textual representation for each field.

One possible approach for the Birth Place field that takes advantage of the repetition of values common in this kind of collection might be to develop a mathematical model that clusters the ink strokes in a snippet using some distance metric such that similar words (under this metric) belong to the same cluster. The following snippets would be “close together” under this metric and thus, would be in the same (green) cluster.

[Image: snippets grouped into the green cluster]

This clustering algorithm wouldn’t have the slightest idea what characters are formed from the ink strokes, but it would know that the following snippets are different from the snippets in the green cluster (and thus belong together in the blue cluster):

[Image: snippets grouped into the blue cluster]

This approach is very powerful when you encounter a document containing birthplace entries like the following:

[Image: repetition of Birth Place values in a single document]

 

Once a human keyer identifies the very first occurrence as the text “alabama” [2], the clustering algorithm can automatically label the rest of the alabama fields as similar or the same, and those labels can then be quickly and easily reviewed by the human keyer. In some cases the repetition of field values could allow this kind of algorithm to reduce the number of fields that must be keyed by one or two orders of magnitude.
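To make the clustering idea concrete, here is a minimal sketch of greedy snippet clustering followed by label propagation. It is not the ANWRESH baseline or any Ancestry.com algorithm; it assumes each snippet has already been reduced to a fixed-length feature vector, and the Euclidean distance, the 0.35 threshold, and the toy feature data are all placeholders.

```python
import numpy as np

def cluster_snippets(features, threshold=0.35):
    """Greedy clustering: assign each snippet to the nearest existing cluster
    if its centroid is within `threshold` (Euclidean), else start a new one.
    `features` is an (n_snippets, n_dims) array of snippet descriptors."""
    centroids, members = [], []                # running centroids and member indices
    for i, f in enumerate(features):
        if centroids:
            dists = np.linalg.norm(np.array(centroids) - f, axis=1)
            j = int(np.argmin(dists))
            if dists[j] < threshold:
                members[j].append(i)
                centroids[j] = np.mean(features[members[j]], axis=0)
                continue
        centroids.append(f.copy())
        members.append([i])
    return members

def propagate_label(members, keyed_index, keyed_text):
    """Once a human keyer transcribes one snippet, suggest that text for every
    other snippet in the same cluster (to be reviewed, not blindly trusted)."""
    suggestions = {}
    for cluster in members:
        if keyed_index in cluster:
            for i in cluster:
                suggestions[i] = keyed_text
    return suggestions

# Toy usage: six snippets in a 4-D feature space forming two natural groups.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 0.05, size=(3, 4))   # e.g. "alabama" snippets
group_b = rng.normal(1.0, 0.05, size=(3, 4))   # e.g. "georgia" snippets
feats = np.vstack([group_a, group_b])
clusters = cluster_snippets(feats)
print(propagate_label(clusters, keyed_index=0, keyed_text="alabama"))
```

In a real system the feature extractor and distance metric would do most of the heavy lifting; the propagation step only suggests labels for the keyer to confirm.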

 

Is Competition a Good Thing?

One might ask what we hope to gain by sponsoring this competition. Developing handwriting recognition technology, and helping the document analysis community advance it, is a strategic initiative for Ancestry.com. As we have discussed in previous blog posts, the process of converting images of handwritten names, dates, relationships, and places in historical documents into a textual representation suitable for searching is still almost entirely manual. This transcription process is expensive and time-consuming, and is thus a limiting factor in large-scale efforts to extract the data contained in the vast libraries of archived historical documents. Considering the billions of valuable historical documents currently residing on microfilm, microfiche, and paper, it’s clear that advancing handwriting recognition systems to the point where they can automate (or even partially automate) the transcription process could be hugely beneficial.

In sponsoring ANWRESH-2014, we are reaching out to researchers developing technologies in word recognition, word spotting, word clustering, machine learning, and other related fields to encourage their participation and collaboration. Initially, we want our efforts in this area to generate interest and awareness, to foster connections, and to enable collaboration. We hope this competition and the ANWRESH-1 database will enable fresh, unconventional approaches to this difficult, multi-faceted problem. At the conclusion of the competition, we hope at a minimum to have a much better understanding of the current state of the art in handwritten word recognition on historical documents. As we proceed beyond this competition, we anticipate that a spectrum of innovative techniques will emerge. As a growing and diverse community uses increasingly larger, cleaner, and, most importantly, shared databases of historical documents to characterize these techniques, we will see real, albeit incremental, progress. That progress will enable us to unlock and make available to family historians valuable document collections that are simply out of reach with today’s technology.

 

Notes:

[1] The name of our competition and database, ANWRESH, stands for ANcestry.com Word REcognition from Segmented Historical Documents.

[2] The lower-case “a” in “alabama” is because of our “key-as-seen” policy: if the text looks like a lower-case letter, that’s the way it is keyed.

 

 

Document Analysis and Recognition – What is Document Analysis?
By Michael Murdock | October 5, 2013

I recently attended the three-day biennial International Conference on Document Analysis and Recognition (ICDAR-2013) in Washington, DC. ICDAR is sponsored by the International Association for Pattern Recognition and is the premier event for those working in the field of Document Analysis (DA). Primarily the attendees are from corporate and university labs; they are professors, graduate students and technologists involved in businesses relating to DA technologies. Since DA is such an important set of technologies for our work in the Image Processing Pipeline (IPP), I have decided to interrupt my regularly scheduled series, Image Processing at Ancestry.com, and present a brief introduction to a few key concepts in DA. In my next post in this mini-series I will share with you some of what I learned at ICDAR-2013 and conclude with some of the factors I think have caused, or at least contributed to, the dramatic improvements we’ve seen in this field in the last few years. 

[Image: ICDAR-2013 conference banner]

What is Document Analysis (DA)?

Perhaps the best way to think about DA is to contrast image processing with image analysis. In the diagram in Figure 1, both the blue “Image Processing” block and the green “Image Analysis” block take an image as their input. Image processing applies operations to the input image to produce an output image that has been cropped, deskewed, scaled, or enhanced in some way. The point of image analysis, by way of contrast, is not to produce an output image, but to derive or extract information from the input image. It’s about trying to get at the semantics or extract the content from the image.


Figure 1. Image Processing vs. Image Analysis

DA is a family of technologies shown in Figure 2 as green components and arranged in layers, the foundation of which is a set of blue image processing components, reflective of the fact that good image processing always precedes any kind of image analysis task.


Figure 2. Document analysis technologies shown as green components on a foundation of blue image processing components.

The technology components (mathematical operations and learning algorithms) involved in doing any kind of DA task might appear to be simple and straightforward. In fact, the individual components can actually be described quite succinctly. However, when you go a bit beneath the surface, things quickly get very murky. Implementing a system with some of the common DA technologies can turn out to be incredibly complex. Characterizing the component interactions and controlling their behavior under real-world conditions is anything but straightforward. This is why DA is considered to be a difficult and unsolved problem.

In spite of these challenges, and the complexities that make an attempt at a “brief overview of the technology” somewhat doomed from the start, I will focus this post on four principles that have helped guide our work in developing DA technology and applications at Ancestry.com. I think these principles get to the core of how to think about DA. Also, I believe these four principles are general enough that sharing them here might be a good way to introduce DA without presenting a bunch of equations or pseudo-code.

Principle 1: The text assumption. DA is a sub-field of image analysis in which the techniques we use assume that the image contains text or text-like structure in table form. This text assumption is a pretty strong constraint, but to the extent that it’s valid for your domain, it dramatically simplifies all aspects of the analysis. We are expecting words, not a photograph of a flower, a person’s face, or a topographical map. Documents have regularities that can be learned; natural images do not. And since we are dealing with text, there are myriad tools available for processing it. For instance, in 2006, Google released a one-trillion-word corpus with frequency counts for all sequences up to five words. Documents allow for language models, which can guide how your system processes and interprets the various low-level probabilities from stroke and character classifiers.
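As a toy illustration of this principle, the sketch below rescores two hypothetical character-classifier readings of a birthplace field with a unigram word-frequency prior. The word counts, the smoothing, and the interpolation weight are invented for the example; a production system would use much richer n-gram or lexicon models.

```python
import math

# Hypothetical word frequencies, standing in for counts from a large corpus
# such as the Google n-gram release mentioned above.
WORD_COUNTS = {"alabama": 9_200_000, "alatama": 120, "arkansas": 7_800_000}
TOTAL = sum(WORD_COUNTS.values())

def rescore(candidates, lm_weight=0.5):
    """Combine a character-classifier score (log domain) with a unigram
    language-model log-probability. `candidates` is a list of
    (word, classifier_logprob) pairs; returns them re-ranked."""
    def lm_logprob(word):
        # add-one smoothing so unseen words are penalized, not excluded
        return math.log((WORD_COUNTS.get(word, 0) + 1) / (TOTAL + 1))
    scored = [(w, (1 - lm_weight) * clf + lm_weight * lm_logprob(w))
              for w, clf in candidates]
    return sorted(scored, key=lambda t: t[1], reverse=True)

# The classifier slightly prefers the misreading "alatama"; the LM corrects it.
print(rescore([("alatama", -2.0), ("alabama", -2.3)]))
```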

Principle 2: Consider content and structure. Documents are typically structured according to some hierarchy – a page is composed of regions, each of which has zones and/or fields. The content that we wish to identify and extract is embedded in this structure. Consider, for example, the death record in Figure 3 below. It is extremely difficult to create an algorithm that can make much sense of this document, much less accurately extract the fields of interest, without first building a model of, and then traversing, the document’s structure.


Figure 3. Death certificate as an example of a document in which you must understand the structure in order to extract the content.

The content-and-structure principle guides how we view and evaluate component- or system-level functionality: these two aspects must not be blurred or, worse, treated as if they are similar kinds of things. The content in a document is embedded in structure, but content is not like structure. Any real DA-based system must have components for dealing with both the content and the structure.

Principle 3: Content is bottom-up hierarchical. Almost without exception, the content of interest in a document is composed of text that is built up from a hierarchy, and that content is extracted from the bottom up. The bottom of the hierarchy is a component, which is a small piece of ink from a pen stroke, part of a machine-printed character, or part of a machine-printed line. The diagram in Figure 4 illustrates the bottom-up hierarchy for some example content from a document. In words, this diagram says the following: components combine into primitives, which are organized into words, which are organized into phrases. The content hierarchy is bottom-up because that approach has proven to be the most reliable way to find and extract content. Start with a search for components that form high-likelihood primitives (characters), which are then used to form candidate words and phrases, which are scored against runner-up primitive combinations.
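The sketch below is one way to model this bottom-up hierarchy in code: components feed character primitives, primitives feed word hypotheses, and competing hypotheses are scored against one another. The class names and the likelihood numbers are illustrative only, not an actual recognizer.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Component:            # a piece of ink (a connected group of pixels)
    pixels: int

@dataclass
class Primitive:            # a character hypothesis built from components
    char: str
    likelihood: float
    components: List[Component] = field(default_factory=list)

@dataclass
class WordHypothesis:       # a word built from an ordered list of primitives
    primitives: List[Primitive]

    @property
    def text(self) -> str:
        return "".join(p.char for p in self.primitives)

    @property
    def score(self) -> float:
        # simple product of character likelihoods; a real system would also
        # add segmentation and language-model terms
        s = 1.0
        for p in self.primitives:
            s *= p.likelihood
        return s

def best_word(hypotheses: List[WordHypothesis]) -> WordHypothesis:
    """Pick the word hypothesis with the highest aggregate score, i.e. score
    each candidate against the runner-up primitive combinations."""
    return max(hypotheses, key=lambda h: h.score)

# Two competing readings of the same ink: "25" vs "75".
h1 = WordHypothesis([Primitive("2", 0.8), Primitive("5", 0.9)])
h2 = WordHypothesis([Primitive("7", 0.4), Primitive("5", 0.9)])
print(best_word([h1, h2]).text)   # -> "25"
```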


Figure 4. The content elements form a bottom-up hierarchy.

A zoomed-in snippet shown in Figure 5 contains strokes and lines, which unfortunately don’t play well together. This figure illustrates how stroke components combine to form characters (numerals). But you can also see how machine-printed lines might interfere with an algorithm attempting to isolate a connected component to form a character hypothesis. Whenever possible, lines are subtracted out of a region before an attempt is made to isolate characters for recognition. But in general, stroke detection (determining a character from one or more components) is a hard problem, and for handwritten strokes it is currently not nearly accurate enough to be practical to deploy in a real DA system.


Figure 5. Zoomed-in snippet containing strokes and lines.

Principle 4: Structure is top-down hierarchical. Almost without exception, the structure of a document is hierarchical. This structure is imposed from the top down. Structure helps locate or interpret the content to be extracted. The diagram in Figure 6 illustrates how the structure is a top-down hierarchy. In words, this diagram says the following: a page consists of regions, which each consist of zones, which each consist of fields. This structure is modeled as a top-down process because that is the only approach that has been demonstrated to be effective on a variety of document types. A page is segmented into homogeneous regions (sometimes with the help of separator lines, regular white space, or other similar cues) that each lead to high-likelihood zones and fields.
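A minimal sketch of this top-down hierarchy follows: a page holds regions, regions hold zones, and zones hold fields, with a traversal that yields each field's location. The bounding boxes and labels are hypothetical; they simply show how locating content becomes a walk down the hierarchy.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Bounding boxes are (left, top, right, bottom) in pixels.
Box = Tuple[int, int, int, int]

@dataclass
class Field:
    name: str
    box: Box

@dataclass
class Zone:
    label: str
    box: Box
    fields: List[Field] = field(default_factory=list)

@dataclass
class Region:
    label: str
    box: Box
    zones: List[Zone] = field(default_factory=list)

@dataclass
class Page:
    regions: List[Region] = field(default_factory=list)

    def iter_fields(self):
        """Walk the hierarchy top-down and yield every field with its path,
        which is how content extraction would locate what to recognize."""
        for r in self.regions:
            for z in r.zones:
                for f in z.fields:
                    yield f"{r.label}/{z.label}/{f.name}", f.box

# Hypothetical layout for the "personal" zone of a death certificate.
page = Page([Region("body", (0, 0, 2400, 3200), [
    Zone("personal", (100, 400, 2300, 900), [
        Field("full_name", (140, 430, 1200, 520)),
        Field("date_of_death", (1250, 430, 2250, 520)),
    ]),
])])
for path, box in page.iter_fields():
    print(path, box)
```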


Figure 6. The structure elements form a top-down hierarchy.

Two snippets from the death certificate in Figure 3 are shown below in Figure 7 to illustrate these two hierarchies. The content items (orange) are the information items we wish to extract from the document, and the structure items (green) are used to locate or interpret the content elements.


Figure 7. Two snippets from a death certificate that illustrate the content (orange) and the structure (green).

As a final observation, I include an example of a common error in structure detection. The zoomed-in snippet in Figure 8 illustrates one of the challenges in detecting regions. Notice that the text in the red box is tightly clustered in a region bounded on the left by what could be separator lines, but the region shown in the red box is completely wrong. The correct region is shown with the green box.


Figure 8. An example of how an incorrect segmentation (red box) might group all of the names into a single region. The correct segmentation (green box) properly groups the fields in a row.

The source of this region detection problem was the failure to detect that this snippet is really a table. To a human observer it’s clear that the machine-printed text at the top of the snippet is a set of column headers. When the snippet is viewed as a table, with records represented by rows and the fields of each record represented by columns, the task of automatically extracting the fact that Ella Flynn is single and arrived in the U.S. on Feb. 1, 1872 becomes fairly straightforward. Of course this assumes that a character recognizer can correctly extract the handwritten name Ella Flynn and the handwritten date, Feb. 1, 1872. Unfortunately, this is beyond the capabilities of current, state-of-the-art stroke recognition systems.

In my next post I will present what I learned from ICDAR-2013 with an eye towards assessing the current state of DA technologies.

Image Processing at Ancestry.com – Part 6: Auto-Sharpening
By Michael Murdock | July 17, 2013

This post is the sixth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site.  The core functionality of the IPP is illustrated in the following diagram.


Figure 1. Sequence of image processing operations performed in the IPP

In this post I continue with the material from my previous post (part five) in which I described some of the core image processing operations in the IPP, shown in Figure 1 in the box with the red outline. A source image, shown at the top of the diagram, is processed by the Image Processor, which creates a “recipe” file of the operations to be applied to the source image, such as auto-normalization and auto-sharpening. This step is followed by a manual step, Image Quality Editor, in which an operator manually inspects, and if necessary, corrects the image for things like brightness and contrast. This step is followed by the Image Converter, which applies the “recipe” instructions from the two previous steps to the source image and then compresses the image to the desired encoding and file container (such as JPG or J2K).
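To illustrate the recipe idea, here is a sketch of what such a file and its application step might look like. The JSON layout, the operation names, and the file paths are assumptions for illustration only; the actual IPP recipe format is internal and not described in this post.

```python
import json

# A hypothetical "recipe" for one image: an ordered list of operations with
# parameters, produced by the Image Processor and later applied by the
# Image Converter.
recipe_json = """
{
  "source": "roll_0421/strip_0137.tif",
  "operations": [
    {"op": "auto_normalize", "params": {}},
    {"op": "auto_sharpen",   "params": {"level": 2}},
    {"op": "encode",         "params": {"format": "jpg", "quality": 85}}
  ]
}
"""

def apply_recipe(image, recipe, registry):
    """Apply each recipe operation in order. `registry` maps operation names
    to functions taking (image, **params) and returning a new image."""
    for step in recipe["operations"]:
        image = registry[step["op"]](image, **step["params"])
    return image

# Stub operations so the sketch runs end to end on a placeholder "image".
registry = {
    "auto_normalize": lambda img, **p: img + "|normalized",
    "auto_sharpen":   lambda img, level: img + f"|sharpened(L{level})",
    "encode":         lambda img, format, quality: img + f"|{format}@{quality}",
}
recipe = json.loads(recipe_json)
print(apply_recipe("raw-image", recipe, registry))
```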

The objective of the IPP, you might recall, is to enhance the images in a way that generally improves legibility without inadvertently introducing damaging artifacts. In part five, I focused on image contrast, describing histograms as a way to (qualitatively) measure contrast and a technique called auto-normalization as a way to enhance it. This blog post presents another enhancement technique, auto-sharpening, which improves the image not by changing its contrast but by removing some of the blur in the content.

Sharpening, in general, refers to a process that attempts to invert the (usually slight) blurring effects introduced into the image by the camera sensor and lens. Our goal in sharpening an image is to reveal some of the fine details in the text that might not be clearly or easily discernible in the original image. The “auto” part of the name emphasizes that the technique is applied automatically, based on an algorithmic analysis of the image, rather than through manual inspection and correction by a human operator.

Auto-sharpening, in the most general sense, works by amplifying the high-frequency components of the image. The Wikipedia article on the unsharp mask describes the basis of the algorithm we have developed and fine-tuned for the kinds of historical records we process. The edges of text are high-frequency components, and by amplifying these edges we can make the text more pronounced, or sharp. However, for everything else in the image that is not text, this technique can introduce unwanted and conspicuous effects that make the image appear noisy, which is why our algorithm exposes a parameter that controls how aggressively it sharpens the image. Level-1 sharpening is the least aggressive and its effect is barely noticeable, while Level-4 sharpening is the most aggressive and for most images introduces too many artifacts. Although we do occasionally use Level-4 sharpening on some very faded and blurred images, Level-2 is our default level of sharpening.
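The following sketch shows the basic unsharp-mask computation with a strength parameter standing in for the sharpening levels. The blur radius and the level-to-amount mapping are illustrative guesses, not the tuned parameters used in the IPP.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def unsharp_mask(image, radius=2.0, amount=0.6):
    """Classic unsharp masking: subtract a blurred copy to isolate the
    high-frequency detail, then add a scaled amount of it back.
    `image` is a float32 grayscale array with values in [0, 255]."""
    blurred = gaussian_filter(image, sigma=radius)
    detail = image - blurred                 # high-frequency components
    sharpened = image + amount * detail      # amplify the edges
    return np.clip(sharpened, 0, 255)

# Mapping the post's four "levels" onto increasing amounts is an assumption.
SHARPEN_LEVELS = {1: 0.3, 2: 0.6, 3: 1.0, 4: 1.5}

def auto_sharpen(image, level=2):
    return unsharp_mask(image, amount=SHARPEN_LEVELS[level])

# Toy example: a soft vertical ramp becomes a steeper edge after sharpening.
img = np.tile(np.linspace(60, 200, 16, dtype=np.float32), (16, 1))
print(auto_sharpen(img, level=2)[0, 6:10].round(1))
```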

The following diagram shows a snippet of an image that has been processed with these four different sharpening levels.


Figure 2. A snippet of an image that is auto-sharpened at the four levels

Auto-sharpening, as mentioned above, can come with some negative side-effects. It works by exaggerating the brightness difference along edges, which creates the appearance of making these edges more pronounced or sharp. However, applying too much sharpening to an image can damage it by introducing an artifact called a “sharpening halo”. This can be seen in the following two figures in which the dark pixels from the text appear to have a glowing halo.

Figure 3. Side-by-side comparison of a snippet of an image that has been overly-sharpened.

 

Figure 4. A part of a character from a level-4 sharpened image that illustrates the sharpening halo effect.

It’s clear from Figures 3 and 4 that Level-4 sharpening is too much for this image, since you can see the “halo” around the ink strokes. Our default sharpening setting is Level-2, which usually produces excellent results. Level-2 sharpened images just appear a bit more crisp, which almost always means the text is more legible.

In the last figure in my part five blog post I showed a zoomed-in snippet of an image before and after it was auto-normalized. The following figure shows this snippet after it has had Level-2 sharpening applied following the auto-normalization operation.

Figure 5. Image snippet comparison showing the benefits of auto-normalization followed by Level-2 sharpening.

This comparison demonstrates the benefits of first auto-normalizing the image to enhance its contrast and then following that operation with a cautious (Level-2) sharpening. Although the noise is slightly amplified in this image, it seems a reasonable trade-off for the improved legibility.

In my next blog post I will continue with the core functionality of the Image Processing Pipeline by presenting our approach to noise removal and image binarization.

 

Throttling Image Processing
By Tyler Jensen | June 21, 2013

Ancestry.com, like any other site with millions of subscribers, experiences predictable load patterns throughout the day. To maximize site performance and customer satisfaction, we make every effort to schedule maintenance during off-peak intervals.

Content processing, especially our repository of hundreds of millions of images, on the other hand, is a constant ongoing effort, and in some cases must be done on live content being served up to our customers. One example of this occurs when we roll an improved set of images for a given collection, such as the 1921 Census of Canada, to the live site. Many of these images may have different dimensions than the originally published images. To be sure we get it right, we double check every image in the collection.

Until now, this work was done with a desktop tool that was effective but could take days to complete its work on very large collections. In order to speed this up, the Enterprise Media Team’s distributed computing initiative created a new service that uses a lightweight, open-source distributed computing framework called DuoVia.MpiVisor, a project led by this author outside of his regular Ancestry.com responsibilities, to distribute the work across five servers with a total of 64 logical processors.

Distributing the work across 64 logical processors was enormously successful, verifying the dimensions of up to 50,000 images every minute. The challenge was that if we allowed content management to access this very powerful tool at any time of day, there was a distinct possibility that it would affect the performance of our live site, something we very much wanted to avoid.

To throttle the new image dimension populating (IDP) service, we defined three time windows corresponding to high, medium, and low traffic periods during the day. During high traffic periods, only one third of the processing agents are given work; during medium traffic periods, only one half of the available processing agents are used; and during off-peak periods, all available agents are utilized. A sketch of this scheduling logic is shown below.
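Here is a minimal sketch of that scheduling logic. The cut-over hours are hypothetical (the post does not give them); only the one-third / one-half / all-agents fractions come from the description above.

```python
from datetime import datetime

# Hypothetical traffic windows (local server time) and the fraction of
# processing agents allowed in each.
TRAFFIC_WINDOWS = [
    ((8, 18),  "high",   1 / 3),   # business hours: one third of agents
    ((18, 23), "medium", 1 / 2),   # evening: one half
    ((23, 8),  "low",    1.0),     # overnight: all agents
]

def allowed_agents(total_agents: int, now: datetime) -> int:
    """Return how many agents may be given work at `now`."""
    hour = now.hour
    for (start, end), _label, fraction in TRAFFIC_WINDOWS:
        # windows that wrap past midnight need the "or" form of the test
        in_window = start <= hour < end if start < end else (hour >= start or hour < end)
        if in_window:
            return max(1, int(total_agents * fraction))
    return total_agents

print(allowed_agents(64, datetime(2013, 6, 21, 14)))  # high traffic -> 21
print(allowed_agents(64, datetime(2013, 6, 21, 2)))   # off-peak     -> 64
```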

In the weeks since the IDP service launched, it has processed over 130 million images in just over 6,700 run-time minutes. That is a throttled average of about 19,000 images processed per minute of processing time, far below its current max potential of 50,000 per minute.

By throttling the work, the IDP service remains responsive during peak traffic times without impacting the customer experience, allowing content teams to continue working to deliver the best images as soon as humanly possible to our customers.

Image Processing at Ancestry.com – Part 5: Auto-Normalization
By Michael Murdock | May 28, 2013

This post is the fifth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site.  In this post we finally get to the good part – the part of the pipeline in which we process the images.

The purpose of the IPP is to correct and enhance the images in a way that improves their legibility without, in the process, inadvertently damaging other images. In this and the next couple of posts I will present some details on the kinds of processing we do in the Image Processor and how these operations help improve legibility.

The diagram in the following figure shows the context for the Image Processor.

Figure 1. Context diagram for the Image Processor component of the IPP.

The Image Processor consumes a source image from the scanner, performs a number of operations on the image and then saves out the processed image along with a thumbnail and a zoomed-in snippet, which are used in the subsequent Image Quality Editor step to review the quality of the image and, if necessary, make changes to fix any problems the operator finds.

Image Histograms

To illustrate one of the important operations performed by the Image Processor (auto-normalizing), we will use an image histogram, which is a simple graphical tool for analyzing the distribution of pixel values in an image. It gives you a sense for the frequency with which each grayscale value occurs in the image. On the x-axis are the 256 possible grayscale values (zero corresponds to black; 255 corresponds to white). The height of the vertical line at each of these 256 values corresponds to the frequency of that pixel value in the image.
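Computing such a histogram is straightforward; the sketch below counts pixel values for a toy “document” made of mostly light background with a little dark ink, which is roughly the shape you hope to see for a scanned record. The toy data are invented for the example.

```python
import numpy as np

def grayscale_histogram(image):
    """Count how many pixels take each of the 256 grayscale values
    (0 = black, 255 = white). `image` is a uint8 array."""
    return np.bincount(image.ravel(), minlength=256)

# Toy "document" image: mostly near-white background with some dark ink.
rng = np.random.default_rng(1)
background = rng.integers(240, 256, size=9000)   # light pixels
ink = rng.integers(10, 60, size=1000)            # dark pixels
img = np.concatenate([background, ink]).astype(np.uint8)

hist = grayscale_histogram(img)
print("darkest 64 bins:", hist[:64].sum(), "lightest 64 bins:", hist[192:].sum())
```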

We first consider a portrait photograph. In most natural grayscale images you would expect to see most or all 256 levels. The following “Lena” test image uses most of the available range: it uses 241 gray levels and is only missing the extreme white values, which I have indicated with the yellow shading on the histogram.

 

Figure 2. The standard Lena test image with its grayscale histogram showing the distribution of pixel values.

In images of historical records, such as in the following figure, you hope for a bimodal distribution, one peak (on the left) corresponding to the dark textual content and the other peak (on the right) corresponding to the light background.

 

Figure 3. An example image with a strongly bimodal distribution.

Although there are some pixels with values in the midrange, almost all of this image’s pixels are concentrated at the white extreme (255). I have highlighted in yellow the two extreme ends of the histogram. It’s not clear from this histogram, but it’s the black border that allows the histogram to show anything at all at the black extreme. If you crop off this black border, 92% of the pixels have a value of 253, 254, or 255. This image contains a little bit of gray ink on a very white background.

 

Image Contrast

Although you would hope in the previous image to have more black (or dark gray) pixels coming from the printed text, that is an example of an image having pretty good contrast. Contrast is the difference in luminance between, in this case, the printed text and the page background. If the pixels corresponding to the black printed text are near zero, and the pixels corresponding to the white background are near 255, then the image is said to have high contrast, which is a good thing, since it helps with legibility.

Now consider the following more typical image of a historical record:

 

Figure 4. An example image with low contrast, as indicated in the histogram by a very compressed range of pixel values.

Notice that the pixel values are concentrated in the mid-section of the allowed range. Without a bimodal distribution, or at least more spread in the distribution, the printed text and the document background share roughly the same grayscale values. This image is said to have low contrast, which is a bad thing, since it makes the printed text blend into the background, making the text difficult to extract either by a human or with an image analysis tool.

 

Auto-Normalization

An important operation that helps improve the contrast in an image is called Auto-Normalization. The goal of auto-normalization is to improve the contrast of the image by “stretching” the range of intensity values to span a greater range of (luminance) values. Auto-normalization is performed on every image we process. It’s a linear, lossless operation that can take a low-contrast image and make it more legible. On images that have good contrast to begin with, auto-normalization has little to no effect.
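As a simplified stand-in for the production operation, the sketch below performs a linear min-max contrast stretch: the darkest pixel maps to 0, the lightest to 255, and everything in between is scaled linearly. The real algorithm is certainly more careful (for example about borders and outliers), so treat this only as an illustration of the idea.

```python
import numpy as np

def auto_normalize(image):
    """Linear min-max contrast stretch: map the darkest pixel to 0 and the
    lightest to 255 so the luminance range fills the full scale.
    `image` is a uint8 grayscale array."""
    img = image.astype(np.float32)
    lo, hi = img.min(), img.max()
    if hi <= lo:                       # completely flat image: nothing to stretch
        return image.copy()
    stretched = (img - lo) * 255.0 / (hi - lo)
    return stretched.round().astype(np.uint8)

# Low-contrast toy image: values squeezed into the 100..160 midrange.
rng = np.random.default_rng(2)
low_contrast = rng.integers(100, 161, size=(32, 32)).astype(np.uint8)
normalized = auto_normalize(low_contrast)
print(low_contrast.min(), low_contrast.max(), "->", normalized.min(), normalized.max())
```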

The following figure shows the result of auto-normalizing the previous (low-contrast) image.

 

Figure 5. The auto-normalized output of the Image Processor with the histogram showing a greater spread in the luminance distribution.

The following figure shows zoomed-in snippets before and after the image was auto-normalized.

 

Figure 6. Zoomed-in snippets showing the improvement to the contrast due to the auto-normalizing operation in the ImageProcessor.

From the previous two figures it’s quite clear that the auto-normalizing operation is having the intended effect of increasing contrast in the image and thus (somewhat) improving the legibility of the text by using more of the dynamic range available in the image’s luminance values.

In my future posts I will present other operations in the Image Processor that work in concert with our auto-normalization algorithm to help improve the quality of the images we process.

 

 

Image Processing at Ancestry.com – Part 4: Microfilm Scanning
By Michael Murdock | May 13, 2013

This post is the fourth in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site.  In this post I will present a bit of information about our microfilm scanning process.

A high-level depiction of the IPP is shown in the following diagram. Scanning, shown in the dark blue box, is the first step in the pipeline and is the process by which we convert media (microfilm, microfiche, paper) into digital images.

The Image Processing Pipeline – The scanning process is highlighted in the dark blue box.

The following photo panel shows a Mekel Mach V microfilm scanner on the left and on the right a strip of the microfilm as it streams past the camera’s CCD sensor. Although we more typically process 35 mm film, in this photo we are scanning 16 mm film.

Mekel Mach V microfilm scanner

The following photo is a composite showing in the left panel a roll of 35 mm microfilm. The film is shown in the right panel zoomed in to the first four frames on the film.

A roll of microfilm. The right panel shows a zoomed-in portion of the film.

In the following photo I have zoomed in to the third frame on the film, with an inset panel showing the film next to a U.S. quarter, just for context.

Zoomed-in photograph of a single frame on a microfilm roll. Inset shows size relative to a U.S. quarter.

The following screenshot of an image shows this microfilm frame as it appears on our web site. This image is part of the 1900 U.S. Federal Census and can be seen here with an Ancestry.com subscription.

Image from the 1900 U.S. Federal Census corresponding to the microfilm frames shown above.

We use Mekel scanners to digitize rolls of microfilm, which can contain anywhere from 300 to 25,000 frames but more typically average about 1,000 frames. A 1,000-foot roll of film is scanned in about twelve minutes. We might choose to go slower if the operator needs more time to review the images; we might be forced to go slower if our internal network is congested, since we scan directly to network-attached storage devices. The Mekel scans produce images with a resolution of between 300 and 600 dpi, depending on the requirements of the particular project. This level of image resolution is possible because the scanner contains an 8,192-pixel CCD array that can scan between 80 and 160 megapixels per second. The internal pixel representation is a 12-bit grayscale depth, which allows for a tremendous amount of flexibility in adjusting the dynamic range for the conditions on the film.

The most interesting point here is that this process creates fixed-size image strips. In the past, the scanners we used would segment the frames from the film as they scanned. In other words, the scanner created the frames as it scanned and you were pretty much stuck with the segmentation it gave you. With strip scanning, the scanner produces fixed-size strips and thus defers the segmentation to a subsequent framing step that is much more accurate in the way it identifies frames. More importantly, by deferring the segmentation we can involve a human reviewer, who can be much more deliberate and thus more accurate in determining how the content on the film should be framed.

The relationship between strips and frames is shown in the following diagram. On the left of the diagram are the strips produced by the Mekel scanner. On the right of the diagram are the frames created from these strips.

Diagram illustrating the relationship between image strips and image frames.

In this example, a roll of microfilm was scanned into 1,367 strips, each 4,096 pixels high. After an operator reviewed and fine-tuned the scanner-supplied segmentation, 1,837 image frames were extracted by stitching together the appropriate strips.
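The sketch below shows the basic arithmetic of cutting a frame out of fixed-height strips once a reviewer has settled on the frame boundaries. It is a simplified illustration, not the production stitcher; the strip height matches the example above, and the frame offsets are invented.

```python
import numpy as np

STRIP_HEIGHT = 4096   # pixels per scanned strip, as in the example above

def extract_frame(strips, frame_top, frame_bottom):
    """Cut one frame out of a roll scanned as fixed-height strips.
    `strips` is a list of (STRIP_HEIGHT, width) arrays in roll order;
    `frame_top`/`frame_bottom` are row offsets from the start of the roll,
    e.g. as fine-tuned by the human reviewer."""
    first = frame_top // STRIP_HEIGHT
    last = (frame_bottom - 1) // STRIP_HEIGHT
    roll_section = np.vstack(strips[first:last + 1])   # stitch only the strips needed
    offset = first * STRIP_HEIGHT
    return roll_section[frame_top - offset:frame_bottom - offset, :]

# Toy roll: three strips of a 2000-pixel-wide scan; the frame spans strips 0 and 1.
strips = [np.full((STRIP_HEIGHT, 2000), i, dtype=np.uint8) for i in range(3)]
frame = extract_frame(strips, frame_top=3000, frame_bottom=5500)
print(frame.shape)   # (2500, 2000)
```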

You have probably never once wished you knew more about microfilm scanning technology. Creating 35 mm rolls of microfilm is a nearly 80-year-old technology, and microfilm scanners have been around for decades. But if you care (deeply) about producing high-quality images, getting this part of the process right is absolutely critical. Strip scanning is a fairly recent development, and the work we have done over the last few years to stitch strips into frames on our server farm has been something of a minor breakthrough, enabling the IPP to produce a higher volume of higher-quality images.

 

Image Processing at Ancestry.com – Part 3: Where Do Images Come From?
By Michael Murdock | May 3, 2013

This post is the third in a series about the Ancestry.com Image Processing Pipeline (IPP). The IPP is the part of the content pipeline that is responsible for digitizing and processing the millions of images we publish to our site.  In part 1 of this series, The Good, the Bad, and the Ugly, I gave a very general overview of IPP. In part 2, Living in the Mesosphere, I used a paper-stacking analogy to provide a sense for the volume of images we process in the IPP. In this post I describe at a very high level the challenges related to the digitizing part of the pipeline and in future posts I will delve more deeply into the processing part of the pipeline.

The following diagram shows the sequence of operations that occur in converting a paper document to a digital image.

 

Flowchart of the process of converting a paper document into a digital image. Note that two “capture” steps can happen: The paper is captured to film and then the film is scanned into a digital image.

In this multi-step process of converting a paper document into a digital image, there are a number of “destructive operations” that can and frequently do occur, leaving the image with artifacts that make the content difficult to read. Some of these destructive operations are listed in the following diagram.

 

A list of some of the destructive operations that happen as a paper document is converted into a digital image.

After considering all of the destructive operations that can happen as a paper document is converted into a digital image, the obvious questions are: (1) Is it possible to undo this damage? And if so, (2) How would one go about undoing this damage to make the content in these images more legible?

The process of trying to correct or compensate for these kinds of (destructive) operations is referred to as an “Inverse Problem“. Inverse problems occur in various fields of applied mathematics – they are well-studied, notoriously difficult, and can roughly be described as follows.

We start with an observation (the digital image) and we attempt to invert or reverse the effects of operations, which we cannot directly observe. But worse, these operations are non-linear and compounding with parameters that can’t be measured with any degree of accuracy. In our specific case this means that there is no generalized, closed-form solution to make faded, skewed, warped, low-contrast images look new again. Inverse theory provides little in the way of specific, practical guidance, but it does suggest a set of strategies (that have been shown to be successful in other fields) which we have attempted to follow in the IPP:

 

  1. Constrained. If we constrain ourselves to optimizing for a single aspect of the problem, which in our case is legibility, we are much more likely to arrive at acceptable results. This could, for example, produce an image whose handwriting or machine-printed text is generally judged more legible, but at the cost of a darkened or discolored image.
  2. Approximate. Inverse problems are considered ill-posed, which means that approximate solutions are the best we can hope for. A practical example of this (which will be discussed in detail in a future post) is to be very conservative in the use of an operation like sharpening. Attempting to derive the parameters needed for an exact sharpening operation is likely to backfire and damage as many images as it helps.
  3. Adaptive. Instead of using static, global parameters, we allow our operations to be locally adaptive, meaning the parameters driving a particular operation, say an auto-crop, are determined by local, image-specific measurements. This is more computationally expensive, but as a guiding principle it has proven to be the correct approach (a toy illustration of a local measurement follows this list).
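As the toy illustration promised above, the sketch below computes a simple per-tile statistic (local standard deviation) of the kind a locally adaptive operation could use to pick its parameters tile by tile rather than globally. The tile size and the statistic are arbitrary choices for the example, not the measurements the IPP actually uses.

```python
import numpy as np

def local_contrast_map(image, tile=64):
    """Measure the standard deviation of pixel values in each tile; a
    locally-adaptive operation could choose its parameters per tile from
    a map like this instead of using one global setting."""
    h, w = image.shape
    rows, cols = h // tile, w // tile
    stats = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        for c in range(cols):
            block = image[r * tile:(r + 1) * tile, c * tile:(c + 1) * tile]
            stats[r, c] = block.std()
    return stats

# Toy image: a flat left half and a noisy right half produce very different
# local measurements, so each half would get different processing parameters.
rng = np.random.default_rng(3)
img = np.hstack([np.full((128, 128), 200, dtype=np.uint8),
                 rng.integers(0, 256, size=(128, 128), dtype=np.uint8)])
print(local_contrast_map(img).round(1))
```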

 

The image content that Ancestry.com makes available on our site typically arrives at our facilities as film or paper. We scan this media into digital images and then attempt to correct all of the damage that was done before it arrived to be digitized. Hopefully I’ve shed some light on the myriad of things that can damage the content in the early stages of its conversion from paper, and in general terms, provided the context for the processing we do in the IPP, which I will discuss in more detail in my future posts.

 

Credits

In both diagrams in this blog post I include an image of a roll of microfilm. Ianaré Sévi for Lorien Technologies is the copyright holder for this image. He has made it available under the Creative Commons Attribution-Share Alike 2.5 Generic license. Also in both of these diagrams I include two images, one labeled “Paper” and the other labeled “Digital Image”. These two images were extracted from an image that is used courtesy of James Tanner, http://genealogysstar.blogspot.com

 

 

Distributed Parallel Computing at Ancestry.com
By Tyler Jensen | April 24, 2013

About 450 years ago John Heywood wrote, “many hands make light work.” The same can be said of image and data processing. Distributed parallel computing (DPC) makes it possible for us to do the work described by Michael Murdock in his series on the image processing pipeline. If you haven’t already, take a moment to read his excellent posts.

At Ancestry.com we use a DPC system developed in-house that we call “iFarm.” We also use more recognizable DPC systems such as Hadoop for some things, but our primary image processing pipeline, described by Michael, runs on the iFarm.

The iFarm’s Client Controller allows us to monitor and control the servers and task agents in the “farm” of servers processing tasks. It also allows us to roll new task code to each of the client nodes when a change is made to the code.


The iFarm Client Controller – Allows us to manage servers and agents remotely.

In addition to the image processing pipeline, and as the need arises, the Enterprise Media Team (EMT) creates and runs a series of image and data correction modules on already published images and data. We call this series of modules the Media Validation Processor (MVP). Probably the most significant MVP module is our Deep Zoom pre-processing module.

About 18 months ago Ancestry.com introduced its Deep Zoom image viewing technology. This allows our users to zoom in and out on hard-to-read historical records or images in a record collection, such as the 1940 Census, with very little if any delay in loading the image. In order to achieve the best performance, this technology requires that the original image be specially processed into what we call “tiles.”

Viewing Deep Zoom images can be rather CPU-intensive for the application server. This processing burden is greatly reduced when the image has been pre-processed into tiles. The image processing pipeline automatically performs Deep Zoom pre-processing on new collections and on updates to existing collections. But that leaves hundreds of millions of images that have not been pre-processed, because they were published before the release of our Deep Zoom technology.
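To give a feel for what tiling involves, here is a sketch that plans a Deep Zoom style tile pyramid: the image is repeatedly halved until it fits in a single tile, and each level is cut into fixed-size tiles. The 256-pixel tile size and the sample image dimensions are assumptions, not the settings Ancestry.com uses.

```python
from math import ceil, log2

def tile_pyramid_plan(width, height, tile_size=256):
    """Plan a Deep Zoom style tile pyramid. Returns (level, columns, rows)
    tuples from the full-resolution level down to a single-pixel level."""
    levels = []
    level = int(ceil(log2(max(width, height))))
    w, h = width, height
    while True:
        cols, rows = ceil(w / tile_size), ceil(h / tile_size)
        levels.append((level, cols, rows))
        if w == 1 and h == 1:
            break
        w, h = max(1, ceil(w / 2)), max(1, ceil(h / 2))
        level -= 1
    return levels

# A typical census scan of roughly 4000 x 5000 pixels.
plan = tile_pyramid_plan(4000, 5000)
print(plan[:3])   # finest levels: [(13, 16, 20), (12, 8, 10), (11, 4, 5)]
```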

This is where the MVP Deep Zoom modules running on multiple agents across multiple server nodes recently came into play. Even with multiple iFarm server nodes and many agents running 24/7, the pre-processing of images for Deep Zoom in our top 500 most actively used collections required several months to complete. If not for the advantages of DPC in our iFarm system, this project could have taken years to complete. Eventually all of our collection titles will be pre-processed for Deep Zoom using iFarm.

If Heywood were a TechRoots blogger today, he would write, “Many CPUs make light work.” At Ancestry.com we are always looking for ways to achieve more in less time using the power of distributed parallel computing.

Image Processing at Ancestry.com – Part 2: Living in the Mesosphere
By Michael Murdock | April 20, 2013

 

Last week I began this series of blog posts about the Ancestry.com Image Processing Pipeline (IPP) by briefly describing how the IPP is the part of the Ancestry.com Content Pipeline that is responsible for digitizing and processing the millions of images we publish to our site.

With this blog post I would like to discuss the scale of our operation and hopefully give you a sense for the number of images that we handle in the IPP in a way that is instructive. But more importantly, before you can understand the IPP architecture and the trade-offs we have made in building it out (which I will discuss in future blog posts), you need to understand the sheer volume of images that we process.

 

Living in the Mesosphere

If you go look at the Ancestry.com Card Catalog, it’s pretty clear that we give access to a lot of content. It shows that you can access (through the filtered search panels on the left side of the page) over 30,000 titles.

 

The Ancestry.com Card Catalog gives you access to more than 30,000 databases (collections of records)

 

If you look at the second title listed in the screenshot of the Card Catalog above, you will see the 1940 U.S. Federal Census. This record collection consists of about 4,660 rolls of microfilm, digitized to about four million high-resolution images, all of which were brought online in only a few days. By any measure, that’s a lot of images and a fast turnaround.

Over the last several years Ancestry.com has been very aggressive about: 1) bringing together teams of professionals who are passionate about delivering great content to our subscribers, and then 2) pushing these teams to create technology, processes, software architectures, and applications that make it possible to acquire and process this content at an unprecedented scale and scope. I see this at work every day, but probably never at such scale as our effort last year to publish the 1940 U.S. census online. And that’s just one of the more than 30,000 collections available on our site.

In his 2013 RootsTech keynote address, Tim Sullivan mentioned that last year we added 1.7 billion records to our site, for a total of 11 billion records. He then added that we’ll be investing even more money in adding content online: $100 million over the next five years. That’s a lot of images.

To illustrate the scale and volume of images being processed at Ancestry.com, I like to use an analogy relating an image to a sheet of paper. Consider the following figure in which I suggest you think of an image as a single sheet of paper. If we stack about 500 sheets, a single ream of paper, it makes a stack about two inches tall.

 

 

Now stack about 25 reams on top of each other, as shown in the above figure, and you have a four-foot visual representation for what 12,500 images might look like.

Continuing with this analogy, what would the stack of “images” look like for the 1851 UK Census? As shown in the following figure, this collection of almost one million images would be about the height of the Statue of Liberty.

 

 

How about all of the 1851 to 1901 UK Censuses? If you were to stack each “image” from all of these censuses as individual sheets of paper, how high would this stack reach? The figure below shows that this stack extends to 2,133 feet, significantly higher than the Sears Tower in Chicago, Illinois.

 

 

Finally, extending this analogy to its limits, if you were to stack every image (as a sheet of paper) from all of our collections, how high would this stack reach? In the figure below I show that it reaches up to a height of over 166 thousand feet. This is a stack of images 31 miles high, which puts you well into the mesosphere. We’re adding roughly another half of a mile every month, so it will be a bit before we make it into the thermosphere.

 

 

So, what’s the point? Are these paper-stacking comparisons just an over-the-top, poorly-disguised attempt to brag? Probably. But for a purpose. It would be easy to misunderstand my intent in presenting these numbers and comparisons.

Here’s my thinking. The size of Ancestry.com’s image database is truly extraordinary. Processing millions of images a week causes conventional technologies to break down, and most off-the-shelf approaches don’t even get you into the ballpark. Many people have worked for years to develop the technology and infrastructure behind this resource, and we now have the technology and tools in place to improve it at an even faster rate. So while we’re focused on acquiring, digitizing, indexing, and publishing as much quality historical content as we can, as quickly as we can, we’re doing it with technology built for the massive volumes of data and images we handle.

As I stated at the start of this blog post, and as a unifying theme in my subsequent posts, we have had to think about things in a whole new light in order to operate at this scale. How this “thinking in a whole new light” actually got turned into a working pipeline will be covered in my future blog posts.

The post Image Processing at Ancestry.com – Part 2: Living in the Mesosphere appeared first on Tech Roots.

]]>
http://blogs.ancestry.com/techroots/image-processing-at-ancestry-com-part-2-living-in-the-mesosphere/feed/ 0
Image Processing at Ancestry.com – Part 1: The Good, The Bad, and the Uglyhttp://blogs.ancestry.com/techroots/our-image-processing-pipeline-the-good-the-bad-and-the-ugly/ http://blogs.ancestry.com/techroots/our-image-processing-pipeline-the-good-the-bad-and-the-ugly/#comments Wed, 10 Apr 2013 20:13:19 +0000 Michael Murdock http://blogs.ancestry.com/techroots/?p=221   Images of original historical records play a key role in the way Ancestry.com presents family history information to the user. An image of a historical record is much more than evidentiary support for some family history assertion. An image can become the anchor for an engaging and compelling historical narrative. A properly captured and… Read more

The post Image Processing at Ancestry.com – Part 1: The Good, The Bad, and the Ugly appeared first on Tech Roots.

]]>

Advertisement in the Piqua Leader-Dispatch, August 15, 1912

Images of original historical records play a key role in the way Ancestry.com presents family history information to the user. An image of a historical record is much more than evidentiary support for some family history assertion. An image can become the anchor for an engaging and compelling historical narrative. A properly captured and rendered image can be beautiful and even exciting. Something I love about Ancestry.com is getting to work with people who are passionate about images and share the drive to create world-class image processing technology.

My father’s name in the 1940 U.S. Census, a collection that is free to everyone.

This blog post is the first in a series of articles about the image processing technologies we have developed here at Ancestry.com. As a software development manager of the Imaging Development team, my responsibility is to lead a group of software engineers as we create the technology and software applications that support our Image Processing Pipeline (IPP).

Before I start, let me explain a little about our image processing at Ancestry.com. On an ongoing basis, our Content Acquisition team works with partners around the world (from the National Archives and Records Administration to a small church in Italy) to procure new record collections, or what we call content, to be added to the website. These records may arrive on paper, on microfilm, in digital form, and so on; whatever form they take, we put them through our Content Pipeline to get them online and searchable. This is where the Image Processing team comes in.

A very high-level view of the Content Pipeline is shown in the following diagram:

The Ancestry.com Content Pipeline

As you can see, a major portion of the Content Pipeline is image processing. This year we will use the IPP to scan and process around 200 million images, which requires some pretty cool technology that I hope to describe in future blog posts.
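To put that number in perspective, here is a quick back-of-envelope sketch of the average sustained rate it implies. It assumes a perfectly even workload across the year, which real acquisition and scanning schedules never are, so the figures are only indicative.

```python
# Rough average rate implied by roughly 200 million images in a year.
# Assumes an evenly spread workload, which is a simplification.
IMAGES_PER_YEAR = 200_000_000
SECONDS_PER_DAY = 24 * 60 * 60

per_day = IMAGES_PER_YEAR / 365         # roughly 548,000 images per day
per_second = per_day / SECONDS_PER_DAY  # roughly 6.3 images per second, around the clock

print(f"about {per_day:,.0f} images per day, or {per_second:.1f} images per second")
```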

A very high-level view of the Image Processing Pipeline is shown in the following diagram:

A very high-level view of the Image Processing Pipeline

Just to give you a taste of the images, or content, the Imaging Development team deals with, I’d like to quickly show you a few examples.

The Good – Examples of Beautiful Images

Properly capturing, preserving and rendering images of historical records can be an extraordinarily difficult challenge. Beautiful historical images, like those shown below, don’t just happen naturally. In fact, with everything that can go wrong, it’s amazing they happen at all.

I have included a few of my favorite images here, and I invite you to share links to your favorites with us.

This image is from the following collection: Gretna Green, Scotland, Marriage Registers, 1794-1895.
(An Ancestry.com subscription is required to follow the links to the image and/or collection)

Italian Baptism (1776) and Burial (1598) Records

Italian Land Record (1682)

This image of a passport application came from the U.S. Passport Applications, 1795-1925 Collection. (An Ancestry.com subscription is required to follow the links to the image and/or collection)

In the next section I jump to the other extreme and show some bad and ugly images.

The Bad and the Ugly – Examples of Really Bad Images

Historical documents should be captured in focus, properly oriented, under adequate lighting, and digitized at an appropriate resolution. However, given the sheer number of things that can go wrong as a paper-based historical record is converted to microfilm and then to a digital image, it’s not surprising that we encounter a large number of images that are barely legible, much less beautiful and engaging. To illustrate, I pulled together snippets from a variety of collections showing the kinds of problems we frequently encounter.

Collage of some really bad images we encounter in the Image Processing Pipeline

As illustrated in this collage, there are many, many ways for images to develop problems that make them difficult to read, or even to use for family history research. They may be partially destroyed by fire, copied so lightly that the text is barely readable, scanned off-center, photographed at an angle that skews the page, or damaged in many other unfortunate ways. Fortunately for everyone involved, these problem images are the exception rather than the rule. Unfortunately, the problems are not so rare that they can be ignored, nor can they be handled one at a time as they are discovered.
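To make just one of these problems concrete, here is a small, generic sketch of a skew-correction step. To be clear, this is not the actual IPP code (later posts will describe how we really handle problem images); it is only an illustration of the kind of correction such a pipeline needs. It uses a simple projection-profile search with Pillow and NumPy, a fixed binarization threshold, and a five-degree search range, all of which are simplifying assumptions.

```python
# Illustrative deskew sketch (not the actual IPP code): find the rotation angle
# that makes text rows line up horizontally, then rotate the page back by that angle.
import numpy as np
from PIL import Image

def estimate_skew_angle(gray, angles=np.arange(-5.0, 5.25, 0.25)):
    """Projection-profile search: the best angle maximizes the variance of row sums."""
    binary = np.asarray(gray) < 128  # crude fixed threshold; text pixels become True
    text_img = Image.fromarray((binary * 255).astype(np.uint8))
    best_angle, best_score = 0.0, -1.0
    for angle in angles:
        rotated = np.asarray(text_img.rotate(angle), dtype=np.float64)
        score = rotated.sum(axis=1).var()  # horizontal text rows give a spiky profile
        if score > best_score:
            best_angle, best_score = float(angle), score
    return best_angle

def deskew(path_in, path_out):
    gray = Image.open(path_in).convert("L")
    angle = estimate_skew_angle(gray)
    # Apply the same rotation to the original page; fill the new corners with white.
    gray.rotate(angle, expand=True, fillcolor=255).save(path_out)
```

A production pipeline would replace the fixed threshold with adaptive binarization and estimate the angle far more efficiently, but the basic idea of measuring the dominant text-line angle and rotating the page back carries over.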

In the next several blog posts in this series I will attempt to describe how we deal with problem images and how we use a variety of technologies to try to preserve and create beautiful images that can serve as the centerpiece for your historical narratives.

Additionally, future blog posts in this series will cover the various components shown in the Image Processing Pipeline diagram.

The post Image Processing at Ancestry.com – Part 1: The Good, The Bad, and the Ugly appeared first on Tech Roots.

]]>
http://blogs.ancestry.com/techroots/our-image-processing-pipeline-the-good-the-bad-and-the-ugly/feed/ 0