Ancestry® has billions of records. What does it take to find—and then add—so many images to their collection?
Where Does Ancestry Find Record Collections?
Specialists referred to by some as “the Indiana Joneses of Ancestry” scour the world for records.
This content includes:
- census records
- immigration records
- birth, marriage, and death records
- military records
- religious records
- historical photos
How Is Content Made Available on Ancestry?
After finding content, Ancestry specialists have to negotiate contracts to host the records on Ancestry. This can take a while: the longest negotiation so far lasted 12 years.
But that’s just the beginning. Once a contract is signed, the content has to be digitized and uploaded to the Ancestry network.
For bound books, for example, a special device called a planetary scanner is used to digitize the pages.
Ancestry specialists also capture digital images from 16mm or 35mm film, microfiche, and many types of paper documents.
The scanning process itself can be tricky. Some of the records are quite fragile, as they’re very old. The oldest record on Ancestry, for example, is from the 1300s.
And these records can be challenging to scan if they’ve been stored poorly. The parish records from Gretna Green, Scotland, for instance, were so crumpled that conservationists worked for 4 months to flatten them before they could be imaged.
Like many of the records preserved on Ancestry, the history of these records is fascinating: Gretna Green was a sort of Las Vegas destination for the English in Victorian times. The local priest—and blacksmith—Simon Lang performed thousands of marriages, sometimes while inebriated.
Ensuring High Quality Images
Once the images are digitized, specialists edit and quality-check them. Each roll of film contains about 600 images on average, and up to 5,000.
Specialists look at 100% of the images in a thumbnail view—a minimum of two times—to see if there are any defects.
Examples of defects that specialists look out for include documents that are:
- copied lightly so text is barely readable
- scanned off-center
- photographed at an angle that skews the pages
Ancestry uses a range of state-of-the-art technologies to make the images on the site as high-quality as possible. Here are a couple of examples.
For documents with barely legible print, Ancestry designed its own “DARC” camera, which can image very badly damaged documents using ultraviolet and infrared light.
Because these wavelengths fall outside the visible spectrum, the camera can capture handwriting that is otherwise invisible to the naked eye.
Another high-tech technique that greatly improves the quality of the documents added to the site is called adaptive auto contrast. Here’s an example:
The original scanned image on the left is dark and has low contrast. See how hard it is to read? The leveled image on the right is brighter, but it still doesn’t have enough contrast to read easily.
One might think the solution is simply to increase the contrast. But as you can tell from the image on the left below, this produces uneven results, because the image itself has darker and brighter sections.
Adaptive auto contrast accounts for both the light and dark sections of the page. It makes all of the text more uniformly bright (known as better levelling) and improves the overall image quality.
As you can see in the image on the right above, adaptive auto contrast is the key to making an otherwise hard-to-read document much more legible.
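The difference between plain and adaptive auto contrast can be sketched in a few lines of Python. This is a simplified illustration, not Ancestry’s actual algorithm: it assumes a grayscale image stored as lists of pixel values from 0 to 255, and the function names and tile size are hypothetical. The global version uses one minimum and maximum for the whole page; the adaptive version stretches each tile separately, so a dark section and a bright section both reach full contrast.

```python
"""Global vs. adaptive (tile-based) contrast stretching on a grayscale image.

Hypothetical sketch: an image is a list of rows of ints in 0..255.
"""
from typing import List

Image = List[List[int]]


def stretch(value: int, lo: int, hi: int) -> int:
    """Linearly map value from [lo, hi] onto [0, 255]."""
    if hi <= lo:  # flat region: nothing to stretch, keep the pixel as-is
        return value
    return round((value - lo) * 255 / (hi - lo))


def global_stretch(img: Image) -> Image:
    """Plain auto contrast: one min/max for the whole page."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    return [[stretch(v, lo, hi) for v in row] for row in img]


def adaptive_stretch(img: Image, tile: int) -> Image:
    """Adaptive auto contrast: a separate min/max for each tile x tile block,
    so dark and bright areas of the page are each stretched to full range."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for ty in range(0, h, tile):
        for tx in range(0, w, tile):
            ys = range(ty, min(ty + tile, h))
            xs = range(tx, min(tx + tile, w))
            block = [img[y][x] for y in ys for x in xs]
            lo, hi = min(block), max(block)
            for y in ys:
                for x in xs:
                    out[y][x] = stretch(img[y][x], lo, hi)
    return out


# Toy page: dark left half (ink 10, paper 40), bright right half (ink 200, paper 250).
page = [[10, 40, 200, 250],
        [40, 10, 250, 200]]
print(global_stretch(page))       # [[0, 32, 202, 255], [32, 0, 255, 202]]
print(adaptive_stretch(page, 2))  # [[0, 255, 0, 255], [255, 0, 255, 0]]
```

With a single global stretch, the dark left half of the toy page still only spans 0 to 32, while the tile-by-tile stretch gives both halves full black-to-white contrast. Production methods such as CLAHE (contrast-limited adaptive histogram equalization) go further: they blend neighboring tiles and cap the contrast boost to avoid visible seams and amplified noise.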
Making the Content Accessible and Searchable
Scanning the images is a huge undertaking, but it’s followed by an often even more time-intensive process: creating indexes. Indexing means extracting the information on the images and making it searchable on Ancestry.
This requires standardizing key aspects of records, from the way dates are written to the many different spellings of a given location (like “Cal,” “Ca,” “Calif,” and “Calfella,” all for “California”).
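The kind of standardization described above can be sketched as a lookup table for place names plus a list of accepted date layouts. This is a hypothetical illustration, not Ancestry’s indexing system: the variant table, the date formats, and the function names are all assumptions. Anything that doesn’t match returns None, which is one simple way a field could be set aside for a person to review.

```python
"""Hypothetical sketch of index standardization: variant place spellings
and date layouts are mapped to one canonical form."""
import re
from datetime import datetime
from typing import Optional

# Hypothetical lookup table; a real gazetteer would be far larger.
PLACE_VARIANTS = {
    "cal": "California",
    "ca": "California",
    "calif": "California",
    "calfella": "California",
    "california": "California",
}

# Hypothetical date layouts that might appear in transcribed records.
DATE_FORMATS = ["%d %b %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d", "%B %d, %Y"]


def normalize_place(raw: str) -> Optional[str]:
    """Return the canonical place name, or None to flag the field for review."""
    key = re.sub(r"[^a-z]", "", raw.lower())  # keep only letters: drops case, periods, spaces
    return PLACE_VARIANTS.get(key)


def normalize_date(raw: str) -> Optional[str]:
    """Return an ISO 8601 date (YYYY-MM-DD), or None if no layout matches."""
    cleaned = " ".join(raw.replace(".", "").split())  # strip periods as in "Mar.", collapse spaces
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    return None


print(normalize_place("Calif."))      # California
print(normalize_date("3 Mar. 1881"))  # 1881-03-03
```

A fixed rule table can’t settle every case. A date like 03/04/1881 is read here as March 4 because only a month-first numeric layout is listed, which may be wrong for records from places that write the day first.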
In a given month the Assembly team at Ancestry processes as many as 80 million data fields. Depending on the project, approximately 30% of the data requires some human intervention.
That means up to 24 million individual data fields may need to be verified manually. And that’s just the Ancestry team. Through the Ancestry World Archives Project, Ancestry users are also able to help with indexing.
Billions of Records at Your Fingertips
Ancestry specialists add an average of two million records to the database every day.
And thanks to their worldwide search for records, you no longer have to travel to different locations and squint at hard-to-read documents to find out more about your family story.
Of course you can still travel, but you can first do your research from the comfort of your own home.