Posted on June 4, 2013 in Big Data

Location, Location, Location – the importance of normalized place information in historical records

One of the many challenges facing family history researchers is the changeable nature of things that at first blush feel immutable. The old church in the town square seems to have existed from the beginning of time, the place we call Los Angeles has always been, the family farm will always be ours.

Los Angeles 1930

In fact, the world is a changeable place. Seen through the lens of history, place names and boundaries are never static; they change because of wars, administrative actions, or simply population growth.

Many who are new to family history don’t realize that cities used to belong to different counties (or states!) and that a record about their ancestor in 1910 is a snapshot of the geopolitical situation at that moment, not a representation of things as they are today.

At Ancestry, we face this challenge when trying to tease meaning out of the words extracted (using OCR) from a historical record. It is hard enough to tell from context that St. Petersburg refers to the place in Russia and is not a surname, given name, or occupation, but when St. Petersburg can also be known as Leningrad, things get really interesting.

St. Petersburg, Russia

To support this process, it is important to have a reliable “place authority” to call on when disambiguating what a term really means.

For many years we have relied on our internal place authority to assist in this task. Creating a global authority of all place names for all time is no small task. The aforementioned Los Angeles used to belong to Spain, then Mexico, then the United States. Its boundaries have changed a great deal from the 1700s to today, and the term “Los Angeles” has a different meaning in different time periods.
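To make the idea of a time-dependent jurisdiction concrete, here is a minimal sketch in Python. It is not Ancestry's place authority: the `JurisdictionPeriod` class, the `parent_in_year` function, and the approximate years are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class JurisdictionPeriod:
    """One span of years during which a place belonged to a parent jurisdiction."""
    parent: str
    start_year: int
    end_year: Optional[int]  # None means the period is still current


# Illustrative, approximate dates only (hypothetical data, not Ancestry's).
LOS_ANGELES_HISTORY = [
    JurisdictionPeriod("Spain", 1781, 1821),
    JurisdictionPeriod("Mexico", 1821, 1848),
    JurisdictionPeriod("United States", 1848, None),
]


def parent_in_year(history, year):
    """Return the parent jurisdiction for a given year, or None if unknown."""
    for period in history:
        end = period.end_year if period.end_year is not None else float("inf")
        # Half-open interval [start, end): a handover year maps to the new parent.
        if period.start_year <= year < end:
            return period.parent
    return None
```

With this sketch, a record dated 1830 would resolve to Mexico and one dated 1930 to the United States, while a year before the first period returns `None` rather than a guess.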

We are in the process of upgrading this authority to better understand the relationships between a place and its parent jurisdiction, a modern place vs. a historical one, and language and spelling variants for the same place.
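One way to picture name and spelling variants is an index that maps every known variant, with the years it was in use, back to a single place. The sketch below is an assumption about how such a lookup could work, not a description of our system; `Place`, `normalize`, `build_index`, `lookup`, and the `example-1` identifier are hypothetical names.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Place:
    place_id: str
    preferred_name: str
    # Each variant: (name, first year used, last year used or None if current)
    variants: list = field(default_factory=list)


def normalize(name):
    """Drop punctuation, collapse whitespace, and casefold a place name."""
    stripped = re.sub(r"[^\w\s]", "", name)
    return " ".join(stripped.split()).casefold()


# Illustrative data only; the identifier is made up.
ST_PETERSBURG = Place(
    place_id="example-1",
    preferred_name="St. Petersburg",
    variants=[
        ("St. Petersburg", 1703, 1914),
        ("Petrograd", 1914, 1924),
        ("Leningrad", 1924, 1991),
        ("St. Petersburg", 1991, None),
        ("Санкт-Петербург", 1991, None),  # Russian-language spelling
    ],
)


def build_index(places):
    """Map each normalized variant to the places and year ranges that use it."""
    index = {}
    for place in places:
        for name, start, end in place.variants:
            index.setdefault(normalize(name), []).append((place, start, end))
    return index


def lookup(index, name, year=None):
    """Return places matching `name`, optionally restricted to a given year."""
    matches = []
    for place, start, end in index.get(normalize(name), []):
        in_range = year is None or (start <= year and (end is None or year <= end))
        if in_range and place not in matches:
            matches.append(place)
    return matches
```

Calling `lookup(build_index([ST_PETERSBURG]), "Leningrad", 1950)` finds the same place as `"St Petersburg"` in 1900, while `"Leningrad"` in 1900 finds nothing, because that name was not yet in use.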

When completed, this new authority will be much better at recognizing that Providence, Bedford, Pennsylvania used to be a legitimate place name, even though today Providence is in Providence County, not Bedford.

Our intent is to make searching for records about your ancestors more relevant and precise as we better understand the historical context of places.

About Laryn Brown

Laryn is a Sr. Product Manager at Ancestry.com and joined the company in 1998 as the first product manager, then went on to launch Ancestry.co.uk as the first international website with the Ancestry brand. Currently he is the product manager for a small Research and Development team focused on natural language extraction from OCR and web-crawled source material. Prior to working in R&D, Laryn managed the Document Preservation team. This team digitizes and indexes all of Ancestry’s historical records globally. He has also worked as the head of content partnership development, based in London. Working in genealogy as a profession and a hobby, Laryn is actively involved in the genealogy community. The threads of his own genealogy include Birmingham bricklayers, Canadian homesteaders, American colonists, and Norwegian farmers.

