Thank you all for your great feedback and comments on my last post. I want to address the topics you’ve raised one by one over the next few weeks. I have to start somewhere, so today I’ll address the desire for a more powerful way to search that reduces irrelevant results. Here are a few of the comments I’ve received from you in this regard:
- Dale writes: “I am searching for a rare surname, ENEVER. It is frequently misspelt… When I do an exact search I only get a fraction of the responses I know are out there. When I do a non-exact search I get over 10,000 results…”
- Terri writes: “… it is frustrating to have so many hits come back when I’m looking for something pretty specific.”
- Lauri writes: “… When I say that I’m looking for someone who died in 1852 and the first person on the list is born in 1945 and the rest of the search result looks like it is in random order, I call that a VERY poor search function.”
- Diane writes: “Am echoing earlier comments but only so you know how many of us want the same basic search features: Ability to restrict a search to a specific locale yet without the exact name. Example: I know the person was in Spokane in 1920 but don’t find with exact name match (name is typically spelled incorrectly). If I do a non-exact name search, I get all the US and it’s painful to narrow to Spokane.”
- Melody writes: “…when I do a general search for people who lived and died in NC in a certain time period, I get records for people who lived everywhere, and in any year. Sometimes I’m forced to do an exact search because it’s the only way I can find something (if it’s spelled correctly). There are too many records when I do a general search.”
The Answer – Ancestry’s Advanced Search
As a budding genealogist, I too find that our search engine can sometimes return too many matches that seem irrelevant to the information I typed in. In fact, I heard that comment from customers like you often enough that we completed a project some time ago that I believe addresses a lot of these issues – Ancestry’s Advanced Search.
Here’s what it looks like:
Advanced Search allows you to mark, on a field-by-field basis, which items must match exactly and which ones can return a “fuzzy” match. This lets you, for example, do a fuzzy search for the name (to find alternate spellings) while forcing the birth year to exactly match a specific year or year range, or forcing a birth location to exactly match a state or county.
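For readers who like to see an idea as code, here is a minimal Python sketch of the field-by-field approach. It is only an illustration, not Ancestry’s implementation: the `Criterion` class, the `passes_exact_filters` function, the field names and the three sample records are all made up for this example. It assumes that fields marked exact act as a filter, while fuzzy fields are left for a later ranking step.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One search field: the value typed in and whether it must match exactly."""
    value: str
    exact: bool = False


def passes_exact_filters(record: dict, query: dict) -> bool:
    """Keep a record only if every field marked exact is present and equal."""
    for field, criterion in query.items():
        if not criterion.exact:
            continue  # fuzzy fields never exclude a record in this sketch
        actual = record.get(field)
        if actual is None or actual.casefold() != criterion.value.casefold():
            return False
    return True


# Hypothetical records, invented for illustration only.
records = [
    {"surname": "Enever", "birth_year": "1852", "birth_place": "Washington"},
    {"surname": "Ennever", "birth_year": "1852", "birth_place": "Washington"},
    {"surname": "Enever", "birth_year": "1945", "birth_place": "Ohio"},
]

# Fuzzy on the surname, exact on the birth year.
query = {
    "surname": Criterion("Enever"),
    "birth_year": Criterion("1852", exact=True),
}

kept = [r for r in records if passes_exact_filters(r, query)]
print([r["surname"] for r in kept])  # ['Enever', 'Ennever']
```

The misspelled surname survives because the surname field is fuzzy, while the 1945 record is dropped because the birth year was marked exact.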
That sounds good, but why would I want to find a “fuzzy” match?
This is a big question – we’ll start with names, then take on dates and places. As you may already know, names are often misspelled or mis-transcribed on historical documents. Even though you know great-grandfather’s given name was Ebenezer, there are countless ways Ebenezer may have been spelled and/or abbreviated on old records. Surnames can be even trickier because they often change over time. For example, my last name “Hulet” was spelled “Howlett” as recently as 200 years ago. Here are a few of the reasons names can be tricky:
- Spelling of the name “evolved”, often when a family member immigrated (In some cases, names are even translated directly from one language to another – example: “Zimmerman” in German could be “Carpenter” in English)
- A nickname is used (Did you know that “Polly” is a common nickname for “Mary” or that “Peggy” is a nickname for “Margaret”? These aren’t very intuitive because they sound very different from one another.)
- A name is abbreviated (“Charles” could easily have been written as “Chas.”, “Ch” or “C” on an old document)
- An ancestor may have spoken in a heavy accent or may have been illiterate so that a census-taker or immigration officer had to spell the name out phonetically
- The person who copied down the information from the original ledger (often the originals were discarded in favor of copies written by a scribe with nice penmanship) may not have been able to read the original or may have copied it down incorrectly
- Similarly, the person who typed the information into the electronic index may not have been able to read the original or may have typed it incorrectly
So what is one to do? Luckily, this is where Ancestry’s search engine can be a real help. We use several techniques to find “fuzzy” matches for names. First, we have a “name authority” of alternate spellings for thousands of names that our search engine automatically looks for when you do a search. For fun, I poked around at our name authority for the name Timothy Sullivan (our CEO) and here’s what I found:
- “Timothy” had over 50 possible spellings in our list, including “Tim”, “Timothee”, “Timmy”, “Timmothy”, “Temothie” and “Timothe” just to name a few
- “Sullivan” had over 30 possible spellings in our list, including “Sulaven”, “Sullavin”, “Sullevan”, “Sulavin” and “Sullyvan”
In addition to the name authority, we also look for abbreviations of given names. We also use a phonetic algorithm called “Soundex” to do even more “fuzzy” matching on surnames (and we’re considering adding it to given names, too).
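To give a feel for how Soundex groups similar-sounding surnames, here is a short Python sketch of the standard American Soundex rules (keep the first letter, turn the remaining consonants into digits, collapse repeats, pad to four characters). This is the textbook algorithm, not necessarily the exact variant our search engine runs, and the function name `soundex` is just a label chosen for this example.

```python
# Consonant groups and their Soundex digits; vowels, H, W and Y get no digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Return the four-character American Soundex code for a name."""
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate two letters with the same code
        digit = CODES.get(ch, "")  # vowels and Y map to the empty string
        if digit and digit != previous:
            encoded += digit
        previous = digit  # a vowel resets this, so a repeated code counts again
    return (encoded + "000")[:4]


for surname in ("Sullivan", "Sulaven", "Sullavin", "Sullyvan"):
    print(surname, soundex(surname))  # every one prints S415

print(soundex("Hulet"), soundex("Howlett"))  # H430 H430
```

Because all of the Sullivan spellings share the code S415, and Hulet and Howlett share H430, a search on one spelling can pick up records indexed under the others.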
Could you imagine trying to think of all of the variations of the given name and surname, let alone trying to search each one? That’s an awful lot of searching: it would take over 2,000 manual searches, in fact, to search each unique combination of name variations. And Ancestry’s search engine will do all of those combinations for you in the blink of an eye on every search that you do that doesn’t mark the given name(s) and surname(s) as exact.
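The arithmetic behind that number is simple multiplication: every given-name spelling pairs with every surname spelling. As a rough Python sketch, using only the handful of spellings quoted above rather than the full name authority lists (the variable names are made up for this example):

```python
from itertools import product

# Only the spellings mentioned in this post, plus the standard forms.
given_names = ["Timothy", "Tim", "Timothee", "Timmy",
               "Timmothy", "Temothie", "Timothe"]
surnames = ["Sullivan", "Sulaven", "Sullavin",
            "Sullevan", "Sulavin", "Sullyvan"]

# Every given name paired with every surname.
combos = list(product(given_names, surnames))
print(len(combos))  # 42
print(combos[:3])
```

Even this small sample of 7 given names and 6 surnames produces 42 separate searches, and the count grows multiplicatively as each list gets longer.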
So that covers names. Dates and places are a bit simpler. Dates can be wrong for a number of reasons: many people lied about their ages (in one case that my colleague Lou Szucs identified, a woman in the US Federal Census aged only 12 years over a 30-year period!), sometimes the forms required people to round to the nearest five or ten-year increment, etc. Places can be wrong for a number of reasons as well: an unknown move or temporary relocation, an ancestor may have lived with another family or as a domestic servant or apprentice somewhere, etc.
Our fuzzy matching on dates basically looks for the years closest to the date you entered and scores them higher than records with years that are further away (all other things being equal). We don’t do a lot of fuzzy matching on places yet, but that is something we’re excited about pursuing in the future to make the search engine better.
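As a rough illustration of scoring by closeness, the Python sketch below ranks a few made-up record years against a search for 1852. The `year_score` function and its formula are assumptions chosen for simplicity; the actual scoring in our search engine takes many more factors into account.

```python
def year_score(query_year: int, record_year: int) -> float:
    """Return 1.0 for an identical year, falling toward 0 as the gap grows."""
    return 1.0 / (1 + abs(query_year - record_year))


# Hypothetical record years, invented for illustration only.
record_years = [1945, 1861, 1852, 1850]

# Highest score first, so the closest years lead the list.
ranked = sorted(record_years, key=lambda y: year_score(1852, y), reverse=True)
print(ranked)  # [1852, 1850, 1861, 1945]
```

With all other things being equal, the 1945 record lands at the bottom of the list rather than the top.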
To sum it up
Advanced Search gives you the power to easily specify which elements of your search you’d like to be “exact” (meaning that they must be both included on the resulting record AND match exactly as you’ve specified) and which you’d like to be “fuzzy”. This gives you the power and flexibility to get exactly the type of matches you’re looking for: fuzzy enough to find the right records, but exact enough that you don’t have to wade through so many less relevant matches.
Give it a try and let me know what you think.
About Kendall Hulet
Kendall Hulet has served as our Senior Vice President of Product Management at Ancestry since March 2015. He joined the Company in 2003 and has held a variety of roles in the product organization, including Director of International Product Management and most recently Vice President of Product Management for AncestryDNA. During his tenure, he was deeply involved in some of the most popular innovations at Ancestry, including the “Shaky Leaf” hinting system that has delivered over five billion discoveries; the Ancestry Family Tree system that has led to the creation of over 70 million family trees containing six billion ancestors; and the creation of the award-winning Ancestry mobile app, which has been downloaded more than 12 million times.