Posted on June 30, 2009 in Content

The size of Ancestry.com’s record collection is a fascinating topic. As of June 2009, subscribers to Ancestry.com and our international sites can search the historical censuses for the U.S., UK and Canada, U.S. and international vital records, amazing collections of military and immigration records, and many others, not to mention the 10 million family trees added to our site by members in the last three years with over one billion profiles (names) and 20 million user-submitted photos and stories. This much is certain: Ancestry.com is far and away the largest collection of family history records online.

 

Defining and counting records on Ancestry.com

 

The concept of ‘counting’ records sounds relatively simple until you get deep into the details. How is a record defined? Is it a mention of a person? A household? A page? If a birth record has the person, parents, doctor and witness, how many records is that? And what of records where we don’t know for certain how many people are referenced, such as newspapers or city directories?

 

For our ‘fielded’ or indexed collections – structured data such as censuses and passenger lists – a record is defined as the information about one specific person. For example, one WWI Draft Registration Card is counted as one record. Similarly, each line on a census page is counted as one record, as it will typically contain information about a specific individual.

 

For our ‘unfielded’ collections such as newspapers and family histories, there is no underlying structure to define a field, so until now we have sampled pages and applied the resulting average to estimate a name count. For example, our 42.5 million (countable) newspaper pages were multiplied by 60 names per page to achieve an estimated total name count.
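
To make that arithmetic concrete, here is a minimal Python sketch of the sample-and-average estimate. The page total and the 60-names-per-page figure come from the paragraph above; the sampled counts and the function name `estimate_total_names` are hypothetical, chosen only so the sample averages out to 60. It is an illustration, not a description of the actual process used.

```python
from statistics import mean

# Figures quoted in the post.
COUNTABLE_NEWSPAPER_PAGES = 42_500_000
QUOTED_NAMES_PER_PAGE = 60

# Hypothetical per-page name counts from a small sample (illustrative only).
sample_name_counts = [58, 61, 63, 57, 61]


def estimate_total_names(total_pages: int, sampled_counts: list[int]) -> int:
    """Scale the sample's average names-per-page up to the whole collection."""
    average_per_page = mean(sampled_counts)
    return round(total_pages * average_per_page)


estimated_names = estimate_total_names(COUNTABLE_NEWSPAPER_PAGES, sample_name_counts)
print(f"{estimated_names:,}")
```

With these inputs the estimate works out to about 2.55 billion names for the newspaper pages alone, which shows how strongly a per-page average can drive a headline total.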

 

Traditionally, we have counted our total number of records by combining the number of records for each person contained in our fielded collections and the estimated number of names in our unfielded collections.

 

However, as our company and collections have grown so significantly in recent years, we have decided to apply a new and highly conservative counting methodology that better reflects our differing data structures. Going forward, each unfielded page will be counted as one record – no name estimates will be included in our total record counts.
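
The difference between the two approaches can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the `Collection` class, the function names, and the toy numbers are invented and do not reflect real collection sizes or any internal system.

```python
from dataclasses import dataclass


@dataclass
class Collection:
    name: str
    fielded: bool
    units: int  # person records if fielded, pages if unfielded
    names_per_page: int = 0  # sampled average; only used for unfielded pages


def count_with_name_estimates(c: Collection) -> int:
    """Previous approach: unfielded pages are multiplied by an estimated average."""
    return c.units if c.fielded else c.units * c.names_per_page


def count_conservative(c: Collection) -> int:
    """New approach: each person record or each unfielded page counts once."""
    return c.units


# Toy data, invented for this example.
collections = [
    Collection("toy census index", fielded=True, units=1_000),
    Collection("toy newspaper archive", fielded=False, units=100, names_per_page=60),
]

previous_total = sum(count_with_name_estimates(c) for c in collections)
new_total = sum(count_conservative(c) for c in collections)
print(previous_total, new_total)
```

In this toy case the total drops from 7,000 to 1,100 even though nothing was removed; only the treatment of the unfielded pages changed.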

 

So what does this mean?

 

Based on this new methodology, we have over 4 billion records. Previously, we referred to an estimated 8 billion names listed within our record collections. This is a change to our counting methodology only – no records have been removed. Ancestry.com members will continue to have access to all the great records they had previously.

 

With new records launching every week, these numbers are always increasing. More importantly, no matter how we count them, our goal is to continue to bring millions of valuable records to our members like we’ve been doing for more than a decade.

21 Comments

Andy Hatchett 

*sigh*

A line on a census is *NOT* a record-PERIOD.

The census page itself is the record.

Simply put- each *page* of each *document* is a record but the contents of any particular page is *not* a record- merely details.

June 30, 2009 at 10:00 am
Tony Cousins 

Can anyone remember the famous quote from Mark Twain about statistics? :)

TonyC

June 30, 2009 at 10:09 am
Jade 

**sigh**

Thanks to Andy and Tony.

The item most commonly called ‘the record’ on the site is the error-riddled partial extract by some unknown person from military records, Census enumerations and other material.

These are **not** **records**. Please stop calling them this. You can call them extracts, notes, excerpts. Often they do not rise even to this level because in the items from Census material you often include in a household people who are not even listed on the same page. Or add relationships where none are given in the original. Or give inexplicably incorrect birth-states.

In extracts from vital records, such as marriage records, your notes often do not even give the names of both parties. How could this possibly be viewed as “the record”?

Does anyone there have a twinge of conscience?

June 30, 2009 at 10:25 am
Jesse Taylor 

I am curious what database product is used… Oracle? MSSql? And how many database servers are required for all this. Impressive!

June 30, 2009 at 11:15 am
Jesse Taylor 

From a technical point, each line of a census record, if maintained separately in the database, is indeed a “record”. Each “row” or “tuple” is a record, consisting of a set of fields, each of which contains an item of information. A set of records is a file. So, if a census record is transcribed then each “part” such as first name, last name, address, etc would be a field and each line would be a record. The image of the census page itself would be an additional record.

June 30, 2009 at 11:42 am
Roger Moffat 

Lies, Damned Lies and Statistics….

http://en.wikipedia.org/wiki/Lies,_damned_lies,_and_statistics

And I agree with Jesse – from the database technicality standpoint, each line of a census is a record with a number of fields in it. If you perform a search, the result is a number of records which corresponds to the number of lines the search term was found on.

June 30, 2009 at 12:22 pm
Deb H 

The number of records is of no great import to me. However, the accuracy/quality of the data and the ability to find it is of paramount importance. Please spend less time counting and more time fixing what’s not working! Fuzzy search…PA draft cards…etc, etc.

June 30, 2009 at 2:50 pm
momvera1 

One of the things about record keeping is
age is an advantage because older subscribers
had to learn in elementary school to be good record keepers to get good grades and good jobs.
We had no Automated educational tools except the school bus! (-:
We also are the last generations to have known and lived with people who were born and raised in the 1800s without any availability of electricity to anyone anywhere. Your skills with your hands,
feet and eyes meant survival of the fittest.
It is very important to retain these records in their
original form while it can still be done. So please don’t get caught in “splitting hairs” clogging the wheels of progress. Think of your records of being here preserved. Will it matter? Yes it will.
You are already a record on the census. Make your being a happy memory. Fix problems.

June 30, 2009 at 8:31 pm
Ida 

I just wish they would add more of the promised records like birth, death, marriage, church records, etc….

July 1, 2009 at 7:47 am
john spigno 

I was wondering when you are going to get any records from Naples, Italy
or the surrounding towns?
Is there any way to find out? Thank you for your time.

July 1, 2009 at 9:02 am
Tai Slim 

The ancestry.com tips here are really helpful. Thanks

July 1, 2009 at 10:08 am
Peggy Bibby 

I think anything that is stored in files is a record. I just can’t get enough of them to help me.

July 1, 2009 at 2:02 pm
Diane 

For the umpteenth time, when an existing database is “updated”, customers deserve to know what was updated. We realize the SSDI and Obituaries are updated regularly for recent updates.

However — in the last week, the following databases have been “updated”:
-Eng/Wales FreeBMD Marriage Index
-Eng/Wales FreeBMD Death Index
-US Yearbooks
-1810 US Census

Every other genealogy site that adds new records also itemizes the additions or explains the revisions. Why can’t Ancestry.com?

How is updating an existing source helpful when customers have no idea what was added? Do you really expect us to search previously searched resources for each ancestor “in case” you added something relevant? And in most cases, you probably didn’t add new records; more likely, the “updates” were indexing corrections, etc. Yet you tout “updates” as if you are adding new records.

In the last year or two, I feel as if every search I do is sending a probe into outer space with little hope of return. You have the details of what you update — why can’t these be shared?

July 2, 2009 at 9:41 am
Jesse Taylor 

Diane,
I am relatively new here but I am not sure what you mean. On the Ancestry home page, bottom left, I see a panel that says:

New records on Ancestry.com
Annual report of the Adjutant General of the Commonwealth of Massachusetts, 1863-1865
French Deaths by Guillotine, 1792-1796 (in French)
England & Wales, Marriage Index: 1916-2005
England & Wales, FreeBMD Marriage Index: 1837-1915
England & Wales, FreeBMD Death Index: 1837-1915
Report of the Adjutant & Inspector General of the State of Vermont, 1863-1866
1810 United States Federal Census
U.S. School Yearbooks
View all new records

Isn’t that what you are asking for?

July 2, 2009 at 11:06 am
Diane 

Thanks Jesse but I am afraid not. Many of those databases have been there for years. But in the last week, those I listed were “updated”. That means that Ancestry.com did something to the existing old databases. Maybe there were 2,000 yearbooks previously — now maybe there are 2500 or maybe 2,001 or maybe no new ones and Ancestry just changed something. We have no way of knowing.

For those of us who have been long time users, we searched those databases many times. If Ancestry would state: on x/x/x we added 10 new year books for the following cities (or states even), we would know that we should or shouldn’t re-visit the database for all ancestors in that state. The ultimate would be that they tell us: Added Iowa/Des Moines/Lincoln HS 1905, 1908. Same with the England/Wales BMDs — were new years added? new counties???

That’s what we are looking for – specificity when existing databases are marked “updated”.

July 2, 2009 at 11:34 am
BobNY 

No, Jesse that is not what she is asking for!!!

When you say that you are “relatively new here,” are you an ancestry employee new to the company or a subscriber new to the blog?
=======================
Yes, it does say that “1810 United States Federal Census” is among new records at ancestry.com; however, this data set has been there for a long time. Was it not complete and new records were added? Were the transcriptions changed? Did the search algorithm change? We don’t know, and as Diane said, without that information, do we have to re-search databases we have been through before?

The one that really gets me was the “update” of the 1900 census. In doing this update, ancestry has managed to lose all of the ED descriptions for major cities. Updated for searchers — now totally useless for browsers. Just one example of how ancestry “improves” the utility of their site.

July 2, 2009 at 11:40 am
Jesse Taylor 

I meant that I am a relatively new subscriber to Ancestry.com. I uploaded a tree from a Gedcom quite some time ago and promptly got too busy to do more work on my ancestry. Recently I got back into it and now am a paid subscriber.

July 2, 2009 at 12:01 pm
Shirley Herring 

I can not find the tab to invite people to my tree since the new update

July 8, 2009 at 5:11 pm
Jeff Jahn 

The problem I see is that you are so interested in adding fields to info that mistakes are made and there is no way to get corrections done. An example is the 1925 Iowa Census for Sherman Township, Sioux County. The census images are all messed up: some are missing, some are duplicated, so the index sends you to the wrong pages and such. Very disappointing when you can’t find info.

July 9, 2009 at 9:39 pm
lisa 

I have just recently started to look for my family. I am having a lot of fun, but there is something that bothers me greatly. I wanted to join and be a member at ancestry, and to my amazement, I have to pay a whole year at one time. I feel this is more than most people can do and it upsets me. I am tired of the only people that are getting information, here, and everywhere, are the people that have money. That is unfair and gives this web site a black mark as far as I’m concerned. I would pay from month to month, but the whole shebang is way too much for me at one time. Tragic too, as the information that I have in my tree is correct, but unsourced, and it could help others but it’s not very tempting to most as it is stated unsourced, yet it is correct.
That is my opinion, for what it’s worth.

July 13, 2009 at 9:14 am