RootsTech 2014

Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, May 27, 2014

Understanding Numbers in Genealogy

The large online genealogy companies are always throwing out huge numbers of new content or users or whatever. Some of the terms commonly used include the following:

  • names
  • records
  • images
  • collections
  • members
  • users

Rather than being a simple case of using each word as it is commonly understood, this short list of words has become slippery when used to describe the content of these websites. For example, let's suppose that one of these websites digitizes and makes available a copy of the 1880 U.S. Census. How is this counted? Is this one collection? Is it over a million images? Is it one record or a million-plus records? If there are approximately 50 names per census sheet, can the website claim to have 50,000,000+ names in its database to search (assuming, of course, that the entire census has been indexed)? The question is, what number goes into the overall claims made by the website to have millions of names or records?
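To make the arithmetic concrete, here is a rough sketch of how the same census could be reported under each unit. The figures are only the round numbers from the example above (one million images, 50 names per sheet), not actual census statistics, and the Holding class and its field names are hypothetical, invented purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Holding:
    """One digitized source, described with assumed round numbers."""
    title: str
    images: int           # assumed number of scanned sheets
    names_per_image: int  # assumed average number of names per sheet

    def counts(self):
        """Return the headline figure under each counting unit."""
        return {
            "collections": 1,
            "images": self.images,
            "names": self.images * self.names_per_image,
        }


# Illustrative values taken from the example in the text, not real data.
census_1880 = Holding("1880 U.S. Census", images=1_000_000, names_per_image=50)

for unit, total in census_1880.counts().items():
    print(f"{unit:>11}: {total:,}")
```

The same source yields 1, 1,000,000 or 50,000,000 depending only on which unit the website chooses to advertise.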

As I have said before, the size of the database is meaningless if it does not have the record you are looking for. Periodically, these online databases announce the addition of millions more records. Recently, the numbers have gone into the billions. There comes a time when ever-larger numbers become meaningless. No one can comprehend a billion names, much less multiples of a billion.

Perhaps it is time for the large online database companies (without naming any names) to try to impress us in some other way. How about an emphasis on the accuracy of their searches or the ease of use of their programs? Either would be a somewhat refreshing change. I certainly realize that some of the companies do talk about subjects other than the size of their database contents, but almost all large online companies enjoy telling us all how big they are and how big they will be tomorrow.

I can remember when McDonald's kept a running count of the number of hamburgers served. Of course, as time went on, the numbers got astronomical. Finally, McDonald's moved on to a more productive line of advertising.

I also realize that the large online genealogy companies do use alternative methods of advertising, but nevertheless I still get regular blog posts announcing another million+ records being added to this or that database. Some of these announcements are just trying to tell all of us about the newly added collections, but even that gets to be old news after a while.

If you think about it for even a few seconds, you will realize that claiming to have added so many names is meaningless. There could be more than a hundred names on any one page of some types of records. Likewise, a collection could contain one name on one record or a hundred million names on millions of records. The size is no guarantee of the relevance of the content.

Let's consider a book, for example. Is a surname book one record or one collection? If so, do you add the total number of names mentioned in that collection to your statistics pertaining to the book? The term "record" is even more ambiguous. A record could contain millions of names or just one name. Likewise, a collection could contain millions of names or only one name. Isn't it somewhat misleading to use the same term (i.e., collections) to refer both to content involving millions of entries, and therefore potentially many useful contacts, and to content that covers merely a single individual? By the way, it is common for the larger companies to refer to a document with a single family name as a "collection."

Presently, because the different companies use these simple terms in different ways, it is nearly impossible to compare the contents of the different database programs. How do you know if the records you are searching for are even in some of the online programs?

