Tuesday, July 10, 2012

Numbers don't always equal value

Some of the largest genealogical databases have recently published claims to really huge numbers of "individuals" or "records" or "collections" or whatever. What do these large numbers really mean for the researcher? As it turns out, in some ways they are good, but in other ways they are totally misleading, sort of like "whitest white" and "best mileage." If you would like to get an idea of the numbers game, read "Benchmark Numbers for July 2012" by Randy Seaver on Genea-musings. Here are some of the terms from Randy's list of website claims:
  • paying subscribers
  • databases in card catalog
  • family trees
  • public member trees
  • private member trees
  • record collections
  • wiki articles
  • research articles
  • community trees
  • original documents
  • images
  • memorial pages
  • collections
  • names
  • newspaper pages
  • individuals in family trees
None of the terms used in these claims are defined, and none of them are equivalent to the other, similar-sounding terms. This is not Randy's problem; it is a problem with the claims made by the various online databases. So where is the truth in genealogical advertising? Guess what? It doesn't exist. The numbers game is, at best, misleading, as I said above, but in a real sense it is entirely meaningless to a researcher. Now, I am not talking about finding aids like the FamilySearch Research Wiki and Cyndi's List. They are in a different category. With the large record sites, if you do not find the item you are searching for, then the size of the database is essentially immaterial.

Let me give a few examples. Let's say I am looking for a cemetery in Arizona. Will any of these mega-genealogy sites help me at all? No, not much, although I might find a link in the FamilySearch Research Wiki. Of course, I can check with another mega-site, Findagrave.com, with its 83 million records, but what if my particular cemetery isn't listed? As it turns out, most of the large sites are repositories, not finding aids. If a site doesn't have what you want, it doesn't tell you where to look outside of the website itself. Extending this example, suppose you search for cemetery records in FamilySearch's Historical Record Collections and do not find anything close to what you need. As I mentioned, FamilySearch has the Research Wiki, but nothing in the search results tells you to look for additional record sources in the Wiki. You are left to discover that yourself.

Next example. You are looking for a U.S. Civil War pension record. If you do a search on any of the major websites, such as Ancestry.com or Fold3.com (both owned by Ancestry.com), and you get negative results, that's it: you get negative results. There are no suggestions about where to go next, and certainly no referral to another major repository, such as FamilySearch.org, or even to the National Archives.

Yet another example. Many of the websites claiming millions or even billions of records are actually including user-submitted family tree data in their numbers. None of the sites discounts duplicate entries or entries with no substantial information, such as just a name. In my own case, many of my ancestors have hundreds, perhaps thousands, of repetitive copies in the larger databases. The fact that there are hundreds or more copies of the same individuals in the large databases certainly diminishes their value. If I know the correct information about an ancestor, the copies are usually useless to me. If I don't know the information about the ancestor, how am I going to choose among all the competing claims? I can claim I have 1,000 people in my personal database, but what if 800 of those are the same person?

All of the major databases have their uses and their values, but attempting to equate numbers of undefined categories with value is misleading.

