Thursday, August 9, 2012

Comparing the genealogy mega-sites -- Is it possible?

The best online genealogy site is the one that has the document you need. That seems pretty elementary. But I was thinking that there should be a semi-objective way of comparing the large sites, assuming that a comparison is even possible. Raw numbers are not much help and can be very misleading. Consider, for example, the claim to have so many "collections," when a single collection can have millions of records with tens of millions of names, or fewer than 100 records with a few dozen names. So if the terms records, collections, documents or whatever are impossible to compare, how do you compare the huge online genealogical databases?

The more I thought about the issue, the more difficult the task of comparing the different sites objectively became. Take U.S. Census records, for example: both and , as well as several other sites, have complete indexes and/or images of the entire U.S. Census. ( would have an entire set of images, if it were not for an early agreement with that would host the 1880 U.S. Census images and would have an index.) So comparing sites based on Census content is somewhat meaningless, since there are multiple copies of the entire U.S. Census online, both free and by subscription. Shouldn't the websites be judged, in part, on the number of unique records they have online rather than the total number? But there is no efficient way to match duplicate records across different databases. And what about duplicates within each database? How do we account for duplicate records? Should the index of a record and the images of that same record be counted as two separate items or one?

In addition, isn't it fair to entirely discount user-contributed family tree records as being merely copies of other record sources? I realize the value of others' research, but when I am looking for source records, should the fact that my family tree is on the website count toward the total record number for that website? What about indexes or extracted records as opposed to images of original documents? Should the large websites get to count the number of indexes they have compiled when the original records are not provided?

How do you compare subscription websites with free sites? Shouldn't the free sites get more credit merely because they are free? Or should subscription sites be given the advantage because they can obtain records to which free sites do not have access?

Which is more valuable: a website with information about your particular family, with every fact cited to a source record, or a huge online database with billions of records? This goes back to my opening statement.

Another factor in comparing databases is the effectiveness of their search engines. What good is a site with a huge number of records if there is no way to find what you are looking for? In fact, every method I came up with for comparing the large websites turned out to be essentially a comparison of their search engines. So all of the hype online about the size of these larger sites is really not helpful if they don't contain your family's records.


  1. It is not only the search engine or index; it is also how you can get to the image. One example is findmypast: the only way to reach an image is through their search engine, and you cannot move to the next or previous image.

  2. Part of the comparison problem is what exactly is called a "record." calls its extracts from documents, books, whatever, "records," and apparently each person entry in a tree as well (including the horrid OneWorldTree and entries in the miscellaneous compilations from the IGI, family group sheets, etc.).