
Monday, August 27, 2012

The Search Engine is the Core of the Program


The Internet is driven by search engines. Google, Bing, and similar programs easily dominate all other programs. Despite the huge reach of Google and other search engines, the content of proprietary databases, with all of their purported source documents, is accessible only through each database's own search engine. This is true of all of the online commercial genealogical databases. You cannot use Google's huge search capabilities to find an individual document on FamilySearch.org or Ancestry.com. The only exceptions are documents that have been copied onto other websites. The proprietary databases are generally closed to Google's webcrawlers.

So, to find something on one of the genealogical database sites, you must depend on its own search engine. Needless to say, few of them are as easy to use or as effective as Google. Most of them also depend on the accuracy of the indexing programs used to open the digitized documents to searching. A digitized scan of a document is an image. The information on the image is essentially locked away until someone transcribes all or part of it into a text format that a computer can search. Of course, the image can be named or given metadata that will make it searchable, but whether the document in the image is indexed, or the image itself is given enough metadata (including a title) to be found, depends on those doing the indexing.

Even assuming the documents in a genealogical database are fully described through an indexing program, finding them still depends on how the search function (i.e. the search engine) was programmed. In effect, the usefulness of the database rises and falls on the ability of its search engine. It does not matter how many millions or billions of records a website claims to have if a limited search engine leaves those records inaccessible.

Since the utility of a genealogical database depends so heavily on its search capabilities, you would think that the commercial databases would concentrate much of their effort on making their records as accessible as possible through effective search engines. The minimum test of effectiveness is whether a relatively simple search finds a document that is known to be in the database. Now, who should I pick on? How about the following four databases: FamilySearch.org, Ancestry.com, Archives.com, and WorldVitalRecords.com.
Just to remind you, FamilySearch.org is owned by FamilySearch and Ancestry.com by Ancestry.com, but Archives.com is now owned by Ancestry.com and WorldVitalRecords.com is owned by MyHeritage.com.

In all fairness, Archives.com was only very recently acquired by Ancestry.com, so the results of any examination of its search engine are certainly not yet a reflection on Ancestry.com. MyHeritage.com has had a relatively short time to work on the WorldVitalRecords.com site, but both FamilySearch.org and Ancestry.com have had a long time to develop their own search engines.

For a change, I will choose a more obscure relative for the comparison and analysis: Adeline Springthorpe Sparks Thomas (b. 1826, d. 1891). She was born in Colston, Leicester, England and died in Manti, Sanpete, Utah. Choosing any one person for a trial such as this is not really fair, because some of the databases may have no records at all about the target person. A real feel for the adequacy of a search engine can only come from repeated searches over a period of time. Despite that acknowledged limitation, I am forging ahead, mainly to show a methodology for comparison.

Now, let's establish a baseline for comparison. Death records were not mandatory in Utah before 1898. But at the beginning of this post, I added a photo of Adeline Thomas' grave marker, so we have some very limited information about her to start with. Also, I had no preconceived idea of what the various databases held about my ancestor.

Does she appear in any of the four online databases? I will start with a very simple search in each one.

FamilySearch.org: The first search, using her entire name, returned hundreds of thousands of results. When I added a residence of Manti, Sanpete, Utah, the second search brought up her family living in Kanosh, Millard County, Utah. Here is a copy of the U.S. Census record:


Now let's see what happens with the others.

Ancestry.com: The only thing I added to the search was her gender. The only document produced was a reference to FindAGrave.com. However, that entry points to a good deal more potential information. Here is a screenshot of the FindAGrave.com entry for Adeline:






So both FamilySearch.org and Ancestry.com immediately led to further information. On to the other two.

Archives.com: What will I find in its claimed two billion-plus records? The first search produced over 3,000 records, too many to look through. I narrowed the search by adding Utah as a place and immediately got the same 1870 U.S. Census record, along with many other suggested records. I added a birth date to cut down the number of results and could have kept refining the search for quite a while.

WorldVitalRecords.com: No results, even after varying the place, dates, and name in the search terms.

You might like to make your own comparison. Some of these websites are available for free at FamilySearch Centers.





