Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, May 1, 2014

Search Functions: Challenges in Examining the Heart of the Online Genealogy Database Programs

From time to time I return to the issue of searching in the various online database programs. It seems to me that almost the whole purpose of having a huge, millions-of-records database is to make all those records accessible to the genealogists who are seeking information about specific ancestors. I am also assuming that there have been programming changes to these various programs' search functions over the past few months. In addition, I am adding in two more programs: Mocavo.com and findmypast.com.

Right at the beginning, I should mention that my opinion about the various programs has evolved over the past few years and, more recently, has become completely different from the opinion I have expressed previously. As Emerson said, "Consistency is the hobgoblin of small minds" and I don't claim any particular duty to be consistent over time, especially in my opinions.

Recently, this search process has evolved into two separate functions: self-directed or user-directed searching and automatic searching. Although the large databases with automatic search capabilities, such as Ancestry.com and MyHeritage.com, both provide a multitude of sources for an uploaded or entered family tree, there is no real way to compare the two systems. This is also true of the semi-automatic system so far implemented by FamilySearch.org's Family Tree program. With all three of these programs it is necessary to upload or enter a family tree before the programs can begin their automated or semi-automated processes. In each case, the number of suggested sources is likely more of a reflection of a match between your ancestors and the sources available in the database than the effectiveness of the system. So, I cannot rank the programs in any way based simply on the number of results.

It is supposed that all of these programs use some variation of matching individuals in a family tree with the pattern of similar family trees online and then use data from both the parents etc. and children etc. of the individual to "match" the individual to a record. Since the programs are using more information than is usually supplied by a user, they are much more thorough and usually much more accurate than a user-entered search. This raises an interesting question: if the automated search does not find any sources for a specific individual in a user-submitted tree, then what does that mean? If you have a user-submitted family tree, is there any reason to do your own searches, especially for individuals for whom no sources are found by the automated searches?

It is when the programs are used in a more "traditional" way as a place to search manually for individuals that the real differences between the programs appear. All of the larger online database programs, as listed here below, have a way to manually enter information into the search function and then have that search function return results. Because the user selects which information is entered in a manual search, this would seem to be a way to judge the responses from the various programs. The problem is that, once again as with the automated searches, the results may reflect the contents of the database more than the ability of the search functions. For example, suppose I am searching for birth records in Vermont. Does the program even have any of those records? This is a real question that must be addressed before any possible comparison can be made. Here is the list in no particular order:

I suppose the question could be asked as to why I have not included some of the other large database programs. From one standpoint, many of the other online databases are owned or controlled by these entities. For example, Ancestry.com owns Fold3.com and Genline.com and other websites. In addition, these particular programs have vast resources that put them in a distinct category from almost all other programs. 


Given these limitations, any comments about the various programs become purely subjective. Let's suppose I select an individual known to be in the 1920 U.S. Census records and do a search in each of the programs. This would not be a fair comparison for a program that did not have a copy of the 1920 U.S. Census. Likewise, assuming the programs all had that particular database (which they do not), how do we compare a search if all of them find the ancestor in that particular database?

Let me take another hypothetical situation. Let's suppose that I choose one specific individual to use as the "control." I do a search on this individual using the same search terms in each of the different programs. Let's further assume that I get different results from each of the programs (likely). What criteria can I use to determine if one search is "better" or more productive than another? The number of documents returned? What would be the point in that? The number of documents that actually refer to the control individual? What if the documents returned are only the most common and expected documents? What if one or more of the databases lack the common documents and give me very surprising or innovative sources that measurably assist my research?
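To make those competing criteria concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: the program names, the record identifiers, and the hand-checked set of records that really refer to the control individual. It is not how any of these websites measure anything. It simply counts the three things I asked about: how many documents came back, how many of them actually refer to the control individual, and which relevant documents only one program found.

```python
def summarize(results, relevant):
    """Return (total returned, number relevant, precision) for one result list."""
    hits = [r for r in results if r in relevant]
    precision = len(hits) / len(results) if results else 0.0
    return len(results), len(hits), precision


def unique_hits(all_results, relevant):
    """For each program, the relevant records that no other program returned."""
    unique = {}
    for name, results in all_results.items():
        others = set()
        for other_name, other_results in all_results.items():
            if other_name != name:
                others.update(other_results)
        unique[name] = sorted(
            r for r in set(results) & relevant if r not in others
        )
    return unique


# Hypothetical data: records I have verified by hand as belonging
# to the control individual, and what two imaginary programs returned.
relevant = {"census-1920", "birth-cert", "draft-card", "obituary"}
all_results = {
    "Program A": ["census-1920", "birth-cert", "other-man-1920", "wrong-state"],
    "Program B": ["census-1920", "obituary"],
}

for name, results in all_results.items():
    total, hits, precision = summarize(results, relevant)
    print(f"{name}: {total} returned, {hits} relevant, precision {precision:.2f}")

print(unique_hits(all_results, relevant))
```

In this invented example, Program A returns more documents but half of them are about someone else, Program B returns fewer but all of them are on target, and each finds one relevant record the other misses. Which of the two is "better" depends entirely on which of those numbers I decide to care about, and all of them depend on what each database holds in the first place.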

I often hear comments that such-and-such a database program has a "good" search engine or that another one of the programs has a "poor" or useless search engine. In fact, such comments may be made based solely on the personal expectations of the user. For example, suppose I am searching for information about a specific ancestor that I would expect to be "in the database." If my search efforts do not produce such an expected record, I may attribute this lack of results to the fact the search engine "does not work," when the reason may lie with the way the information was entered in the original source (misspelled or badly written) and may not be a failure of the search engine at all. This failure could have any number of additional contributing factors, including poor indexing and transcription, not just a poorly designed search engine.

Another perceived problem is when a search produces extraneous results. For example, I search for John Doe in Oregon and get John Roe in Kentucky. In most cases, rather than being a defect, this type of result is exactly what the program is designed to do given the limited amount of information supplied to it by the user. When the program does not find a record with the parameters entered by the user, the program is designed to be more inclusive. This fact has resulted in some of the programs providing a way for the user to specify the degree of specificity used by the search engine, from very inclusive to very exclusive.
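Here is a small sketch, in Python, of how a strict search and a more inclusive one can give different answers from the same index. It is only an illustration under my own assumptions: the three index entries are invented, the `min_similarity` and `require_state` settings are hypothetical names, and I am using the standard library's `difflib.SequenceMatcher` as a stand-in for whatever name matching the real programs use, which I do not know.

```python
from difflib import SequenceMatcher

# Invented index entries: (name, state).
RECORDS = [
    ("John Doe", "Oregon"),
    ("John Roe", "Kentucky"),
    ("Mary Smith", "Oregon"),
]


def name_similarity(a, b):
    """Similarity between two names, from 0.0 (nothing alike) to 1.0 (identical)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search(name, state, min_similarity=1.0, require_state=True):
    """Return entries whose name is similar enough and, optionally, whose state matches."""
    matches = []
    for rec_name, rec_state in RECORDS:
        if require_state and rec_state != state:
            continue
        if name_similarity(name, rec_name) >= min_similarity:
            matches.append((rec_name, rec_state))
    return matches


# Very exclusive: exact name, exact state.
print(search("John Doe", "Oregon"))

# More inclusive: near-miss names allowed, state ignored.
print(search("John Doe", "Oregon", min_similarity=0.8, require_state=False))
```

With the strict settings only John Doe in Oregon comes back. With the looser settings John Roe in Kentucky comes back as well, which is the behavior that looks like a defect but is really the program doing what it was asked to do.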

I think we have to come to the conclusion that making valid comparisons of any given results across the various programs is difficult and very likely to be misleading. Of course, my observations will not dissuade anyone from making such evaluations, but I would hope that as we do make such observations, we at least acknowledge that all of the search engines in each of the above programs work very well if we know what to expect and what information is really in the database.

2 comments:

  1. Well no, what Emerson said was "A foolish consistency is the hobgoblin of little minds...." That little word "foolish" makes a great difference in the meaning!

    1. Isn't being consistent in the face of changing facts and/or circumstances always foolish?