Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, May 15, 2013

Searching for Genealogy - What works and what doesn't

Genealogists who are actively doing research spend considerable time searching for information. As technology and the Internet have become pervasive, a lot of that time is spent looking for information on or in websites. To do this, we collectively rely on a variety of search engines (programs that search either the Web at large or a specific database). Our searches tend to fall between two extremes: no responses at all or, at the opposite end of the spectrum, millions of responses, most of them "false positives." 

In the past, I have compared both the general online search engines, such as Google, Bing and others, and individual database search engines. In doing this, I have used the name of my Great-grandfather as a search term. I did this for a variety of reasons, primarily because I already know approximately how much information the Web contains about him and because his name is just distinctive enough to show how well a search engine filters out false positives. 

I recently did a search for another ancestor in a major online database and got some interesting results. My search included his name and other information, including his date and place of birth and his date and place of death. However, the results came back with some limited information on the ancestor but many more false positives, all of them for people who did not match either the birth date (i.e. they lived before him) or the death date (i.e. they lived after him). For example, the ancestor died in the 1890s, but I consistently got suggested matches for U.S. Census records after 1900. This made me wonder why the programmers included those fields in the search form if the program was not sophisticated enough to use them to make those kinds of distinctions. 
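The kind of date filtering described above is not complicated in principle. Here is a minimal sketch in Python, assuming each search result carries a single event year (such as a census year) and that the search form supplies the ancestor's birth and death years. The `Record` type, the `within_lifespan` function, the `slack_years` tolerance and the sample data are all hypothetical and are not taken from any real genealogy database.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """A hypothetical search result: who it names, what kind of record it is, and its year."""
    name: str
    record_type: str
    year: int


def within_lifespan(records, birth_year, death_year, slack_years=0):
    """Keep only records dated within the person's lifetime.

    slack_years widens the window on both ends, because genealogical dates
    are often approximate (an assumption, not a rule from any real database).
    """
    earliest = birth_year - slack_years
    latest = death_year + slack_years
    return [r for r in records if earliest <= r.year <= latest]


# Hypothetical candidates for a man born in 1830 who died in 1895.
candidates = [
    Record("John Doe", "U.S. Census", 1860),
    Record("John Doe", "U.S. Census", 1880),
    Record("John Doe", "U.S. Census", 1900),
    Record("John Doe", "U.S. Census", 1910),
]

for record in within_lifespan(candidates, birth_year=1830, death_year=1895):
    print(record.record_type, record.year)  # the 1900 and 1910 censuses are filtered out
```

A real index would also have to deal with records that have missing or estimated dates, which this sketch ignores.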

What is more frustrating than false positives is a complete lack of response, where the search engine returns no matches. This happens frequently, even when I search for a very common name or a common place. Usually, the problem lies with something ridiculous, such as adding or not adding an initial capital letter. It is more common when I am searching for a particular string of words or letters, for example, when I enclose a name in quotation marks to search for the entire name. Of course, it is entirely possible that the person is listed in the database, but not under that specific name. 
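A capital letter should not make or break a match. As a minimal sketch, assuming a database simply stores names as plain text, normalizing capitalization and spacing on both sides before comparing makes an exact-phrase search insensitive to those differences. The `normalize` and `phrase_search` functions and the sample entries are hypothetical; real search engines use far more elaborate indexing.

```python
import re


def normalize(text):
    """Casefold the text and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def phrase_search(entries, phrase):
    """Return entries containing the phrase, ignoring capitalization and extra spaces.

    Surrounding quotation marks are stripped, so '"Henry Martin"' and
    'henry martin' are treated as the same exact-phrase query.
    """
    target = normalize(phrase.strip('"'))
    return [entry for entry in entries if target in normalize(entry)]


# Hypothetical catalog entries, for illustration only.
entries = [
    "Tanner, Henry Martin, 1852-1935",
    "HENRY  MARTIN TANNER PAPERS",
    "Tanner, Henry M.",
]

print(phrase_search(entries, '"Henry Martin"'))  # matches the first two entries
```

This only removes the capitalization problem; it would still miss a person indexed under a variant spelling, which is the other possibility mentioned above.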

So, I decided that looking for names was a good way to judge whether, and how well, any particular search engine works. It also occurred to me that I should use a specific document that I know exists as a test. I decided to combine the two by adding a book about my Great-grandfather, Henry Martin Tanner, as an additional test of the search engines' capabilities. Of course, this would not help if the database did not contain any book titles or names, but the same principle could apply; I would just have to look for general terms and then get more specific, adapting the searches to the type of information I expected from the database.

It has been some time since I compared the general search engines, so I thought I would get right down to business with a quick review of where they stand today and then go on to some more specific types of database searches. The book I chose is as follows:

Tanner, George S. Henry Martin Tanner; Joseph City, Arizona Pioneer, Born June 11, 1852, San Bernardino, California, Died March 21, 1935, Gilbert, Arizona. 1964.

This book had a very limited printing. The title of the book contains some very common search terms, such as Arizona, California and pioneer, and I know I can find the book immediately in WorldCat.org because it is in quite a large number of libraries around the United States. Additionally, I know the book has been digitized by FamilySearch and a digital copy is available through the FamilySearch Catalog on FamilySearch.org. Finally, Henry Tanner has tens of thousands of living descendants, and many of them are involved in genealogy in some way or another. So, if I do not find anything, the problem does not lie with a lack of availability. 

Here are the results of my searches with some general online search engines. I used just the first part of the name of the book as the search term, "Henry Martin Tanner; Joseph City, Arizona Pioneer."

My first search, in Google, returned more than 50 exact references to the book before the entries became so attenuated that they referred to other topics. However, as I have noted before, a search in Google Books indicated that no digitized copy was available. This is likely because the FamilySearch (Family History) Library Catalog is not yet "online." 

Microsoft is still striking out. Bing.com found only 9 references to the book and had false positives and totally unrelated results within the first three entries. It sent me to Facebook and to other unrelated individuals immediately. 

This genealogically oriented search engine and website has come a long way, and I have been very impressed with its progress. But in searching for this book, it returned 10 results, none of which pointed to the location of the book; they merely led to websites where the book had been quoted. 

Yahoo.com struck out with results similar to Bing.com's, which is not surprising, since Yahoo.com's searches are powered by Bing rather than Google: only 9 direct results, even going on to over four pages. 

I had a hard time with this one. Ask.com gave totally unresponsive results immediately. One interesting result: the fourth response was a biography of Doc Holliday. Hmm. I wonder if there is a connection? It did have perhaps 10 references to the book, but they were surrounded by false positives that did not even contain a majority of the search words. 

Aol.com actually didn't do too badly. It found about 14 references to the book before the results got so vague as to be useless, but those were scattered across the first 4 pages of search results. It has the same problem as Ask.com, mixing in totally unrelated items that do not match any of the search words. 

I have searched Dogpile.com in the past, so I decided to throw it into the comparison. Apart from trying to sell me hotel rooms in Joseph City, Arizona, it had only 8 results in the first 4 pages, giving it a very low score in this comparison. 

I guess I gave up after Dogpile.com; there didn't seem to be any point in prolonging the comparison. 

Even if you are anti-Google, for whatever reason, you can't argue with the results. The point here is obvious. If you are looking for genealogical information and don't want hotel reservations or Facebook, you might want to stick with Google. But realize that even Google couldn't find the online, digitized copy of the book. More about this later. The clue is in this post: I found the book in WorldCat.org.
