
Saturday, November 9, 2019

How Good are Genealogical Search Engines?

Every time you look for information online, whether from Google or from a dedicated genealogical database program, you are using a search engine. The formal definition of a search engine is a program that searches for and identifies items in a database that correspond to keywords or characters specified by the user, used especially for finding particular sites on the World Wide Web. See Google.com. The application of the term "search engine" has expanded over the years to include programs that search within specific databases that are open to searches on the World Wide Web. Very few people today make a distinction between the general term "internet" and that part of the internet defined as the World Wide Web. In fact, the term "World Wide Web" has fallen into disuse outside of the acronym "www" used in a Uniform Resource Locator or URL, i.e., the address of a page (website) on the World Wide Web.

As a genealogical researcher, you have two options for doing your research: use a database program with a searchable index to the content of the digital images of historical documents, or search the original documents word by word yourself, either from paper or digital copies. If the database (website) you are searching supports user searches, your search is made by using a dedicated search engine.

The indexes that are the basis of a search engine's capabilities are compiled either manually by employees, sub-contractors, or volunteers, or by programs that build indexes from documents that have been subjected to optical character recognition. Presently, there are no computer programs that can efficiently extract information from historical hand-written documents, although progress is rapidly being made towards this goal.

Obtaining results from an online search engine depends on both the accuracy of the search engine's algorithms and the searching skills of the researcher. An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Because the results obtained from any particular search engine are partially dependent on the searching skills of the researcher, improvement in obtaining useful results comes mainly from the increased skill of the researcher.

One of the most common criticisms of search engines is that they give you too much information. My standard search for comparing search engines over the years has been the name of my great-grandfather, Henry Martin Tanner. Using the name of an ancestor to compare genealogical search engines is entirely unfair because no two online genealogical databases have the same records. Because most of the records for Henry Martin Tanner are found in Arizona, you would expect the best results from a search that emphasized records from Arizona. But for general searches online, using my great-grandfather's name is a really good idea because I already know about many of the records I am going to find online for him.

I can quickly dispose of the test between Google and the other general online search engines because, over the years, none of the others has even come close to Google in the depth and number of results. For example, here is a quick comparison between Google and Microsoft's Bing search engine. First, Google.


I can narrow down the search a lot by putting his name in quotation marks.


If I added some additional terms, I could come up with more or fewer results, but they would all be pertinent to the search. All of the other search engines work in about the same way. Some have more specific and complicated filtering systems, but there is a law of diminishing returns where designing or programming an initial search becomes more time consuming than making multiple searches. Using multiple searches, I can start with a basic term such as a name and then add modifiers such as places, dates, etc., and generally find what I am looking for, or learn that it is not there, in a few seconds.
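The multiple-search approach can be sketched in a few lines of Python. This is only an illustration of the idea, not a feature of any search engine: the function name `progressive_queries` is hypothetical, and the modifier "census" is an invented example term. Each query keeps the quoted name and adds one more modifier than the one before.

```python
def progressive_queries(name, modifiers):
    """Return a list of search strings, each adding one more modifier."""
    # Quotation marks ask the search engine to treat the name as a phrase.
    queries = [f'"{name}"']
    for modifier in modifiers:
        # Quote multi-word modifiers so they are treated as phrases too.
        term = f'"{modifier}"' if " " in modifier else modifier
        queries.append(f"{queries[-1]} {term}")
    return queries


# "Arizona" comes from the article; "census" is an invented example modifier.
for query in progressive_queries("Henry Martin Tanner", ["Arizona", "census"]):
    print(query)
```

Each printed line can be pasted into a search box in turn, stopping as soon as the results are narrow enough to be useful.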

This is where learning how to do searches comes into play. The more you search online, the more you are likely to learn what terms are important to find your target item. I usually search with Google and can find websites and other information in less time than it would take me to look up the address (URL) unless I happened to know it.

The SuperSearch™ search engine developed by MyHeritage.com is an example of a superior program but it works best when the program itself does the searches rather than the user initiating a search. The reason for this is that the automated Record Match searches look at more of the data than the user will commonly add as search terms. The search engines rely on an automated index. In the case of general search engines such as Google, the index is created by virtual robots called web crawlers or spiders that visit all of the websites and create an index entry for each. Most of the genealogy database websites rely on manual or semiautomatic indexing. In most cases, the accuracy of the searches is dependent on the accuracy of the indexers.
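To make the idea of an index concrete, here is a minimal sketch in Python of an inverted index: a table that maps each word to the records that contain it. It is a toy under stated assumptions, not how Google or any genealogy website actually builds its index. The three records are invented for illustration, and the names `build_index` and `search` are hypothetical.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercase word to the set of record ids that contain it."""
    index = defaultdict(set)
    for record_id, text in records.items():
        for word in text.lower().split():
            index[word.strip(".,;:")].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every word in the query."""
    words = [w.strip(".,;:") for w in query.lower().split()]
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Invented sample records, standing in for indexed documents.
records = {
    1: "Henry Martin Tanner, Arizona",
    2: "Henry Tanner, Utah",
    3: "Martin Smith, Arizona",
}
index = build_index(records)
print(search(index, "henry tanner"))
print(search(index, "tanner arizona"))
```

The sketch also shows why the accuracy of the indexers matters: the search only ever consults the index, so a name that was misread when the index was built cannot be found, however carefully the search is worded.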

Overall, the genealogical search engines on most of the larger websites are very accurate with some exceptions. The most common problem is that a search cannot find a record known by the user to be in the database. This usually happens when the search term, the name, is very common. The accuracy of all search engines increases as additional search terms are added but there is always a point at which the search engine will not find a match for all of the terms added.
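The narrowing effect of additional search terms can be illustrated with a short Python sketch. The records, field names, and the year value below are all invented for the example, and the exact-match rule is an assumption made for simplicity; real search engines use more forgiving matching.

```python
def filter_records(records, **terms):
    """Keep only records whose fields equal every supplied term (case-insensitive)."""
    return [
        record
        for record in records
        if all(
            str(record.get(field, "")).lower() == str(value).lower()
            for field, value in terms.items()
        )
    ]


# Invented sample records; none of them has a "year" field.
records = [
    {"name": "Henry Tanner", "place": "Arizona", "kind": "census"},
    {"name": "Henry Tanner", "place": "Arizona", "kind": "marriage"},
    {"name": "Henry Tanner", "place": "Utah", "kind": "census"},
]

# Each step adds one more search term to the previous step.
steps = [
    {"name": "Henry Tanner"},
    {"name": "Henry Tanner", "place": "Arizona"},
    {"name": "Henry Tanner", "place": "Arizona", "kind": "census"},
    {"name": "Henry Tanner", "place": "Arizona", "kind": "census", "year": "1880"},
]
for terms in steps:
    print(len(filter_records(records, **terms)), terms)
```

In this toy data each added term reduces the number of matches, and the last step returns nothing because no record satisfies every term at once.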



3 comments:

  1. Good article. Like you mentioned, a common criticism is that search engines can return too many results. To that end, I created a simple tool some time ago called "AncestorSearch: Google Custom Search" ( https://www.randymajors.com/p/ancestorsearch.html ). It uses Google Search advanced options such as quotes, checks reverse name order, word proximity, etc. Nothing that you can't do manually in Google, but the AncestorSearch search form makes it much quicker to enter. As a shortcut, just tab between fields, and hit enter (or click the "Run Full Google Search") when you're done. Hope it helps save a little time in Google searches!

  2. Awesome article! Thank you for this article.

  3. Thanks for sharing your "Ancestor Search: Google Custom Search" URL. I tried it out and it surely gave me back many more results than just a search with my ancestor's name.
