Some people eat, sleep and chew gum, I do genealogy and write...

Monday, July 28, 2014

Web Basics for Genealogists -- String Searches

A string, in computer language, is a sequence of characters. If you need an example of this, try searching for random sets of characters on Google. You will soon see that the Google search engine, like many others, will find almost any set of characters that appears in the text it has indexed online. For a further example, here is a screenshot of a search on "xyz123":

This result should suggest that using Google to search for names and places for genealogical research would be profitable. In fact, it is. I usually suggest that people look for names and places and any other information about their ancestors. Since the number of websites returned for any given search is effectively infinite (no one has the time to look at every single result of a general search), you never know what you will find. You search for names by putting the search terms in quotation marks, like this: "John Doe." You can add qualifying terms such as a location, occupation, etc. There is no need to put in a "+" sign between terms. Google already assumes that you want all of the terms you typed.
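For readers who like to see the mechanics, here is a minimal sketch in Python of how such a search is put together. The function names `build_query` and `search_url` are my own, and the place and occupation are made up for illustration. The only outside fact the sketch relies on is that Google's public search page takes the search text in a `q` parameter.

```python
from urllib.parse import urlencode


def build_query(name, *qualifiers):
    """Quote the name as an exact phrase and add qualifiers separated by spaces.

    No "+" signs are inserted between terms; a plain space is enough.
    """
    parts = [f'"{name}"'] + list(qualifiers)
    return " ".join(parts)


def search_url(query):
    # Assumption: the search text goes in the "q" parameter of the Google search URL.
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical ancestor, place and occupation, for illustration only.
query = build_query("John Doe", "Springfield", "blacksmith")
print(query)
print(search_url(query))
```

The quotation marks keep the two words of the name together as one string, while the qualifying terms are left unquoted so that they can appear anywhere on the page.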

In a previous post, I discussed searching in a catalog. The differences between a catalog search and a string search are significant. A catalog is an arbitrary scheme of organization into categories. Further organization is usually accomplished by organizing the items in alphanumeric order. Online catalogs are usually a hybrid between a string search and subject headings. It is sometimes difficult to tell whether or not the string search applies to the entire catalog or only to the presently selected catalog area.

Some people assume that a string search is automatically superior to any possible catalog system. Unfortunately, this is not the case. Successful string searches depend entirely on the ability of the researcher to "guess" words or character strings (mixtures of words and numbers) that can be found in the target document. In addition, general searches, such as searching for a very common name, can return an overwhelming number of results. The ability to guess the right search terms is a skill that is acquired by searching over and over again.
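The "guessing" problem is easy to see in a small sketch. The Python below is only an illustration: the sample record is invented, and the simple `in` test stands in for what a real search engine does, which is considerably more forgiving. The point is that a string is found only if those exact characters, in that order, appear in the document.

```python
# An invented record, as it might appear in a transcribed deed book.
document = "Jno. Doe, blacksmith, of Springfield, bought 40 acres in 1852."


def string_search(text, term):
    """Return True if the term appears in the text, ignoring upper/lower case."""
    return term.lower() in text.lower()


guesses = ["John Doe", "Jno. Doe", "Doe", "Springfield 1852"]
results = {guess: string_search(document, guess) for guess in guesses}

for guess, found in results.items():
    print(f"{guess!r}: {'found' if found else 'not found'}")
```

Here the obvious guess, "John Doe", misses the record because the clerk abbreviated the given name, and "Springfield 1852" misses because the two words are not next to each other in the text. Only the guesses that match what is actually written succeed.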

To illustrate the difference between a string search and a catalog search, I will use a hypothetical search for an ancestor with a relatively common name. Let's assume that you have identified at least one place where an event occurred in your ancestor's life. This step is necessary because otherwise either type of search, a string search or a catalog search, will be unproductive. Without the anchor of a geographic location, it is nearly impossible to distinguish between individuals of the same or very similar name. OK, so now you start your search. The catalog search will likely produce a series of documents or collections of documents that relate to the place you identified. You will then search in the individual collections for your ancestor.

Now, let's suppose that you simply use a string search, without the benefit of the organization of documents imposed by a cataloging system. You are at the mercy of the documents. If any document has both the name of the individual you are searching for and the place, then it could be found by a string search. You should also remember that the string search will not find the content of images, whereas a catalog system may identify documents that are only available as images. Are all the contents of the catalogs subject to string searches? Unfortunately, no. Most entities, such as libraries, do not allow Google or anyone else access to their data storage.
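The contrast between the two kinds of search can be sketched with a toy collection. Everything in this Python example is made up for illustration: the three items, the places, and the function names `catalog_search` and `string_search`. It is a simplified model, not a description of how any real library catalog or search engine works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Item:
    title: str
    place: str            # the catalog heading the item is filed under
    text: Optional[str]   # None means the item exists only as images


ITEMS = [
    Item("County deed index", "Springfield", "John Doe sold land to Richard Roe"),
    Item("Parish register (microfilm images)", "Springfield", None),
    Item("Town history", "Shelbyville", "John Doe visited in 1850"),
]


def catalog_search(items, place):
    """Return every item filed under the place, whether or not it has text."""
    return [item.title for item in items if item.place == place]


def string_search(items, *terms):
    """Return items whose searchable text contains every term."""
    hits = []
    for item in items:
        if item.text is None:
            continue  # image-only items are invisible to a string search
        text = item.text.lower()
        if all(term.lower() in text for term in terms):
            hits.append(item.title)
    return hits


print(catalog_search(ITEMS, "Springfield"))
print(string_search(ITEMS, "John Doe"))
print(string_search(ITEMS, "John Doe", "Springfield"))
```

In this toy collection the catalog search for the place turns up the image-only parish register, which the string search can never see. The string search for the name alone finds a John Doe in the wrong town, and the search for the name plus the place finds nothing, because the deed index never mentions the place in its own text.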

Is this an either/or situation? Not really. A reasonably exhaustive search would require a search online with a search engine such as Google, but would also require a search of the catalogs of any relevant repositories. Sometimes neither method is adequate and it is necessary to do a physical or manual search of the contents of a given repository. For example, if I am researching at the Family History Library in Salt Lake City, Utah, I will commonly go to the shelves and look at every single item on the shelves for an entire state to make sure I am not missing something that may contain the information I am seeking.

It is also important to remember that much of the world's genealogical information is not yet online or is still locked up in images and that there is no substitute for pursuing research on location.
