Friday, October 31, 2014

Analyzing Search Strategies for Genealogists

There are several distinct methods for searching online. Not only does each search engine use its own set of algorithms, but search methodology also varies with each major type of online database collection. To begin, we need to recognize that genealogists are likely to use two or more different kinds of databases every time they do online research. Lack of awareness of the different types of databases and their particular search requirements can lead to frustration and to the attitude that the Internet or a particular database "does not work," when the truth is that the researcher simply needs to adapt and use the type of search each database requires.

First, a few definitions.
  • search engine - a specialized program that works with a particular database, enabling users to find documents or entries in that database.
  • database - a collection of copies of documents or written entries dealing with a particular subject.
  • algorithm - a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer; in programming, a set of sub-routines that follow a basic pattern and work together to enable a program to operate.
  • methodology - a set of related steps to accomplish a particular objective.
  • string - a particular sequence of characters, e.g., a word or name.
  • string-search - a type of search that looks for combinations of letters or other characters.
I group the online databases into three major categories:
  • string-search engine based databases such as Google
  • catalogs or catalog-type databases
  • wiki or wiki-like databases
To some extent, the content of the database determines how its search engine or other search functions work. Every time you search online, remember that you can't get water out of a dry well. At a minimum, you must be aware of the extent of the content of the particular database you are searching. However, this works both ways: you cannot know in advance whether information about an ancestral family is in a collection that you would never think of searching.

I had an interesting experience yesterday at the Family History Library in Salt Lake City, Utah that illustrates this principle. One of the patrons I was assisting was looking for Italian records. She had been told flatly by the Library volunteers that the records she was looking for were not in the FamilySearch.org Historical Record Collections or even in the FamilySearch Catalog. I spent some time with her researching the location of her ancestors' home town and then finding the churches situated near the town. Then I went to the Catholic Directory and determined the parishes and the diocese for those churches. Once we knew where to look, I found that the records were indeed in the FamilySearch Catalog and had also been digitized and included in the Historical Record Collections. The records were Civil Registrations, not Catholic Church Records.

This example points out an important fact: you must do your homework and know what you are looking for before you begin an online search. In this case, finding the records involved several levels of preliminary research before we could even determine whether the records might be in the target database. The patron still has to search through a huge number of unindexed records, but at least she is no longer relying on bad advice and ignoring a valuable source.

Today, I had a similar experience. In this case, the patron was looking for books on German places, that is, gazetteers. She had found the Dewey Decimal classification numbers in the online FamilySearch Catalog but could not find the books on the shelf. She went to the reference desk and was told that these records were in a California Family History Center. I took her back to the online catalog and found the books again, and the entry said they were Reference Books. We quickly located the books on the Reference Shelf. Here the key was realizing that the reference books were not shelved with the rest of the books.

In each of the three categories outlined above, I find that part of the success of searching is determined by how well you know the subject of your search. In genealogical searches, this knowledge almost always involves knowing the exact location of an ancestral event. Of course, you can use the computer to search for locations as well as for any other information.

Now to the summaries of the different types of searches.

String-search, example Google
An example of a string search is searching for a name. If I were to search for my great-grandfather, I would be searching for the string "Henry Martin Tanner". In this case, the capital letters are not necessary, but the quotation marks tell Google that I am searching for that particular string and not for the individual words. Google will still give me instances of each of the three names and any combination of those names, but the first results will be instances where all three names appear in the order specified. In essence, searching in an online program such as Google (or any other search-engine-based program) means guessing at an exact string of characters in the target website, so a thorough search will involve multiple individual searches that vary the words and the word order. The more you search, the better your results. Practice makes you better, if not perfect.
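To make the idea of "multiple individual searches varying the words and word order" concrete, here is a minimal sketch in Python. The helper names name_query_variants and contains_phrase are my own inventions for illustration, and the phrase matching is only a simple case-insensitive substring test, not a description of how Google actually works.

```python
def name_query_variants(given, middle, surname):
    """Build quoted search strings for a three-part name.

    Hypothetical helper: the list of variations is only a starting
    point, not an exhaustive or authoritative set.
    """
    return [
        f'"{given} {middle} {surname}"',           # full name, natural order
        f'"{given} {surname}"',                    # middle name dropped
        f'"{surname}, {given} {middle}"',          # surname first, as in many indexes
        f'"{given} {middle[0]}. {surname}"',       # middle initial only
        f'"{given[0]}. {middle[0]}. {surname}"',   # initials plus surname
    ]


def contains_phrase(text, phrase):
    """Case-insensitive exact-phrase test (capital letters do not matter)."""
    return phrase.lower() in text.lower()


queries = name_query_variants("Henry", "Martin", "Tanner")
for query in queries:
    print(query)

# An exact phrase only matches when the words appear in the same order.
sample = "Obituary of henry martin tanner, an early settler"
print(contains_phrase(sample, "Henry Martin Tanner"))
print(contains_phrase(sample, "Tanner, Henry Martin"))
```

Each generated string can be pasted into the search box in turn. The second test on the sample text fails because the words are there but not in that order, which is exactly why varying the word order matters.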

Catalogs and catalog-type databases
The essence of a catalog is an organization of information, such as books, manuscripts, etc., based on subject matter. German books are gathered into one section of the repository, Spanish books into another, and so on. Although there are suggested cataloging categories, such as the Library of Congress Catalog terms, very few people without extensive library experience can guess the way books and other materials are gathered in a library. Here experience with library catalogs is the best teacher. It is also common for books and other materials to be shelved in different physical parts of the same library. So searching a catalog, for most people, is simply a matter of guesswork. Within a library catalog, for example, entries may be arranged alphabetically by author or even by the title of the material. Neither of these simple organizational methods makes finding individual items easier if you are looking for a particular subject or a particular location. The example here is the FamilySearch.org Catalog.
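The difference between an alphabetical arrangement and a place-based lookup can be sketched in a few lines of Python. Everything here is made up for illustration: the entries, the field names and the shelf labels are placeholders and do not come from the FamilySearch Catalog.

```python
from collections import defaultdict

# Placeholder catalog entries (hypothetical titles, authors and shelves).
CATALOG = [
    {"title": "Example Gazetteer A", "author": "Zimmer", "place": "Germany", "shelf": "Reference"},
    {"title": "Example Parish Guide", "author": "Alvarez", "place": "Spain", "shelf": "General"},
    {"title": "Example Gazetteer B", "author": "Bauer", "place": "Germany", "shelf": "Reference"},
]

# Arranged alphabetically by author, the two German books are not adjacent.
by_author = sorted(CATALOG, key=lambda entry: entry["author"])
print([entry["author"] for entry in by_author])

# A place index gathers them together and keeps the shelf location visible.
by_place = defaultdict(list)
for entry in CATALOG:
    by_place[entry["place"]].append((entry["title"], entry["shelf"]))

print(by_place["Germany"])
```

The point of the sketch is only that the same entries answer different questions depending on how they are arranged, and that the shelf location is part of the entry and worth reading.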

Wiki or wiki-like databases
Searching in a wiki is substantially different from searching in either of the preceding types of databases. The more specific your search, the less likely you are to find what you are looking for. Wikis thrive on general searches. The program itself is designed to lead you to more specific topics. So if you were searching in the FamilySearch.org Research Wiki, for example, you would ideally begin your search with a general term such as "United States" and then let the wiki lead you to individual states, counties and even cities. Note that you do not have to use capital letters or include the quotation marks in a wiki. The FamilySearch Research Wiki is organized in the same way that records are created and located around the world, that is, by geographic location.
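The general-to-specific approach can be pictured as walking down a tree of places. The sketch below is only an illustration: the PLACES structure and the drill_down function are hypothetical, and the handful of places is a tiny sample, not the actual layout of the FamilySearch Research Wiki.

```python
# A tiny, illustrative geographic hierarchy (not the wiki's real contents).
PLACES = {
    "United States": {
        "Arizona": {
            "Apache County": {"St. Johns": {}},
            "Navajo County": {},
        },
        "Utah": {
            "Salt Lake County": {"Salt Lake City": {}},
        },
    }
}


def drill_down(tree, path):
    """Follow a list of place names from general to specific and
    return the more specific topics available at that point."""
    node = tree
    for name in path:
        node = node[name]  # raises KeyError if a step is not in the tree
    return sorted(node)


# Start general, then let each level suggest the next step.
print(drill_down(PLACES, ["United States"]))
print(drill_down(PLACES, ["United States", "Arizona"]))
print(drill_down(PLACES, ["United States", "Arizona", "Apache County"]))
```

Starting at the top always returns something to follow, while jumping straight to a very specific name fails unless you already know exactly where it sits in the hierarchy.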

It looks like I need to add more to this at a later date.
