Thursday, November 25, 2021

Digging Into the entire website: Digital Images, Indexes, and the Catalog (Part Seven)


As I pointed out in a recent post, only 20% (their figure) of the images on the website are indexed. What this means is that when you go to the Historical Records Collection and search for a name, you are only searching 20% of the records. So where are the rest of the records? Answering that question takes a lot of digging into the website. 

First, a few comments about indexing. If you look at the index of a book (if there is one), you will probably see a list of terms that are in the book and the pages where those terms may be found. An index is intended to help you find information when that information is mixed up with a lot more information that you don't want to look at at that moment. So, in effect, an index is a finding aid. Indexing does not answer your questions; it just helps you find some information that may answer your question (the type of question you are asking doesn't really matter). The people creating the index choose which terms in the document or book to index. The terms used for searching are now referred to as "metadata," a set of data that describes and gives information about other data. When you are "indexing" records from the website, the metadata desired has already been determined. For example, you might be asked to index the first name, middle name, surname or last name, date of birth, place of birth, and so forth. The indexing fields or terms are metadata: they provide you with information about the record. That is, you can use the index to find specific entries that match your search terms. 
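For readers who like to see ideas in code, here is a minimal sketch of an index used as a finding aid. It assumes a tiny set of made-up records and hypothetical field names (given_name, surname, birth_year); it is not how FamilySearch actually stores or indexes its records.

```python
from collections import defaultdict

# Hypothetical sample records; the ids and values are illustrative only.
records = {
    1: {"given_name": "Christen", "surname": "Jensen", "birth_year": "1794"},
    2: {"given_name": "Christen", "surname": "Jensen", "birth_year": "1801"},
    3: {"given_name": "Anne", "surname": "Pedersen", "birth_year": "1794"},
}


def build_index(records):
    """Map each (field, value) pair to the set of record ids containing it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for field, value in fields.items():
            index[(field, value.lower())].add(record_id)
    return index


def lookup(index, **terms):
    """Return the ids of records that match every supplied field."""
    matches = [index.get((field, value.lower()), set())
               for field, value in terms.items()]
    return set.intersection(*matches) if matches else set()


index = build_index(records)
print(lookup(index, given_name="Christen", surname="Jensen"))
print(lookup(index, surname="Jensen", birth_year="1794"))
```

The index never reads the record itself. It only knows the field values someone chose to extract, which is why a record that has not been indexed cannot be found this way.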

Let me illustrate. Let's suppose that you wanted to find a death record for this person in a Danish record. 

Christen Jensen, b. June 1794 in Wollen, Hørby, Hjørring, Denmark. 

You could go to the Historical Record Collections on the FamilySearch website and do a search using these fields:

Note the new search page. Anyway, I have entered the search terms in the search fields (metadata established by the programmers). When I click the Search button, FamilySearch's search engine (the programming that compares your entry for each item of metadata with the information, or text, in the documents) goes to work. Except there is a problem. The search engine can't read the text. The text has to be indexed (metadata terms need to be identified), and the index can then be used by the search engine to find what you are looking for. Here are the results of this search using the FamilySearch index for these records. 

Hmm. I got an entire page of Christen Jensens. If I keep scrolling down, I will find hundreds of Christen Jensens. If I focus on the place I entered, Wollen, Horby, Hjørring, Denmark, I do not see any records in the first 100 that match my search terms. I don't suppose this ever happens to you, but it happens to me all the time. He is actually Christen Jensen LH1L-TXV in the FamilySearch Family Tree, and he is the first person in the list. He also has 54 sources attached. Why did I get so many false positives (results that closely match my search terms but are not the right person)? Well, I did find the right person; I just didn't immediately recognize him as the right person. What seemed like a straightforward search has turned into a mess. Well, here is one thought. Someone wrote the name of the place as Wollen, Horby, Hjørring, Denmark, but now we move to a different level: cataloging. The place "Wollen," which, if it exists, is probably a house, really should be written as Hørby, Dronninglund, Hjørring, Denmark. This is the way the place is described in catalogs of place names. That is assuming we have the right place and name in the first place. 
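To show why the two ways of writing the place do not line up, here is a small sketch. The folding table for Danish letters is my own simplistic assumption for illustration, not a standard, and a real place-name authority would do far more than this.

```python
# Hypothetical folding table for Danish letters (an assumption for this sketch).
DANISH_FOLD = str.maketrans({"ø": "o", "æ": "ae", "å": "aa"})


def fold(text):
    """Lowercase the text and replace Danish letters with plain ASCII."""
    return text.casefold().translate(DANISH_FOLD)


def levels(place):
    """Split a comma-separated place name into a set of folded levels."""
    return {fold(part.strip()) for part in place.split(",")}


as_entered = "Wollen, Horby, Hjørring, Denmark"
as_cataloged = "Hørby, Dronninglund, Hjørring, Denmark"

# A character-for-character comparison fails on the spelling alone.
print("Horby" == "Hørby")
# After folding, the two spellings agree.
print(fold("Horby") == fold("Hørby"))
# The levels both versions share, and the ones only one version has.
print(levels(as_entered) & levels(as_cataloged))
print(levels(as_entered) ^ levels(as_cataloged))
```

Even after the spelling is evened out, the two versions disagree about which levels belong in the place name, so a search engine comparing the full strings would not treat them as the same place.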

A catalog is different from an index. A catalog is an attempt to organize the information, usually individual records, books, or whatever, into manageable categories. Each of those categories is then subdivided into smaller categories. Categories work well with geographic names. For example, here is a set of categories from the Catalog. 

Brazil, Mato Grosso, Barão de Melgaço

You might notice that the catalog entry is the reverse of the standard way to record place names. That is because the catalog starts out with the largest inclusive category first and then continues to subdivide the entry into smaller categories. This method allows a cataloging system to catalog a single item but also makes it difficult to understand as you try to guess the categories. 
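The largest-first ordering can be sketched as a set of nested categories. This is only an illustration under my own assumptions: the function names are hypothetical, and the second place (Cuiabá) is added just to show two entries sharing the same larger categories.

```python
def to_catalog_order(place):
    """Reverse a 'smallest, ..., largest' place name into catalog order."""
    parts = [part.strip() for part in place.split(",")]
    return ", ".join(reversed(parts))


def add_to_catalog(catalog, place):
    """File a place under nested categories, largest category first."""
    node = catalog
    for level in to_catalog_order(place).split(", "):
        node = node.setdefault(level, {})
    return catalog


catalog = {}
add_to_catalog(catalog, "Barão de Melgaço, Mato Grosso, Brazil")
add_to_catalog(catalog, "Cuiabá, Mato Grosso, Brazil")

print(to_catalog_order("Barão de Melgaço, Mato Grosso, Brazil"))
print(catalog)
```

Both places end up under the same Brazil and Mato Grosso categories, which is the point of starting with the largest category. The cost is that you have to know (or guess) the categories to find your way down to a single item.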

You have probably heard of a cataloging system called the Dewey Decimal System. This organizes libraries across the United States. There are several large cataloging systems including the Library of Congress (the most difficult to understand) and some popular in Europe. 

The opposite of a catalog system is a character-by-character (individual letters and symbols) search. This is the most common way computers search. When you enter a string of characters, such as a name, a program called a search engine begins comparing the string to any similar string in the target search area. The whole process is really much more complicated, but this is partially what the website does when it searches indexed documents. It uses the index as a basis for the string search. This is also partially the reason why you get thousands of responses to a simple name search.
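Here is a bare-bones sketch of a character-by-character search. It is the naive textbook approach, written out by hand for illustration; I am not claiming any real search engine works exactly this way, and the sample text is made up.

```python
def find_all(text, pattern):
    """Return every position where pattern occurs in text, comparing one
    character at a time."""
    positions = []
    for start in range(len(text) - len(pattern) + 1):
        matched = True
        for offset, ch in enumerate(pattern):
            if text[start + offset] != ch:
                matched = False
                break
        if matched:
            positions.append(start)
    return positions


# Made-up sample text containing two Christen Jensens and one Christensen.
text = "Christen Jensen; Jens Christensen; Christen Jensen"
hits = find_all(text, "Christen")
print(hits)
print(len(hits))
```

Notice that the search also lands on "Christensen," because the characters match even though the name is a different one. A plain string comparison has no idea what a name is, which is part of why a simple search returns so many results.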

Now back to the 20% issue. What percentage of the remaining, unindexed records on the website are cataloged? It wasn't very long ago that the answer would have been all of them, but some time ago, FamilySearch started adding digital records online that were not cataloged or indexed. Where are these records? They are in the "Images" section of the website. You go to the Search tab at the top of the screen and click on the second item, "Images." You then enter a country and continue clicking. I will have a lot more to say about images in the next installments of this series. 

By the way, as I think about it, I could easily write a book about genealogical searching on computers, except who reads books?

Here are the previous posts in what is going to be a very long series. 

