
Some people eat, sleep and chew gum, I do genealogy and write...

Friday, August 22, 2014

The Limits of Indexing

Indexed genealogy records are a major boon to the entire genealogical community. Unfortunately, many inexperienced researchers fail to take into account that not all of the available records are indexed and, further, that there are limits on the accuracy of any indexing project, whether volunteer or commercially funded. A good example of one major limitation is the large number of unindexed images on FamilySearch.org. It is likely that many of these images will remain unindexed for a considerable period of time despite the efforts of tens of thousands of volunteer indexers. With our present technology, the unindexed records, or images, become searchable by computer programs only after they have been individually indexed.

I have heard many people comment that they "searched on FamilySearch.org" and did not find anything. Most of these people have no idea that their search only looked at the indexed records. There were likely other unindexed records that would have information about their ancestors. If a collection in the FamilySearch.org Historical Record Collections does not show a number of indexed records and just says "browse images," then its records are only available for a record-by-record search. It is also important to compare the total number of images with the number indexed. It is entirely possible that only a small number of the total images have been indexed, in which case any search using the program will be incomplete.

Here is a screenshot with arrows showing the unindexed records recently uploaded to the Historical Record Collections:


Yes, all of the arrows point to Collections that are only images and are waiting to be indexed.

Let's suppose that the lack of indexes was not a problem. What about the indexes themselves? We are dealing with people, and that means that the process of indexing is imperfect. Any or all of the following could be the case:
  • The handwriting on the record is illegible 
  • The records are so old and faded that they cannot be read
  • The people who are indexing the records are unfamiliar with the handwriting and the language
  • Only part of the information on the record is indexed 
  • Human error creeps into the transcription process
The list could go on and on. If you use a search engine that relies on a transcribed and indexed record, can you really be sure that what you are looking for is not there? The answer to that question is definitely no. You can never be sure, even if you search the record yourself. That is why I am not an overly avid supporter of Research Logs. If you think you have already searched a record because you looked in an index, then you are almost certainly going to be wrong at some point in your research.

Oh, there is another trap in the FamilySearch Historical Record Collections. The records may be only partially indexed, but unless you compare the total number of images to the number actually indexed, you might be led to believe you have searched the records when you have not. For example, here is a screenshot showing that Mexico, San Luis Potosi, Civil Registration, 1859 - 2000 has only 3,967 records indexed.



But a quick check of the records shows the following:


There are 1,896,240 images. That means that nearly all of these images are waiting to be indexed and must be searched record by record. 
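
To put those two numbers side by side, here is a minimal sketch of the arithmetic in Python. The function name `indexed_share` is made up for illustration, and the comparison is only a rough gauge: it assumes records and images can be compared one to one, when in fact a single image may hold several records.

```python
def indexed_share(indexed_records: int, total_images: int) -> float:
    """Return the indexed records as a percentage of the total images.

    Rough gauge only: assumes one record per image, which is not
    always true for a real collection.
    """
    if total_images <= 0:
        raise ValueError("total_images must be positive")
    return 100.0 * indexed_records / total_images


# Figures quoted above for Mexico, San Luis Potosi, Civil Registration
share = indexed_share(3_967, 1_896_240)
print(f"Roughly {share:.2f}% indexed")  # prints "Roughly 0.21% indexed"
```

On that rough basis, a name search touches only about one fifth of one percent of the collection.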

Think the next time you rely on an index. 


3 comments:

  1. Re partially indexed collections - It would be really helpful if the "updated" flags at both Ancestry and FamilySearch linked to more descriptive revision histories.

    1. Yes, I agree. It would be nice if the partially indexed records indicated that the number indexed was partial and not the complete set of records.

  2. Very helpful, thank you! I got discouraged more than once looking at the number of indexed records for a collection on FamilySearch. Now I know better!
