Some people eat, sleep and chew gum, I do genealogy and write...

Saturday, June 14, 2014

What effect will source matching technology have on the genealogical community?

There are a number of different technologies that have the potential to effect fundamental changes in the way genealogical research is conducted. Those technologies include the following:

  • automated searches
  • handwriting recognition
  • automated indexing
  • more rapid methods of digitization (i.e. improved scanning devices)
There are likely other technologies that will have a significant impact but, as is usually the case, we haven't seen what can be done and so are unaware of the potential impact. 

Most changes to genealogy over the past 150 years or so have been small and incremental. A hundred years ago, for example, almost all genealogically important source documents were created individually, by hand. The document creator may have used a pre-printed form, but the actual creation of the document was done by hand or typewriter. If a document was typewritten, it may have been copied from an original handwritten version. Source documents were, to a great extent, unique. There was an "original." If a copy of the original document was needed, then the copy had to be made in the same way as the original. Several copy methods had their origins in the 19th century or even earlier, but when I began practicing law, for example, we were still using carbon paper to make multiple copies. We did have mimeograph machines and spirit duplicators, but they were not suitable for making documents for submission to clients or the court.

One technology, the thermograph, was used by my office for quite some time. Unfortunately, most of the thermographic copies were very impermanent and faded quickly to unreadability. 

For genealogists, microfilm became an important method of document reproduction. Most of the microfilm available today dates back only to the 1930s. FamilySearch's predecessor, The Genealogical Society of Utah, began microfilming records back in 1938. There is no dispute that many of the valuable genealogical records are still "locked up" in paper or microfilm copies. 

I can clearly remember my first encounter with a Xerox photocopy machine because I had spent two years in Argentina as a missionary for The Church of Jesus Christ of Latter-day Saints, and when I returned, the University of Utah Library had its first Xerox machines. That would have been in the fall of 1966. Obviously, photocopy technology has had a huge impact on genealogy. I began my research in approximately 1982 by making photocopies of Family Group Records at 25 cents a copy in the Family History Library in Salt Lake City, Utah. When doing that research, I had to refer to one of the following:
  • an original document
  • a handwritten copy of an original
  • a typed copy of an original
  • a compiled index of original documents
  • a microfilmed copy of an original 
  • a photocopy of an original
There were few other options for research. Starting in about 1975, the computer revolution began to affect genealogy. By 1983, I was working on my first personal computer and had begun entering genealogical data into it. From this point on, I could talk about all the ways genealogical research and the recording of information were affected by these changes, but let's fast forward to today.

Today we have billions of digitized documents, most of which are available in some format or system online either free or for a price. The number of digitized original source records is increasing by the millions each week. Genealogists are very fond of making a point about the number of original source records that yet need to be digitized, but that smugly held limitation is being eroded rapidly. Not only are more and more records being digitized, but the speed at which this is occurring is increasing rapidly.

So now we have huge accumulations of digital images containing reproductions of "original source records." But until very recently, the only real change that affected genealogists was the fact that the documents were sometimes more accessible. The process of searching a digitized document is exactly the same as the one I used in searching paper documents in the Family History Library 30+ years ago.

Now, there are several emerging technologies that may change this time-tested method of obtaining genealogical information. Of the items in my list above, the most evident is indexing. Presently, almost all indexing is done manually, one record at a time, by human indexers. But as the number of indexes increases, the tedious process of searching through records one by one is diminished: not eliminated, but severely curtailed. What would affect indexing greatly? Automating the process. How could that be accomplished? Through the development of accurate handwriting recognition software. Already Optical Character Recognition (OCR) software is impacting the availability of search technologies, including indexing of typewritten or printed documents, but handwriting recognition is still the "holy grail" of genealogical research.

So the number of available original documents will continue to increase rapidly as the digitization processes improve and handwriting recognition or more efficient indexing processes are developed. 

But now there is another technological development that is impacting genealogy in a fundamental way. That is the application of advanced automated search programs for documents that have already been digitized and are readable by OCR or handwriting recognition. This particular development has the potential of making a more radical change in the actual process of doing genealogy than the other technologies that only affect the availability of the documents. The reason for this change is simple to understand. 

Let's suppose I was searching a U.S. Census for a specific Census record. Historically, the only way I could do this was by identifying the geographic area where my ancestors lived, finding the Census records for that location, and then going through those records page by page until I found the entries I was seeking. By the way, this is still a common way to proceed. But let's imagine that the newly developing technologies come into play. First, I would have an index of the entire U.S. Census (which we do now have). Second, I could locate my ancestors' record by searching the index. Of course, I am relying on the accuracy of both the original record and the index.

How does this change with the new search techniques? Yes, they still rely on accurate indexes, but now, I do not do the search. A computer program, running on the Internet, does the search for me with an accuracy that approaches 100%. Notwithstanding the accuracy of the search, we are still at the mercy of the accuracy of the original record and/or the index or OCR or whatever, but for a vast number of original records, my need as a genealogist to search records for sources is eliminated entirely. 
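
To make the idea concrete, here is a minimal sketch of what such an automated matcher might look like. It is not how any particular website works. The `Person` and `IndexEntry` types, the scoring weights, the threshold, and the sample data are all assumptions made up for illustration. The point is only that the program, not the genealogist, compares what is known about a person against every entry in an index and reports the likely matches.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Person:
    """What the researcher already knows (hypothetical structure)."""
    name: str
    birth_year: int
    place: str


@dataclass
class IndexEntry:
    """One row of a hypothetical record index, not any real service's schema."""
    name: str
    birth_year: int
    place: str
    source: str


def name_similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two names, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(person: Person, entry: IndexEntry) -> float:
    """Combine name, birth year, and place agreement into a rough score.

    The weights (0.5 each) and the two-year tolerance are arbitrary
    assumptions chosen for this example.
    """
    score = name_similarity(person.name, entry.name)
    if abs(person.birth_year - entry.birth_year) <= 2:
        score += 0.5
    if person.place.lower() == entry.place.lower():
        score += 0.5
    return score


def find_sources(person, index, threshold=1.5):
    """Score every index entry and return (score, entry) pairs, best first."""
    scored = [(match_score(person, entry), entry) for entry in index]
    hits = [pair for pair in scored if pair[0] >= threshold]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)


# Made-up sample data: one misspelled match and two non-matches.
sample_index = [
    IndexEntry("Jon Tanner", 1851, "Utah", "1880 census page (made up)"),
    IndexEntry("Mary Smith", 1850, "Utah", "1880 census page 2 (made up)"),
    IndexEntry("John Tanner", 1790, "Ohio", "1850 census page (made up)"),
]
ancestor = Person("John Tanner", 1850, "Utah")

for score, entry in find_sources(ancestor, sample_index):
    print(entry.source)  # prints: 1880 census page (made up)
```

Notice that the sketch finds "Jon Tanner" even though the name is misspelled in the index, but it can only find what the index contains. That is the same dependence on the accuracy of the original record and the index that I described above.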

If this sounds like an unobtainable goal, I can only say that I have been using this technology now for about a year and a half and it is truly revolutionary. Will the need to examine records page-by-page ever be completely eliminated? Not likely, but the impact of automated searches is just now beginning. Ask me how it is progressing in about five years and I will tell you of the tremendous changes that will have taken place. 

The effect of the new search technology is far-reaching. Essentially, basic searches for commonly available documents become a thing of the past. We no longer sit down with new genealogists and have them search for their ancestors in the U.S. Census or comparable records. We simply ask them to enter what they know into an online database with automated features and let the programs find all the readily available records. Think about it.
