Saturday, August 4, 2018

Augmenting Human Intellect and Genealogy

Artificial intelligence (AI) as applied to computer systems has been the topic of extensive research for many years. It is inevitable that some aspects of genealogy have been and will be affected by AI. However, rather than replacing humans and automating genealogy, most AI research today is aimed at augmenting human activities or intellect. So which areas of genealogical research can be augmented?

To understand what is happening now and what the effects of AI may be in the near future, we need to identify which parts of the process of tracking down one's ancestors and relatives could be enhanced or accelerated by programs that use AI. If we examine the basic functions of genealogical research, we can see which areas are already affected by AI and which will be affected.

AI is commonly defined as the study of "intelligent agents": any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals. Genealogical research involves several distinct functions, which can be characterized as follows:

  • Searching or reading historical documents with the goal of finding information about ancestral relationships
  • Analyzing the information obtained and determining how it extends a relationship structure, i.e., extending a pedigree
  • Recording the information found in a way that includes the exact location where the information was obtained
  • Organizing the information in a way that allows others to understand the relationships discovered
  • Comparing or sharing the information added to the corpus in order to allow others to view and utilize the information obtained
  • Determining whether the information is already present (duplicate record detection)
  • Communicating the information in a way that allows others to take advantage of the work without duplicating the search (duplicate work detection)
  • Connecting the information obtained to the existing information in a way that allows continued research
If you think about these and other possible functions of the genealogical research process, you can see that some, if not all, of them have already been measurably affected by intelligent computer programs. There are, however, some gaps that reflect the more difficult problems that remain unsolved.

For example, optical character recognition (OCR) technology allows a computer program to read some digitized text. Search programs such as the current "record hint" technology can then suggest relationships based on the OCR text. These programs reduce the need to transcribe the text manually, but some record hint technology still depends on manual indexing, in which volunteers extract specifically selected elements from each record. These limitations are imposed by the idiosyncratic content and arrangement of information in historical documents. The greatest remaining limitation of text recognition is the inability of computer programs to efficiently recognize the content of handwritten historical documents. Although character recognition has made great strides, parsing the text within the documents is still an obstacle. Parsing works for standardized entries with specifically identified information, such as an address on an envelope or entries on a census form, but becomes a major challenge with documents that lack formal structure, such as letters, obituaries, and other handwritten documents.

Another example comes from the current record hint technology itself. Although hints drawn from indexed documents are highly accurate, there is still a significant need for manual review of the hints to ensure that they apply to the appropriate individuals.

The speed of record entry, especially when the information is repetitive, can be measurably increased with automated entry suggestions. The danger is that information may be entered automatically even when the suggestion is inappropriate.

Organizing masses of genealogical data has always been a huge challenge. However, the advent of large, unified, collaborative online family trees has measurably decreased the need for each researcher to store and organize data individually.

The other aspects of the genealogical research process, such as duplicate detection, communication, watching for changes in individual records, and connecting relationships, are semi-automated but still have room for improvement.
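For readers curious about what semi-automated duplicate detection involves, here is a minimal sketch in Python. It is not the method used by FamilySearch or any other genealogy program; the `Person` record fields, the scoring weights, the two-year birth tolerance, and the 0.85 threshold are all hypothetical assumptions chosen for illustration. It scores two person records by name similarity (using the standard library's `difflib.SequenceMatcher`) and birth details, and it only flags a "possible duplicate" for a human to review rather than merging anything.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Person:
    # Hypothetical minimal person record; real tree records carry far more fields.
    given: str
    surname: str
    birth_year: Optional[int]
    birth_place: str


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so "Smith " and "smith" compare equal.
    return " ".join(text.lower().split())


def name_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()


def duplicate_score(p: Person, q: Person, year_tolerance: int = 2) -> float:
    """Weighted score in [0, 1]; the weights are illustrative, not tuned values."""
    score = 0.4 * name_similarity(p.given, q.given)
    score += 0.4 * name_similarity(p.surname, q.surname)
    # Birth years within the tolerance add a small amount of evidence.
    if p.birth_year is not None and q.birth_year is not None:
        if abs(p.birth_year - q.birth_year) <= year_tolerance:
            score += 0.1
    # Closely matching birth places add the remaining evidence.
    if p.birth_place and q.birth_place:
        if name_similarity(p.birth_place, q.birth_place) > 0.8:
            score += 0.1
    return score


def classify(p: Person, q: Person, threshold: float = 0.85) -> str:
    """Flag a pair for human review; never merge automatically."""
    return "possible duplicate" if duplicate_score(p, q) >= threshold else "distinct"


if __name__ == "__main__":
    john = Person("John", "Smith", 1850, "Ohio")
    jon = Person("Jon", "Smith", 1851, "Ohio")
    mary = Person("Mary", "Jones", 1900, "Utah")
    print(classify(john, jon))   # possible duplicate
    print(classify(john, mary))  # distinct
```

A variant spelling such as "Jon" for "John" still scores high enough to be flagged, which is exactly the kind of suggestion that needs a person to confirm it before any records are combined.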

Genealogy programs are becoming "smarter" all the time, but there is still a substantial need for individual human intervention, and that state of affairs is not likely to change in the near future.


  1. Good thoughts, as usual. As you stated, FamilySearch is already using AI for things such as record hinting and possible duplicates. I am now on a team that is expanding the uses of AI technologies to some of the things you referenced. Just last week I gave a presentation at the BYU Conference on Family History and Genealogy on this topic. See for the presentation slides.

    1. Thanks for sending me copies of the presentation handouts. I am looking forward to talking with you at your convenience.