Some people eat, sleep and chew gum, I do genealogy and write...

Friday, May 11, 2018

Can Genealogists Be Replaced by a Computer Program?

Recent technological developments in the larger international genealogical community compel me to address the question raised in the title of this post once again. To answer this question, we have to go back into the history of computers to the very beginning. I will start with a quote from Ada Lovelace, usually acknowledged as the first computer programmer:
The Analytical Engine has no pretensions whatever to originate anything. It can do whatever we know how to order it to perform. It can follow analysis; but it has no power of anticipating any analytical relations or truths. Its province is to assist us to making available what we are already acquainted with.
The Analytical Engine was a machine designed by Charles Babbage and first described in 1837. Here is a description of it from the Wikipedia article "Analytical Engine":
The Analytical Engine was a proposed mechanical general-purpose computer designed by English mathematician and computer pioneer Charles Babbage.[2][3] It was first described in 1837 as the successor to Babbage's difference engine, a design for a mechanical computer.[4] The Analytical Engine incorporated an arithmetic logic unit, control flow in the form of conditional branching and loops, and integrated memory, making it the first design for a general-purpose computer that could be described in modern terms as Turing-complete.[5][6] In other words, the logical structure of the Analytical Engine was essentially the same as that which has dominated computer design in the electronic era.[3]
I am leaving in all the links in these quotes because of the technical nature of this post.

I can restate the question in the title as follows:

Is there any activity done by genealogists in compiling a sourced family tree that cannot be performed equally well by a computer program of sufficient complexity? If a program can be designed to replace genealogists, what would its limitations be, if any?

My own answer to this question is that, over the years, I have come to believe that nearly all the work presently done by genealogists at all levels, with the possible exception of biographies, oral interviews, and other personally oriented activities, could be substantially done by a complex database connected to a sufficiently sophisticated program.

How could this be accomplished? What developments would have to occur before such a system would be fully operational? The key components of such a system are already being implemented. For example, recent presentations by Gilad Japhet, the CEO of MyHeritage, outline a method by which the present MyHeritage Record Matches will be enhanced with DNA test results and linked to sources in existing family trees to predict the common ancestor of matching DNA results. (See also "Perspectives on Combining Genealogy and Genetics.")
Let's suppose that I got a Record Match from MyHeritage that said something like this:
This record matches your ancestor xxxx, and you also have a DNA match with these people who have this same ancestor in their family trees. 
How much more likely would I be to give credence to the Record Match? What if the above statement added the following:
This record matches your ancestor xxxx, and you also have a DNA match with these people who have this same ancestor in their family trees, and this is a diagram of the ancestral line connecting you and the other people to this probable ancestor. 
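To make the idea of combining a Record Match with DNA matches a little more concrete, here is a minimal sketch in Python. It is not MyHeritage's method or API; it assumes a hypothetical data structure in which each DNA match carries the amount of shared DNA and the set of ancestors listed in that person's family tree. The program simply looks for DNA matches whose trees contain the ancestor named in a Record Match and builds a message in the style of the hypothetical one above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DnaMatch:
    """A hypothetical DNA match: who matched, how much DNA is shared (cM),
    and which ancestors appear in that person's family tree."""
    name: str
    shared_cm: float
    tree_ancestors: set[str] = field(default_factory=set)


def corroborating_matches(ancestor: str, matches: list[DnaMatch]) -> list[DnaMatch]:
    """Return the matches whose trees include the ancestor named in a
    Record Match, strongest DNA connection (most shared cM) first."""
    hits = [m for m in matches if ancestor in m.tree_ancestors]
    return sorted(hits, key=lambda m: m.shared_cm, reverse=True)


def record_match_message(ancestor: str, matches: list[DnaMatch]) -> str:
    """Build a Record Match message, adding DNA support only if some exists."""
    support = corroborating_matches(ancestor, matches)
    message = f"This record matches your ancestor {ancestor}"
    if support:
        names = ", ".join(m.name for m in support)
        message += (", and you also have a DNA match with these people who have "
                    f"this same ancestor in their family trees: {names}")
    return message + "."


if __name__ == "__main__":
    # Hypothetical example data, not real people or real test results.
    matches = [
        DnaMatch("Cousin B", 95.5, {"John Smith (1820-1890)"}),
        DnaMatch("Cousin A", 212.0, {"John Smith (1820-1890)", "Mary Jones (1825-1901)"}),
        DnaMatch("Cousin C", 40.0, {"William Brown (1815-1880)"}),
    ]
    print(record_match_message("John Smith (1820-1890)", matches))
```

Run as a script, this prints a message naming Cousin A and Cousin B, since only their trees contain the matched ancestor. A real system would of course need to handle name variants and duplicate individuals across trees, which this sketch ignores.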
Let me illustrate another way that this might occur. Let's start with me. What basis is there for me to believe that I have identified my parents? I have an official birth certificate, but there is an outside possibility that I was adopted. In my case, I remember the birth of my youngest sister, and her DNA test confirms that she is my sibling; I am the oldest. So the DNA test, in conjunction with a substantial amount of documentary evidence, creates a high degree of certitude that a specific relationship exists. The challenge here, of course, is that as we go back in time, there are fewer records and DNA evidence becomes less conclusive. Now let's suppose that I document my grandparents and DNA tests confirm that my research is accurate. As we continue back, the information becomes more subject to my judgment as a genealogist. But suppose that all the records I would use to discover my ancestry were digitized and online and that millions of my relatives had taken genealogical DNA tests. Wouldn't it be possible to extend my pedigree back to the limit of the available records? Couldn't these records and the DNA results be combined to produce a core pedigree with substantial reliability?
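The reason DNA evidence weakens as we go back can be put in rough numbers. This is a minimal sketch that assumes only the usual textbook average: each parent passes on about half of their autosomal DNA, so the expected share inherited from any one ancestor halves with each generation, while the number of ancestral positions doubles. Actual inheritance is random and varies around these averages, so the figures are expectations, not predictions for any individual test.

```python
def expected_ancestral_share(generations_back: int) -> float:
    """Expected fraction of autosomal DNA inherited from one ancestor
    `generations_back` generations ago (1 = parent, 2 = grandparent, ...).
    The expectation halves each generation; real inheritance varies."""
    if generations_back < 1:
        raise ValueError("generations_back must be at least 1")
    return 0.5 ** generations_back


def ancestor_count(generations_back: int) -> int:
    """Number of ancestral positions in that generation of a pedigree (2, 4, 8, ...)."""
    return 2 ** generations_back


if __name__ == "__main__":
    for g in range(1, 11):
        print(f"generation {g:2d}: {ancestor_count(g):4d} ancestors, "
              f"~{expected_ancestral_share(g):.4%} of DNA from each")
```

For example, six generations back there are 64 ancestral positions, each contributing on average only about 1.56% of a person's autosomal DNA, which is why the documentary record has to carry more of the weight the further back a pedigree goes.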

What is lacking? The most important missing link in this projected possibility is reliable, genealogically oriented handwriting recognition programs. However, important progress has been and will be made in this regard. There is really no such thing as genealogical "proof." Claimed proofs are nothing more or less than educated conclusions based on a selection of sources. All genealogical conclusions are subject to the existing records and documents. Additional documentation could change almost any conclusion.

Would you trust a pedigree created by a computer program? My position is that it would be no different from what we have now. I inherited an immense pedigree going back as far as 18 generations on some lines. After 36 years of genealogical research, I have used thousands of sources to revise and correct almost every one of those inherited family lines. Much of my current progress is based on record hints from the large online genealogy companies. My family tree is now well documented back at least six generations on all my family lines. Anyone tapping into that information will already have well-documented sources, and if they have a DNA test tying into any of my lines, they have the basic components for making additional advances through competent research.

What is the likelihood that this will happen? Right now, the genealogical community is fragmented into different factions based on geography, commercial programs, and other factors. It is not at all likely that one company or group will "corner the market" on genealogical data or DNA test results. Right now, has the lead with over 100 million subscribers and an active DNA testing component. They also have the motivation and willpower to push the entire genealogical community toward a common family tree.

It will be interesting to see how it all plays out, assuming I live long enough to see the results.


  1. I would need DNA evidence. In some cases DNA may be the only convincing proof available. A long time ago, I started paying for Y-DNA tests whenever I could find a cousin who would take the test. So far we still don't have a large enough database to prove or disprove anything. You know who does the MyHeritage DNA testing, don't you? Kaye

  2. I am afraid your conclusions fail as they are based on the supposition that all the relevant data is available to the computer program in a form it can read.
    This obstacle has been obstructing individuals for centuries: the records are preserved but unobtainable. However, humans are more adept at overcoming such obstacles than computers.

    If we disregard the above and are simply comparing a computer program analysing the same group of records as a human then things begin to fall in favour of the machine, but again it depends on many variables regarding the programming.
    The computer would have to run a program that can differentiate between a parish register and a Bishop’s Transcript or, trickier still, between an original parish register and an Elizabethan or even later copy of it. Some English parishes have as many as five or more ancient copies of the original register or parts of the register (many humans also fail to address this).
    Then we have to take into account Optical Character Recognition (OCR), Handwriting Recognition Software (HWS), Intelligent Word Recognition (IWR), and so on. We are getting to the stage where computers can read text, but such software is not infallible and will fail, leading to false assumptions.

    Therefore, the answer to your question is that in certain circumstances a well-designed computer program could replace a human genealogist; however, in real-life conditions the genealogist will outperform the computer by a large factor.

    1. Well, yes and no. People aren't infallible either. I do agree with what you say about record availability, but that is a different problem than accurately using existing records.

  3. Oh, come on, James! You cannot be serious ... As well as the instances of human contact that you mention (e.g. interviews), there are many sources that have not been digitised, and there always will be. Even if you're only considering information that's online -- which equates to "genealogy" for many -- then it's still a no; compiling a family history requires an appreciation of historical context and family context. I hope that we're not simply talking about matching the context associated with discrete data items, such as dates, names, and locations. Finally, it has to be written up -- and if you're suggesting that software can write up my research in an entertaining way for human consumption then I must resort to a family initialism: ROTFLMAO!

    1. That is the problem: it is difficult to tell whether I am serious or not. :-)