Sunday, November 29, 2015

Buying Into the Revolution -- Part Three -- Where are we now?

In my last post in this series, I mentioned the insular nature of genealogical software. Years ago, if I were researching paper books in a library and found information about my family, I would have to copy out the information, record a citation to the location, evaluate the information and then incorporate the findings into my records, usually on a family group record. If I were using some type of file system, I might include the notes I made in that family's file folder or notebook or whatever. Years later, if I stumbled across the same information again, in another book or record, I would have to go through the same process again: copy and evaluate. Only then might I discover that I already knew that particular item, and only if I had managed to put my notes into the same folder.

I could have used a card file for the notes, but then I would be dependent on my ability to characterize the information about my family in a way that was consistent with my earlier characterization, or I would end up with two cards holding the same information. Guess what? This happens all the time in libraries across the world. Libraries depend on their catalogers to characterize and organize vast quantities of information so it can later be found and so the same material is not unintentionally duplicated in different parts of the library.

Genealogists have used all sorts of schemes to prevent this type of duplication. What is even more disconcerting is the fact that individual researchers duplicate a great deal of the same research. Some people estimate that 80% or more of what genealogists do is mere duplication of what someone else has already done. I would place that figure much higher.

The effect of this insularity is that most of what we do as genealogists is wasted effort. The research has already been done; we just do not know where to find it. This issue also arises from the possessive nature of the average genealogist. The idea that someone can own information creates more problems than just duplication, but the belief in ownership is one of the causes of the insularity.

What is truly amazing is that despite all the duplication, there are still vast areas of genealogical research that remain unorganized and are not incorporated into any kind of organization. FamilySearch.org's Family Tree and other similar efforts to consolidate genealogical research are mired down in inaccurate and incomplete information. From my own experience, correcting even one small branch of these massive trees is a monumental undertaking.

So where are we in our efforts to eliminate the insularity? On a scale of one to ten with ten being the goal, we have not made it to one yet. What is preventing our progress? At the core of the problem is our inability to easily and completely transfer genealogical data from one venue to another. Simply put, if I have spent years accumulating information about one of my ancestors, there is no adequate way for me to incorporate all of my information into one place that makes it available to every other person related to the same ancestor. FamilySearch.org has made a start by creating a unified family tree program, but after years of working on the Family Tree, we are still waiting to have a way to eliminate duplicate entries, even when those entries are obvious duplicates.

If you want to get some idea of the overall scope of the problem, you can look at my family tree in the MyHeritage.com program. MyHeritage.com uses advanced programming techniques to match the entries in my family tree with the family trees of all of the other users of the program. Right now, the program is telling me that I have over 100,000 matches. Just imagine the duplication that this figure implies! The details of the matches say that I have 8,815 matching family trees with 60,244 Smart Matches; not quite 100,000, but who is counting? I have no mechanism at all that will allow me to examine all those potential matches, and this is only one program. What about all the trees in all the other programs? Each of those matches is a potential duplication of effort.

Perhaps these examples give you an idea of the challenge faced by the genealogical community. At this point all the technological advances we have available to us have only given us a window to look out on the scope of the problems we face. We have only made the barest beginnings at solving those problems.

Genealogy deals with a specialized type of information, but it is information. We are supposed to be in the information age. So what are we doing with our massive amounts of information? Here is one more example.

My parents are cousins. They share a common ancestor. My mother's great-grandfather was also my father's great-great-grandfather. The same person is in both family lines. This is called pedigree collapse. None of the present genealogical database programs convey this information in an adequate way. Nothing tells me that by going back in two different lines, I will ultimately be looking at the same people. I happen to know about this particular problem and others because I recognized the names, but what happens when you get back a few more generations? Can you really keep track of all of the common relationships? The answer is that presently we have no programs or mechanisms that identify common ancestors in the same programs where that information is stored. Yes, I have a utility program that will tell me if I am related to someone, and yes, the programs will tell me how I am related to another person in the same program, but none of them show me where my lines merge. For example, there is a program called Relative Finder that works with the Family Tree. According to that program, my wife and I are related as 6th cousins 2 times removed. We supposedly share a common ancestor back in the late 1600s. However, absent the suggested link from Relative Finder, we might never have known of that connection. We are only just now getting to the point where we can recognize such connections.
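To make the idea of merging lines concrete, here is a minimal sketch in Python of how a program might flag pedigree collapse. Everything in it is an assumption for illustration: the `Pedigree` structure, the function names and the placeholder people are hypothetical and do not describe how any existing genealogy program stores or searches its data. The sketch simply collects the known ancestors on the father's side and on the mother's side and reports anyone who appears in both.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical pedigree: each person maps to (father, mother).
# None means that parent is unknown.
Pedigree = Dict[str, Tuple[Optional[str], Optional[str]]]


def ancestors(tree: Pedigree, person: str) -> Set[str]:
    """Return every known ancestor of `person` (not including the person)."""
    found: Set[str] = set()
    stack = [person]
    while stack:
        current = stack.pop()
        for parent in tree.get(current, (None, None)):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def merged_ancestors(tree: Pedigree, person: str) -> Set[str]:
    """Return ancestors that appear in both the paternal and maternal lines."""
    father, mother = tree.get(person, (None, None))
    if father is None or mother is None:
        return set()
    return ancestors(tree, father) & ancestors(tree, mother)


# Placeholder data mirroring the example above: the mother's
# great-grandfather is also the father's great-great-grandfather.
tree: Pedigree = {
    "me": ("father", "mother"),
    "mother": ("mother's father", None),
    "mother's father": ("mother's grandfather", None),
    "mother's grandfather": ("shared ancestor", None),
    "father": ("father's father", None),
    "father's father": ("father's grandfather", None),
    "father's grandfather": ("father's great-grandfather", None),
    "father's great-grandfather": ("shared ancestor", None),
}

print(merged_ancestors(tree, "me"))
```

With real data, the result would also include every known ancestor of the shared person, since all of them are reachable through both lines. Note that this only works when both lines point to the same record; if the shared ancestor were entered twice under slightly different names, the duplicate-entry problem described earlier would hide the collapse.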

