Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, October 19, 2023

Challenges of the FamilySearch.org Family Tree, Now and in the Future


For whatever reasons, both the FamilySearch.org Family Tree and the entire website face some serious challenges now and in the future. These challenges can be divided into two separate but related general categories: technological changes and data-related issues. For the purpose of this post, I am not including issues that arise solely in the context of the temple ordinances performed by members of The Church of Jesus Christ of Latter-day Saints.

The first and most serious challenge is data-related and can be summarized by the old computer admonition: "garbage in, garbage out." The issue is how to prevent the Family Tree from accumulating so many unsupported, inaccurate, and duplicated entries that it becomes too unreliable and full of errors to be usable. This issue arises from the tension between encouraging new users to enter their basic family information and the need to put some reasonable controls on both the format and content of all "new" entries. Behind this particular issue is the ever-present problem of duplication of effort, which I will explain next.

Genealogical duplication occurs at two levels: when an individual is added who is already present in the Family Tree, and when research is done by those who do not use the Family Tree to determine whether the information they are researching is already available and documented there. Let me give an example of each of these duplication issues.

The most common cause of duplication in the Family Tree is a person who, unaware of or ignoring possible duplicates, adds a name to a family or adds an entire family that is already recorded in the Family Tree. In many cases the new duplicate entry lacks supporting information such as dates and places. Because the information is incomplete, the FamilySearch search program will not identify the new entry as a duplicate. Of course, the website can look for duplicates, but the system as it now exists often fails to "see" that the newly added individual or family is a duplicate entry until other users' research adds more information about that individual or family.

Here is one way this duplication can occur. Let's suppose I add a name such as "John Smith" from my own personal records with limited supporting information, such as that he was born "about 1800" in the "United States." There is a good possibility that the "John Smith" I am entering will be a duplicate, but there is no way for either the person entering the information or the computer program to determine which of the thousands of John Smiths is the duplicate or duplicates. When this happens, the program can offer possible duplicates when the user submits the limited information, but because the user does not know enough about their person to match the name to an existing entry, the user likely chooses to create a new person. This works fine if the user goes on to do additional research, finds the duplicate or duplicates, and merges the entries. However, this does not happen when the user does not know how to do the subsequent research or is unaware of, or avoiding, the possibility of duplicates. Unfortunately, the website is designed to accept vague entries such as the "about 1800" and "United States" entries in my example above.
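To make the John Smith example concrete, here is a minimal sketch in Python of why a vague entry cannot be matched. It is only an illustration under my own assumptions and is not how the FamilySearch matching system actually works: the Person record, the could_be_same rule, the five-year tolerance, and the sample tree are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    """Hypothetical, simplified tree entry (not the FamilySearch data model)."""
    name: str
    birth_year: Optional[int] = None   # None when unknown
    birth_place: Optional[str] = None  # from most specific to least specific


def could_be_same(new: Person, existing: Person, year_tolerance: int = 5) -> bool:
    """Return True when nothing in the two records rules out a match."""
    if new.name.lower() != existing.name.lower():
        return False
    if new.birth_year is not None and existing.birth_year is not None:
        if abs(new.birth_year - existing.birth_year) > year_tolerance:
            return False
    if new.birth_place and existing.birth_place:
        # A vague place such as "United States" is compatible with any
        # more specific place that ends with it.
        if not existing.birth_place.lower().endswith(new.birth_place.lower()):
            return False
    return True


def candidate_duplicates(new: Person, tree: List[Person]) -> List[Person]:
    """List every existing entry the new entry might duplicate."""
    return [p for p in tree if could_be_same(new, p)]


# Made-up sample data for illustration only.
tree = [
    Person("John Smith", 1798, "Boston, Massachusetts, United States"),
    Person("John Smith", 1801, "Richmond, Virginia, United States"),
    Person("John Smith", 1803, "Ohio, United States"),
    Person("John Smith", 1850, "New York, United States"),
    Person("John Smith", 1799, "London, England"),
]

vague = Person("John Smith", 1800, "United States")
specific = Person("John Smith", 1801, "Richmond, Virginia, United States")

vague_matches = candidate_duplicates(vague, tree)
specific_matches = candidate_duplicates(specific, tree)

print(len(vague_matches), "possible duplicates for the vague entry")
print(len(specific_matches), "possible duplicate for the specific entry")
```

With this sample data, the vague entry is compatible with three of the five existing John Smiths, while the entry with a full date and place narrows the list to one. Multiply the sample by the thousands of John Smiths in the real Family Tree and it is easy to see why neither the user nor the program can pick out the true duplicate.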

From my own experience, this problem of initial duplication is extensive in Latin America and other areas where the Family Tree has a large number of "new" users who are adding information about their immediate ancestors but are choosing to ignore the suggested duplicates, either because they don't know what to do about them or because they think that they are creating "their own" family tree. This issue can be resolved to some extent by education, as I will explain below.

Duplication becomes a more serious issue when the person entering "new" information is extracting individuals or adding families from census or other records without systematically verifying family connections. A prime example of this is the early extraction program in England, where baptism, marriage, and burial records were individually extracted and showed up as duplicate individuals in the Family Tree, with three or more for each person entered. These duplicates are still being found regularly by researchers. Not only are these duplicates common, but ongoing individual and institutional extraction programs are currently adding hundreds of thousands more.

Another example of the wholesale addition of duplicates comes from allowing old and new GEDCOM data to be added directly to the Family Tree. There are some people who deny that this is happening, but experienced researchers who are watching their own entries find this occurring regularly. Those who are adding the entries do not look for duplicates and assume that they can add their "own" information to the Family Tree. Currently, the process for adding entries to the Family Tree from a GEDCOM file requires the user to review whether or not the entries are duplicates, but some users ignore the process and mark all their entries as new, thereby flooding the Family Tree with up to thousands of duplicate entries.

I could go on practically indefinitely about the duplicate issue, but I think that I have given enough examples to illustrate the problem. This brings up one of the other major issues, one that contributes to the duplicate issue, as I mentioned above: entry-level training or education. Although FamilySearch offers several different pathways to learning about the Family Tree, users can always choose to skip the training and start adding names directly. There are presently no requirements to learn anything at all about the website before entering information into the Family Tree. You can add a person to the Family Tree with nothing more than a name. The website will note that dates and places are missing, but still allows the name to be entered. Why does FamilySearch resist the need to train people how to use the website before making entries?

I must digress here to explain why I sometimes add "just a name." This occurs when I am entering information from another research source such as Ancestry.com. I need a "place holder" in FamilySearch so that I can immediately start transferring information I already have, with sources, from my Ancestry.com family tree. Any name I add is always connected to a family where I am doing ongoing research.

Back to training. There is no lack of training available. Again, referring to my extensive experience in helping new Spanish-speaking users from Latin America and around the world, I find that they cannot use the website simply because they lack some really basic information about how to use it. Once I explain the relationship between records and entries and how to find the records, they are relieved to know what to do. The lack of required training is the single biggest obstacle to these new users having a discovery experience. Over the years of working on and helping to develop websites, I have found that adding some "Getting Started" buttons does not work when the user is supposed to know how to properly enter names, dates, and places. Warning messages that you haven't done a certain task correctly are useless unless the website also explains how to enter the information properly. Failing to provide some introductory information and notice of the standards for entering information guarantees garbage in and garbage out.

What else? The technology that is called artificial intelligence has recently progressed to the point where the FamilySearch.org website is simply old and out of date. There is no reason now that new and experienced users could not enter valid information using a conversational interface. The website should be asking users what they want to do (with an option for experienced users to opt out of everything except data entry and correction). This technology already exists. The technology to construct family trees from valid sources with more accuracy than almost all potential users also exists, but FamilySearch apparently perceives a problem: that this will end up cutting the user out of the data entry experience. It is interesting that worrying about a new user entering his or her personal family information takes precedence over the accuracy of the entire website. It is possible for anyone beginning to use the website to find that information about their dead ancestors is already in the Family Tree. I commonly find that for Spanish-speaking users who are struggling to find an ancestor, someone else in their family has already entered the information they are looking for and, as illustrated above, the new user is duplicating the research. The Family Tree is supposed to be universal, so why should any new user be forced to rediscover their part of the universal family tree by doing duplicate research? An AI interface could help this new user have a good experience without later finding out all the work had already been done.

Moving on, record hints are helpful, but new users, without training, do not know how or why to use them. With an AI interface, the website could simply say, "I see you have record hints, do you need help in adding sources to your part of the Family Tree?" Why not have the website itself help to keep the information accurate instead of leaving it to experienced genealogists to waste their research time correcting the bad entries of others?

The issue of researchers who duplicate research that is already in the Family Tree arises both from a lack of knowledge about the existence of the FamilySearch.org Family Tree and from deliberate avoidance of the Family Tree because of its existing reputation for inaccuracy and duplication. One fear of using artificial intelligence is that people will be replaced and lose their jobs. Those of us who are now spending an inordinate amount of time maintaining and correcting the Family Tree do not need this job. We would gladly turn it over to AI. Of course, if FamilySearch wants to start paying us to maintain the Family Tree, then we might also worry about losing our jobs.

I have a lot more to say about these issues and will probably keep writing until I pass on to whatever reward I get for doing all this work in the first place. 

2 comments:

  1. The FamilySearch Tree is already un-usable. I'm not sure anything can be done to fix this short of starting over.
