
Sunday, October 21, 2012

The Duplicate Issue in FamilySearch Family Tree

One of the overriding issues with New.FamilySearch.org (NFS), the predecessor to FamilySearch.org's Family Tree (FSFT), is the abundance of duplicate submissions. Some individuals have hundreds of duplicates because they were included in a multitude of user-submitted family group records and electronic files. I view the duplicate issue as the one paralyzing feature of the NFS program that prevents me from using it in any meaningful way.

[I warn you in advance: I am not going to stay entirely within the analogy that follows. I am going to jump in and out of it as it suits my narrative.]

To understand this problem, I am going to use an analogy to a stage and the characters that appear on the stage. The program, NFS, in this first example, is the stage. The submissions are the characters that appear on the stage. In my analogy, each character can have one or even many more alter egos in the form of "duplicate" characters. The problem with these alter egos is that they are not exact duplicates. They vary from the "original" in sometimes very important and distinctive ways. These alter egos are created by those submitting the characters to the stage. [For example, multiple submissions of the same families to the Ancestral File or Pedigree Resource File.]

This multiplicity of characters would not be a problem if the differences reflected actual "real world" differences. But the variations in the alter egos are primarily the result of sloppy, inaccurate or negligent work on the part of the submitters. Other alter egos were created by changes in the submission standards over the years. [For example, historically, because of the space limitations on the forms used, it was acceptable to use abbreviations.] In NFS, unfortunately, all of these alter egos appeared on the stage with the primary character all of the time. In many cases the stage [the information in the NFS files] was so congested with duplicate alter egos that the main character was entirely lost in the throng. If a subsequent user of the NFS program detected an error in a submission, that error could only be corrected by adding a new alter ego to the masses already on stage.

This multiplicity of alter egos rendered the NFS program unusable for many people who would have liked to clarify the issues created by having all alter egos on the stage with the main characters.

There was no solution to the problem within the context of the NFS program. End of story.

The producers (FamilySearch's developers) decided to build a new stage, one that would let only one character at a time appear to the audience [users of the program]. In an abundance of caution, they did not kill off all of the alter egos, but they provided a way so that the audience [users] could select which of the characters appeared on the stage and further allowed changes, deletions and merging of alter egos, so that the character that appeared was the universally accepted standard and incorporated only the correct variations in the story line [data about the individuals in the program].

OK, enough of this analogy. FSFT only allows for one person with one set of descriptive characteristics to be visible at a time. All of the variations [alter egos] are banished to the background. They are not lost forever, but the people using the program don't have to deal with the differences. In effect, everyone who uses the program is forced to accept only one version of all the possible variations in a person's information. If there are legitimate differences they may be documented with the individual but only one individual can fill each slot or node on the family tree.
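For readers who think in code, here is a rough sketch in Python of the idea described above. It is only my illustration of the concept and not how FamilySearch actually built the program; the names `Person`, `Tree`, `merge` and `undo_merge`, and the sample identifiers, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    pid: str    # hypothetical person identifier
    name: str
    # Alter egos that were merged into this person. They are kept in
    # the background instead of being thrown away.
    merged: list = field(default_factory=list)


class Tree:
    """Toy model: only entries in `visible` appear to users."""

    def __init__(self):
        self.visible = {}  # pid -> Person

    def add(self, person):
        self.visible[person.pid] = person

    def merge(self, keep_pid, dup_pid):
        # Take the duplicate off the stage and file it behind the keeper.
        dup = self.visible.pop(dup_pid)
        self.visible[keep_pid].merged.append(dup)

    def undo_merge(self, keep_pid, dup_pid):
        # Bring a previously merged duplicate back onto the stage.
        keeper = self.visible[keep_pid]
        for i, dup in enumerate(keeper.merged):
            if dup.pid == dup_pid:
                self.visible[dup_pid] = keeper.merged.pop(i)
                return
        raise KeyError(dup_pid)


tree = Tree()
tree.add(Person("A1", "John Smith"))
tree.add(Person("B2", "Jno. Smith"))  # abbreviated duplicate
tree.merge("A1", "B2")                # only A1 remains visible
tree.undo_merge("A1", "B2")           # both are visible again
```

The point of the sketch is simply that a merged duplicate moves out of view rather than being deleted, which is why it can be restored later.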

For me, this solves the problems I faced in NFS. If my tangled relatives can be sorted out, then for the first time in my 30 years of doing genealogy, I will have a place to make some sense of my pedigree, work through all of the thousands of duplicates and come to a consensus with others in the vast family I have out there in the world. In truth, FSFT solves the duplicate problem present in all other iterations of my family tree online and elsewhere.

Now, if this explanation of how the program works doesn't make sense to you, please submit your comments. I will make further comments if they are warranted. 

2 comments:

  1. James, thanks for your overview. You say, "FSFT only allows for one person with one set of descriptive characteristics to be visible at a time. All of the variations [alter egos] are banished to the background. They are not lost forever, but the people using the program don't have to deal with the differences. In effect, everyone who uses the program is forced to accept only one version of all the possible variations in a person's information. If there are legitimate differences they may be documented with the individual but only one individual can fill each slot or node on the family tree."

    All of the non-combined duplicates have been migrated individually to the FS-Family Tree. They can be seen in search results. Many can be seen as duplicates married to one spouse or listed as a child of one or both parents. Others are simply part of a parallel sub-tree with duplicated combinations of spouse, children, and/or parents, etc.

    True, the FS-FT user can only see one person's "detail" page (with editable vitals, marriage, etc.) at a time. But if duplicates of the same person are attached to instances of the same parents they can be seen in the list of children of those parents on the aforesaid person's detail page.

    It is only when a duplicate of the same person, with a different PID, is attached to duplicates of the same parents that the duplicates cannot be seen from one person/PID's detail page.

    These duplicates will still be there unless merged by a FS-FT user, and someone may actively modify any of them. The FS-FT program does not select a 'preferred' version out of the duplicates and conceal the others.

    1. I understand that FamilySearch is committed to eliminating the duplicates from the usable portion of the program. As these duplicates are identified, they can be merged with the main individual. They are always going to be there because you can "undo" a merge.
