Tuesday, August 25, 2015

More about the surprising relationship between MyHeritage and the FamilySearch Family Tree

Note: This particular post is based on some recent observations reported in a previous post entitled "Some Unexpected Information from MyHeritage.com." What I have to say in this present post elaborates on my observations over the past few years and explains my opinion of what is going on with both MyHeritage.com and the FamilySearch.org Family Tree. Please understand that MyHeritage.com is functioning exactly as it is designed to function. The problems and issues I discuss can only be attributed to the reality of the FamilySearch.org Family Tree. MyHeritage.com is merely the messenger and it is only because it works so well that the message is coming through.

Now a fairly brief summary of the history of the data in the FamilySearch.org Family Tree.

Over the past 100 years or so, many people have contributed their user-compiled data to FamilySearch and its predecessors: the Genealogical Society of Utah and various other entities and organizations. Eventually, all this information, usually in the form of Family Group Records and pedigrees, was compiled into huge databases containing many millions of records of individuals and families. These databases included the Ancestral File, the Pedigree Resource File, the International Genealogical Index, the membership records of The Church of Jesus Christ of Latter-day Saints and the records of the names submitted to the LDS Temples. In an attempt to make all of these records searchable simultaneously, these separately submitted records were all combined into one database that was used as the basis for the program called "new.FamilySearch.org." (See various notes similar to the one in this FamilySearch.org Research Wiki article.)

Some of the individuals recorded in the final compilation of all these previously submitted records had multiple records. Over the years, the individual records of people with thousands of descendants in the Church had been submitted hundreds (thousands?) of times. For example, many of my ancestors were represented by multiple entries (duplicate entries) in the combined database. In an early attempt to minimize that duplication, the new.FamilySearch.org program "combined" the duplicates into one composite individual. The existence of these "combined records" turned out to be a difficult concept for some of the users of the new.FamilySearch.org program. The automatic combining process was imperfect, and some individuals who were not the same were mistakenly combined. On the other hand, many duplicated records were ignored by the program and, in addition, subsequent users of new.FamilySearch.org submitted even more duplicates of the same individuals.

This combining effort, together with the subsequent submission of "new" but duplicate records to the new.FamilySearch.org program, resulted in even more duplicate records. Many of these duplicate records were not isolated submissions. Users of new.FamilySearch.org could upload their entire files by using the GEDCOM submission process, thus resulting in entire pedigrees being duplicated again and again. This same database, containing all of these duplicate records, became the basis for the FamilySearch.org Family Tree program [my references to the Family Tree (capitalized) are to this program only].

The Family Tree inherited all of the duplicates. This was not a bad thing. It was, in fact, the only way to avoid future duplication. We had to face the fact that all these duplicates had been created by the submission system over the 100+ years it had been operating. THE FAMILY TREE IS THE SOLUTION TO THE PROBLEM, NOT THE PROBLEM. The Family Tree program provided a way to "merge" the duplicate records. Merging the records would ultimately eliminate the vast bulk of the duplicates. There is one huge limitation in the data, however. The Family Tree was using the same database created for new.FamilySearch.org, and that imposed some limitations on the information. One of those limitations was an absolute limit on the number of records that could be merged. The unfortunate result was that there was still a huge number of un-merged, duplicate records in the Family Tree database that could not be merged at all. See this screenshot for an example:


Now, what does this mean? It means that there are duplicate records out there, and until the limitation on merging these records is removed, some of the information about this individual is fragmented and any one copy of the individual's record may be incomplete or inaccurate. Deleting the duplicate records is not a viable option for too many reasons to be addressed in this post. Basically, deleting records results in a possible loss of valuable information that cannot be easily retrieved. There are several reasons why the records cannot be merged by the FamilySearch.org Family Tree. Knowing the reasons a record cannot be merged is interesting but not helpful in resolving the problem. Essentially, this can only be done by FamilySearch.

Looking at the above record in the Family Tree, it is evident that not only is there a duplicate that cannot be combined, there is also another duplicate record that the merge function cannot find. Here is a screenshot of the family showing three duplicate records for this individual, not just two.


There are, in effect, "hidden" duplicates in the program that the search engine of the Family Tree cannot or does not find. Here are the results of a search on the name "Calvin Christensen Morgan."


I have written about this issue many times over the past few years, but it is still a problem, and it will remain one until all of the data has been moved and cleaned up and the new.FamilySearch.org program is finally put to rest. We have been told many times by FamilySearch that this merging problem (read unresolvable duplication problem) cannot be resolved until the process of moving the data to the Family Tree is complete. There is presently no firm deadline for the completion of this process.

Now, what has this got to do with MyHeritage.com? Really nothing directly. MyHeritage.com just happens to have a search engine that works and does not ignore all the duplicates. Let me move to a more serious example of the problem. I will go back to my New England ancestor, Nathaniel Potter. According to my own records, Nathaniel Potter was born in 1637 in Rhode Island and died on 20 October 1704. Here is a screenshot of the record from the Family Tree:


The important thing to note here is the Personal Identification Number (PID): 9MK1-NZT. Each individual in the Family Tree is supposed to have a unique PID. OK, so what happens if we search for duplicates for this Nathaniel Potter in the Family Tree?


Note that there are 75 results. But this is not the whole story. If we drop down to the bottom of this screen, we see the following:


There are 24 additional results that cannot be merged at this time. Oh, which one of these do I have in my Family Tree? Are all these the same person?



This is the entry for Nathaniel Potter that I get if I click back through my lines as shown on the Family Tree. Is this the same Nathaniel Potter? No. Note the PID of KN42-LSZ. There are three "Nathaniel Potters" in my own database. My records show my direct line ancestor was born in 1615 and was married to Dorothy Wilbur. The record in the Family Tree shows this Nathaniel Potter as being "Read Only" and having 17 children named Nathaniel Potter. See the following screenshots:


Here is the top part of that same screenshot:


The read only designation means I cannot make any changes to this individual's record, but this happens to be the "wrong" Nathaniel Potter judging from my own records. Any relationship calculated from this individual would not be accurate.

This would be of passing interest were this situation unique or rare in the Family Tree, but what has happened is that this type of individual, with a multitude of descendants, always has the same problems.

Now where does MyHeritage.com come into this picture? One of the functions of the MyHeritage.com program is to search for Record Matches, that is, records that match the people in my family tree on that program. As I mentioned previously, I have 14,420 Record Matches waiting for my confirmation and inclusion in my family tree on MyHeritage.com. The search capabilities of the MyHeritage.com program are overwhelmingly impressive. Because of the partnership between MyHeritage.com and FamilySearch.org, MyHeritage.com's Record Matches now search the Family Tree entries. Here is a screenshot sorted by people showing the Record Matches with the individuals with the most matches on top:


Nathaniel Potter just happens to be at the top of this list with 247 matches. This means that he has that number of potential sources. What I previously pointed out as surprising is that nearly all of these "matches" are to the FamilySearch.org Family Tree. Here is what the first part of the list looks like if I review the matches:


The entry at the bottom shows the first of many entries linked to the Family Tree. In fact there are dozens and dozens of entries.


This is not a problem with MyHeritage.com. It is only doing its job extraordinarily well. It has found the duplicate entries in the Family Tree program. But what is more, it has found not just the few found by FamilySearch, but many, many more. When I checked the PID of one of the entries, the MyHeritage.com Record Detective search showed even more duplicates:


There are 246 Record Detective results, almost all of which are entries in the Family Tree; roughly two times the number found by FamilySearch. In other words, there are roughly a hundred or more additional duplicate entries in the Family Tree that are not found by searching with the FamilySearch.org program. I say "roughly" because the actual number is not ascertainable.
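
To make that comparison concrete, here is a minimal sketch in Python of how such a tally could be worked out by hand. It assumes the PIDs from each search have been copied into lists; the PIDs and the function name below are made-up placeholders for illustration only, not real Family Tree entries and not anything either website provides.

```python
# Hypothetical PIDs copied by hand from each search (placeholders only).
familysearch_possible_duplicates = {"AAAA-001", "AAAA-002", "AAAA-003"}
familysearch_not_mergeable = {"AAAA-004"}
myheritage_matches = {
    "AAAA-001", "AAAA-002", "AAAA-003",
    "AAAA-004", "AAAA-005", "AAAA-006",
}


def missed_by_familysearch(fs_lists, other):
    """Return PIDs present in `other` but in none of the FamilySearch lists."""
    found = set().union(*fs_lists)  # everything FamilySearch surfaced
    return other - found            # what only the other search surfaced


missed = missed_by_familysearch(
    [familysearch_possible_duplicates, familysearch_not_mergeable],
    myheritage_matches,
)
print(sorted(missed))
```

Because each individual is supposed to have a unique PID, comparing sets of PIDs is enough; no name or date matching is needed for this kind of count.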

What does all this mean? One conclusion is that once you encounter this issue in your lines on the Family Tree, there is not a whole lot you can do about it presently. Will all these hundreds of duplicates be eventually merged into one individual? Will users add dozens or hundreds more copies of Nathaniel Potter before the program is fixed? I cannot answer these or many other similar questions.

On the other hand, if I only use the first six generations or so of the Family Tree, then the information is fairly accurate. As a side note, many of the duplicate copies of Nathaniel Potter show ordinances reserved and printed. People are still adding duplicate individuals to the program. I also found green icons for Nathaniel Potter allowing the Temple work to be duplicated yet again.

Here is a screenshot showing another search for duplicates, for the same Nathaniel Potter KN42-LSZ, this time with only five results, including one allowing the Temple work to be done again.


You might notice that this "Nathaniel Potter" has no sources, no date for birth, death or any other information.

What is happening here? MyHeritage.com apparently has a more complete and expansive search capability than FamilySearch. It has found many more duplicates than are found in searches using the tools on FamilySearch.org. The issue of "accuracy" is a red herring. Searches on both MyHeritage.com and the FamilySearch.org Family Tree are "accurate." The issue is not accuracy, but completeness. Obviously, the MyHeritage.com Record Matches and Record Detective find more complete information than FamilySearch. In this particular case, the fact that the searches in MyHeritage.com graphically showed the number of duplicates in FamilySearch was a surprise. I would guess that neither FamilySearch nor MyHeritage.com was aware of what would happen when MyHeritage.com searched the Family Tree.

I could speculate as to the reasons why MyHeritage.com finds more information than FamilySearch, but that would not be helpful.

What do we do about all this? Nothing. We wait until FamilySearch says they have fixed the issues remaining from new.FamilySearch.org, which are implicit in the data. This is not FamilySearch's fault. It is the reality of the data inherited from 100+ years of duplicate work. What does this mean to some of the users of the Family Tree? I can summarize this as follows:

  • Many entries of individuals with a number of descendants in the Family Tree are duplicated
  • The duplicates mean that the particular information showing in your own lines may be inaccurate or incomplete
  • The availability of green icons does not mean that the work has not already been done
  • There is no present way for users to "fix" the entries completely
  • If you use MyHeritage.com, you can tell that any given ancestor has the problem by looking at the number of multiple links to the Family Tree for that individual.


13 comments:

  1. Nathaniel Potter: the George Foreman of the 17th century.

    Is FamilySearch planning on notifying people when NFS is closed? I'm the director of a Family History Center, signed up for all the newsletters and emails, trying my hardest to find help from FamilySearch for even the simplest of problems (and often not succeeding), and I'm not hearing this kind of vital news.
    Replies
    1. My best guess is that when they are finally through moving all the data from new.FamilySearch.org, no one will really know for a while. It might take some time to actually verify that they are done. No, I don't expect an announcement until it has been done for some time.
  2. One more comment, something you may or may not care to know: the Potter family was closely related to the DeWolf family, and included some of the largest slave traders and privateers in colonial America.

    The omission of this rather major part of history from Rhode Island family and community histories is a subject of marvel to contemporary historians.

    One quick example (just one of the first results in Google, not the best source), "Before beginning the next section concerning Rhode Island's and the DeWolf family's connection with the Slave trade; it should be noted that in chapter 14, (Bristol) of "The Providence Plantations for 250 years" written by Welcome Arnold Green in 1886, there is not one mention of the institution of slavery..." (http://thesaltysailor.com/rhodeisland-philatelic/rhodeisland/stampless66.htm)

    So, Rhode Island history is complicated, and so is the genealogy, even without adding in all of the quirks of the online tree system.
    Replies
    1. Thanks for the insight. I am still concentrating on the first six or seven generations.
  3. Thanks James. I hope everyone takes the time to read this, then sits back and exercises patience. I have faith that the FamilySearch team has fixing this limitation as a priority.

    I appreciate your efforts to help us work within the available resources.
  4. "As a side note, many of the duplicate copies of Nathaniel Potter show ordinances reserved and printed. People are still adding duplicate individuals to the program. I also found green icons for Nathaniel Potter allowing the Temple work to be duplicated yet again."

    That is by far the worst issue now. Something that FamilySearch needs to sort out is to stop people adding even more duplicates like this. FamilySearch's search capabilities are not good when compared to other genealogy websites, which means that often people do not find that they have added a duplicate until after they have done it. There is also a problem with people who add "their" ancestor in the stubborn refusal to realise that it is a shared tree which is supposed to have only one record for each person who has ever lived.

    I don't know how to stop people adding duplicates, but perhaps stopping people adding entries to the tree with dates before 1700 until they have a proven editing track record would be a way. Maybe that even needs to be extended up to 1800. After all everyone alive has plenty of ancestors born after 1800.
    Replies
    1. Yes, that will be a problem for a while. However, the number of green icons is rapidly decreasing, at least in my family lines.
  5. Kathryn Grant made a presentation that is on our website, FSFamilyTreeUserGroup.com, which FamilySearch picked up and put on the Learning Center titled "Duplicates in Family Tree, Why They're There, How to Find Them, and How to Resolve them." at this link: https://docs.google.com/presentation/d/1SRS6ApO3t5c0aylHRoiN77fKk5E2BTgTGakL9kk6Ijw/present?slide=id.p.
    It is not that FamilySearch does not have the search capacity to find all the duplicates; it is that each tool has a different depth of search on purpose. Most people use Possible Duplicates, which is deliberately shallow in its search, to prevent novice users from mis-merging (again). The Find tool finds a lot more, as you indicate. Adding or Finding a New Person, Parent or Child finds more. When Searching for someone else, you sometimes see more duplicates for another person. When adding a source from FS Historical records, you find more. When viewing the children in a Family Section, you find more. Clearing a name for temple work looks for duplicates. Kathryn goes on to give examples of duplicates which are slightly different and not found, or which look the same but are not. This is a short presentation worth looking at. Rob Kehrer's addition to FamilySearch has really helped, with him and his crew inventing the Source Linker Hint. I think they use spy technology to match people, places, relationships, and so on. Now it seems that MyHeritage combines all these tools in one place and pulls them all up at once. That is good, and also bad. Until novice users can demonstrate that they know how to use restraint in merging duplicates, which aren't sourced or proved to actually be duplicates, we need to be cautious. Otherwise, we're on the same track of messing things up again.
    Replies
    1. Yes, I know about searching in depth, but nevertheless, you would have to keep a list of all the PID numbers for each of the searches and compile a list. MyHeritage.com has done all that work for you.
  6. Excellent discussion. I wish you could make this required reading for everyone working in Family Tree.
  7. Does this help me in finding my family tree? Do I need to pay anything?
    Replies
    1. Sorry, both of your questions are too general to be answered. Both depend on other issues and circumstances. Perhaps you could be more specific. :-)