Note: This particular post is based on some recent observations reported in a previous post entitled "Some Unexpected Information from MyHeritage.com." What I have to say in this present post elaborates on my observations over the past few years and explains my opinion of what is going on with both MyHeritage.com and the FamilySearch.org Family Tree. Please understand that MyHeritage.com is functioning exactly as it is designed to function. The problems and issues I discuss can only be attributed to the reality of the FamilySearch.org Family Tree. MyHeritage.com is merely the messenger and it is only because it works so well that the message is coming through.
Now a fairly brief summary of the history of the data in the FamilySearch.org Family Tree.
Over the past 100 years or so, many people have contributed their user-compiled data to FamilySearch and its predecessors: the Genealogical Society of Utah and various other entities and organizations. Eventually, all this information, usually in the form of Family Group Records and pedigrees, was compiled into huge databases containing many millions of records of individuals and families. These databases included the Ancestral File, the Pedigree Resource File, the International Genealogical Index, the membership records of The Church of Jesus Christ of Latter-day Saints and the records of the names submitted to the LDS Temples. In an attempt to search all of these records simultaneously, these separately submitted records were all combined into one database that was used as the basis for the program called "new.FamilySearch.org." (See various notes similar to the one in this FamilySearch.org Research Wiki article.)
Some of the individuals recorded in the final compilation of all these previously submitted records had multiple records. Over the years, the individual records of people with thousands of descendants in the Church had been submitted hundreds (thousands?) of times. For example, many of my ancestors were represented by multiple entries (duplicate entries) in the combined database. In an early attempt to minimize that duplication, the new.FamilySearch.org program "combined" the duplicates into one composite individual. The existence of these "combined records" turned out to be a difficult concept for some of the users of the new.FamilySearch.org program. The automatic combining process was imperfect and some individuals who were not the same were mistakenly combined. On the other hand, many duplicated records were ignored by the program and, in addition, subsequent users of new.FamilySearch.org submitted even more duplicates of the same individuals.
This combining effort and the subsequent submission of "new" but duplicate records to the new.FamilySearch.org program resulted in even more duplicate records. Many of these duplicate records were not isolated submissions. Users of new.FamilySearch.org could upload their entire files by using the GEDCOM submission process, thus resulting in entire pedigrees being duplicated again and again. This same database, containing all of these duplicate records, became the basis for the FamilySearch.org Family Tree program [my references to the Family Tree (capitalized) are to this program only].
The Family Tree inherited all of the duplicates. This was not a bad thing. It was, in fact, the only way to avoid future duplication. We had to face the fact that all these duplicates had been created by the submission system over the 100+ years it had been operating. THE FAMILY TREE IS THE SOLUTION TO THE PROBLEM, NOT THE PROBLEM. The Family Tree program provided a way to "merge" the duplicate records. Merging the records would ultimately eliminate the vast bulk of the duplicates. There is one huge limitation in the data, however. The Family Tree was using the same database created for new.FamilySearch.org and that imposed some limitations on the information. One of those limitations was an absolute limit on the number of records that could be merged. The unfortunate result was that there were still a huge number of un-merged, duplicate records in the Family Tree database that could not be merged at all. See this screenshot for an example:
Now, what does this mean? It means that there are duplicate records out there and, until the limitation on merging these records is lifted, some of the information about this individual is fragmented and any one copy of the individual's record may be incomplete or inaccurate. Deleting the duplicate records is not a viable option for too many reasons to be addressed in this post. Basically, deleting records results in a possible loss of valuable information that cannot be easily retrieved. There are several reasons why the records cannot be merged by the FamilySearch.org Family Tree. Knowing the reasons a record cannot be merged is interesting but not helpful in resolving the problem. Essentially, the problem can only be resolved by FamilySearch.
Looking at the above record in the Family Tree, it is evident that not only is there a duplicate that cannot be combined, there is also another duplicate record that the merge function cannot find. Here is a screenshot of the family showing three duplicate records for this individual, not just two.
There are, in effect, "hidden" duplicates in the program that the search engine of the Family Tree cannot or does not find. Here are the results of a search on the name "Calvin Christensen Morgan."
I have written about this issue many times over the past few years, but it is still a problem. It will remain a problem until all of the data has been moved, whatever else is necessary to clean up that data has been done, and the new.FamilySearch.org program is finally put to rest. We have been told many times by FamilySearch that this merging problem (read unresolvable duplication problem) cannot be resolved until the process of moving the data to the Family Tree is complete. There is presently no firm deadline for the completion of this process.
Now, what has this got to do with MyHeritage.com? Really nothing directly. MyHeritage.com just happens to have a search engine that works and does not ignore all the duplicates. Let me move to a more serious example of the problem. I will go back to my New England ancestor, Nathaniel Potter. According to my own records, Nathaniel Potter was born in 1637 in Rhode Island and died on 20 October 1704. Here is a screenshot of the record from the Family Tree:
The important thing to note here is the Personal Identification Number (PID): 9MK1-NZT. Each individual in the Family Tree is supposed to have a unique PID. OK, so what happens if we search for duplicates for this Nathaniel Potter in the Family Tree?
Note that there are 75 results. But this is not the whole story. If we drop down to the bottom of this screen, we see the following:
There are 24 additional results that cannot be merged at this time. Oh, which one of these do I have in my Family Tree? Are all these the same person?
This is the entry for Nathaniel Potter that I get if I click back through my lines as shown on the Family Tree. Is this the same Nathaniel Potter? No. Note the PID of KN42-LSZ. There are three "Nathaniel Potters" in my own database. My records show my direct line ancestor was born in 1615 and was married to Dorothy Wilbur. The record in the Family Tree shows this Nathaniel Potter as being "Read Only" and having 17 children named Nathaniel Potter. See the following screenshots:
Here is the top part of that same screenshot:
The read only designation means I cannot make any changes to this individual's record, but this happens to be the "wrong" Nathaniel Potter judging from my own records. Any relationship calculated from this individual would not be accurate.
This would be of passing interest were this situation unique or rare in the Family Tree, but what has happened is that this type of individual, with a multitude of descendants, always has the same problems.
Now where does MyHeritage.com come into this picture? One of the functions of the MyHeritage.com program is to search for Record Matches, that is, records that match the people in my family tree on that program. As I mentioned previously, I have 14,420 Record Matches waiting for my confirmation and inclusion in my family tree on MyHeritage.com. The search capabilities of the MyHeritage.com program are overwhelmingly impressive. Because of the partnership between MyHeritage.com and FamilySearch.org, MyHeritage.com's Record Matches now search the Family Tree entries. Here is a screenshot sorted by people showing the Record Matches with the individuals with the most matches on top:
Nathaniel Potter just happens to be at the top of this list with 247 matches. This means that he has that number of potential sources. What I previously pointed out as surprising is that nearly all of these "matches" are to the FamilySearch.org Family Tree. Here is what the first part of the list looks like if I review the matches:
The entry at the bottom shows the first of many entries linked to the Family Tree. In fact there are dozens and dozens of entries.
This is not a problem with MyHeritage.com. It is only doing its job extraordinarily well. It has found the duplicate entries in the Family Tree program. But what is more, it has found not just the few found by FamilySearch, but many, many more. When I checked the PID of one of the entries, the MyHeritage.com Record Detective search showed even more duplicates:
There are 246 Record Detective results, almost all of which are entries in the Family Tree; roughly two times the number found by FamilySearch. In other words, there are roughly a hundred or more additional duplicate entries in the Family Tree that are not found by searching with the FamilySearch.org program. I say "roughly" because the actual number is not ascertainable.
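For readers who want to check the rough arithmetic, here is a minimal sketch using only the counts quoted in this post. It rests on two assumptions of mine: that the 75 duplicate-search results and the 24 un-mergeable results should be added together to get the FamilySearch total, and that every one of the 246 Record Detective results is a Family Tree entry (the post only says "almost all"). The second assumption makes the difference an upper bound, not an exact count. The variable names are purely illustrative.

```python
# Counts quoted in this post; nothing here comes from either website's data.
familysearch_mergeable = 75     # results from the Family Tree duplicate search
familysearch_unmergeable = 24   # additional results that cannot be merged at this time
myheritage_results = 246        # Record Detective results on MyHeritage.com

# Assumption: the two FamilySearch counts do not overlap and can be added.
familysearch_total = familysearch_mergeable + familysearch_unmergeable

# Assumption: all 246 results are Family Tree entries, so this is an upper bound.
extra_upper_bound = myheritage_results - familysearch_total

# How many times larger the MyHeritage.com result set is.
ratio = myheritage_results / familysearch_total

print(familysearch_total)
print(extra_upper_bound)
print(round(ratio, 1))
```

Under those assumptions the gap comes out at somewhat more than a hundred entries, which is why I can only say "roughly."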
What does all this mean? One conclusion is that once you encounter this issue in your lines on the Family Tree, there is not a whole lot you can do about it presently. Will all these hundreds of duplicates be eventually merged into one individual? Will users add dozens or hundreds more copies of Nathaniel Potter before the program is fixed? I cannot answer these or many other similar questions.
On the other hand, if I only use the first six generations or so of the Family Tree, then the information is fairly accurate. As a side note, many of the duplicate copies of Nathaniel Potter show ordinances reserved and printed. People are still adding duplicate individuals to the program. I also found green icons for Nathaniel Potter allowing the Temple work to be duplicated yet again.
Here is a screenshot showing another search for duplicates, for the same Nathaniel Potter KN42-LSZ, this time with only five results, including one allowing the Temple work to be done again.
You might notice that this "Nathaniel Potter" has no sources, no date for birth, death or any other information.
What is happening here? MyHeritage.com apparently has a more complete and expansive search capability than FamilySearch. It has found many more duplicates than are found in searches using the tools on FamilySearch.org. The issue of "accuracy" is a red herring. Searches on both MyHeritage.com and the FamilySearch.org Family Tree are "accurate." The issue is not accuracy, but completeness. Obviously, the MyHeritage.com Record Matches and Record Detective find more complete information than FamilySearch. In this particular case, the fact that the searches in MyHeritage.com graphically showed the number of duplicates in FamilySearch was a surprise. I would guess that neither FamilySearch nor MyHeritage.com was aware of what would happen when MyHeritage.com searched the Family Tree.
I could speculate as to the reasons why MyHeritage.com finds more information than FamilySearch, but that would not be helpful.
What do we do about all this? Nothing. We wait until FamilySearch says they have fixed the issues remaining from new.FamilySearch.org, which are implicit in the data. This is not FamilySearch's fault. It is the reality of the data inherited from 100+ years of duplicate work. What does this mean to some of the users of the Family Tree? I can summarize this as follows:
- Many entries of individuals with a number of descendants in the Family Tree are duplicated
- The duplicates mean that the particular information showing in your own lines may be inaccurate or incomplete
- The availability of green icons does not mean that the work has not already been done
- There is no present way for users to "fix" the entries completely
- If you use MyHeritage.com, you can tell that any given ancestor has the problem by looking at the number of links to the Family Tree for that individual.