Some people eat, sleep and chew gum, I do genealogy and write...

Monday, February 21, 2011

Comments on Monday Mailbox: Bulk Merge

As usual, the Ancestry Insider (AI) hits another home run (or sinks a three-pointer, whichever) with his short post on the New.FamilySearch.org (NFS) data issues, called "Monday Mailbox: Bulk Merge." If you read the post, be sure to read the comments. But as is usual with me, I cannot let this opportunity go by without also commenting on the subject of the post.

First, I feel I need to clarify the statement, "FamilySearch seeded the tree with bad data, some from computer merging, some from human error." As I understand what happened, the bad data AI refers to is the conglomeration of the Ancestral File, the International Genealogical Index (IGI), the Pedigree Resource File (PRF), the general membership records of The Church of Jesus Christ of Latter-day Saints and the Church's Temple records. As a result, right from the start, NFS had an insurmountable problem: inconsistencies between the different copies of the input data, and multiple copies of the same individual and family records. For example, as it exists today, the Ancestral File contains a copy of a record of my great-grandfather, the IGI contains more than 30 copies of the same information (with substantial inaccurate variations), and who knows how many duplicate copies are in the PRF. This is in addition to the Church and Temple records. So Henry Martin Tanner, my great-grandfather, has 115 combined records in NFS and probably quite a few more uncombined records. This is commonly known as the "data challenge" of NFS. It is also what AI is talking about when he says that FamilySearch "opted to keep the bad data..." I understand him to mean that FamilySearch has decided not to purge the NFS data of the multiple copies with unreliable entries, but instead to build a method by which users (you, me, etc.) can "clean up the data."

I personally would clean up the data by throwing away (erasing, deleting, isolating) the inaccurate data and leaving only the "one true data" about any individual and family. Guess what? There is the remote (though distinct) possibility that some of my extended family members may disagree with my selection of the one true data. Then what? Hmm. Does anyone out there recognize this issue from working with a wiki? The problem faced by NFS is exactly the reason that a static online genealogy database will never be satisfyingly accurate. It is also the reason that wikis exist.

Can FamilySearch turn NFS into a wiki? Not even remotely possible. Remember what I said above, that the data added to NFS contained "membership information." This information could never be subject to user change, any more than the present system now allows users to combine it. (If you were not aware, NFS allows users to combine duplicate individuals, except when the duplicate involves two or more membership records.) In that case, the correction has to be made through the Church organization outside of the NFS program.

So what is meant by AI's statement that the replacement system will allow users to "clean up the data"? That is the Question (with a capital Q). How will the new (we keep using the word "new" over and over until it doesn't mean what you think it means, as in The Princess Bride) program handle new (here we go again) information that is really bad? For example, what if one of my relatives wants to show my grandfather's second wife as his mother? (Who would do such a thing? Just take a look at my lines in NFS; that is exactly what someone has done.) How will the program take lunacy into account?

How will the program prevent many more of my relatives from doing similar things in the future? Is the cost of liberty (from bad data) going to be eternal vigilance? Will I have to go back to the program every week and clean up the mess? Yes, as AI says "Once again we see evidence that genealogy is deceptively difficult."

2 comments:

  1. I agree. As long as the general public, meaning people without good genealogical training, can just change information, there are always going to be mistakes. I swear everyone thinks the compiled family group sheets, etc., that came from their grandmother were 100% correct, and that anything different is wrong, even if there is a good amount of proof otherwise. Others just don't want to take the time to really seek out and verify information!

    And what's up with NFS allowing folks to upload GEDCOMs? Why allow more bad information to be introduced into the system? It's simple enough to go in and add a fact in NFS, and you can add the source for that fact at the same time. In the case of most of my ancestors, there are already plenty of correct facts, just mixed in with some bad ones.

    As for NFS search reminding some of a wiki, I agree. The only difference is that you can remove bad and/or unreferenced information from a wiki. It is also easy to add sources to, and see the sources of, information within a wiki article.

  2. James, thank you for this perspective. I think one underlying crux is this: if the revised system opened to public view/use resists changes based on evidence, then reasonably effective researchers will not invest time and energy into making such changes.

    It does not matter what reasons there may be for resistance to changes. Other tree systems prohibit changes not made by an 'owner' (or designated co-editor). Some of my ancestors (for whom prevailing 'tree' settings are accurate) have had Ordinances performed. I am not certain about the Ordinance status of others, but do know that there are wildly incorrect genealogical settings for some of them, as well as the duplicates issue that you have touched on here and elsewhere.

    If evidence-based corrections are deterred and those who could make them prevented from doing so, the tree would be left to those who either do not know or do not care what is required for a reasonably fact-based approach.

    Then such a tree setting will remain one that collects genealogical errors and multiplies them.

    This surely was not a purpose contemplated when the present setting was so awkwardly compiled from error-riddled sources. It may continue to be a sad story of wasted effort.
