As usual, the Ancestry Insider (AI) hits another home run (or sinks a three-pointer, whichever) with his short post on the New.FamilySearch.org (NFS) data issues, called Monday Mailbox: Bulk Merge. If you read the post, be sure to read the comments. But, as is usual with me, I cannot let this opportunity go by without also commenting on the subject of the post.
First, I feel I need to clarify the statement, "FamilySearch seeded the tree with bad data, some from computer merging, some from human error." As I understand what happened, the bad data AI refers to is the conglomeration of the Ancestral File, the International Genealogical Index (IGI), the Pedigree Resource File (PRF), the general membership records of The Church of Jesus Christ of Latter-day Saints and the Church's Temple records. As a result, right from the start, NFS had an insurmountable problem: inconsistencies between the different copies of the input data, and multiple copies of the same individual and family records. For example, as it exists today, the Ancestral File contains a copy of a record of my great-grandfather, the IGI contains more than 30 copies of the same information (with substantial inaccurate variations), and there are who knows how many duplicate copies in the PRF. This is in addition to the Church and Temple records. So Henry Martin Tanner, my great-grandfather, has 115 combined records in NFS and probably quite a few more uncombined records. This is commonly known as the "data challenge" of NFS. This is also what AI is talking about when he says that FamilySearch "opted to keep the bad data..." I understand him to mean that FamilySearch has decided not to purge the NFS data of the multiple copies with unreliable entries, but instead to build a method by which users (you, me, etc.) can "clean up the data."
I personally would clean up the data by throwing away (erasing, deleting, isolating) the inaccurate data and leaving only the "one true data" about any individual and family. Guess what? There is the remote (though distinct) possibility that some of my extended family members may disagree with my selection of the one true data. Then what? Hmm. Does anyone out there recognize this issue from working with a wiki? The problem faced by NFS is exactly the reason that a static online genealogy database will never be satisfyingly accurate. It is also the reason that wikis exist.
Can FamilySearch turn NFS into a wiki? Not even remotely possible. Remember what I said above, that the data added to NFS contained "membership information." This information could never be subject to user change, any more than the present system allows users to combine it. (If you were not aware, NFS allows users to combine duplicate individuals, except when the duplicates involve two or more membership records. In that case, the correction has to be made through the Church organization, outside of the NFS program.)
So what is meant by AI's statement that the replacement system will allow users to "clean up the data"? That is the Question (with a capital Q). How will the new (we keep using the word "new" over and over until it doesn't mean what you think it means, as in The Princess Bride) program handle new (here we go again) information that is really bad? For example, what if one of my relatives wants to show my grandfather with his second wife as his mother? (Who would do such a thing? Just take a look at my lines in NFS; that is exactly what someone has done.) How will the program take lunacy into account?
How will the program prevent many more of my relatives from doing similar things in the future? Is the cost of liberty (from bad data) going to be eternal vigilance? Will I have to go back to the program every week and clean up the mess? Yes, as AI says, "Once again we see evidence that genealogy is deceptively difficult."