Thursday, May 30, 2013

Uploading GEDCOM files -- The Good, The Bad and The Ugly

I don't want to appear to be a conspiracy theorist, but I do attribute a significant portion of the problems I face in the genealogy world to that acronymic standard, GEDCOM. I have been dealing with the vagaries and limitations of the specification, officially known as GEnealogical Data COMmunication, almost since its inception in 1984. What it boils down to is that GEDCOM has been the only commonly accepted way to share genealogical data between the various insular database programs. But I am starting to believe that it was specifically written to make my life even more difficult than it is already.

In its conception, the idea is laudable: create an easy-to-use format that allows differently designed genealogical database programs to share data. But from the very beginning, few if any of the various commercial implementations of GEDCOM used its entire potential. The developers of each of the commercial programs felt compelled to add their own unique features. Features that were not adequately captured by a GEDCOM export were lost entirely when imported into a competing developer's program.

This misfit of the data resulted in a steady loss of information. Of course, my program on my computer kept its integrity. But every time my records were shared, a little bit (or more) of the information in my file was lost. Not lost in the absolute sense, but certain pieces of information in my originating program were lost in transferring the data to another program. This alone was a fact that I could have lived with, but the real problems only showed up in GEDCOM's old age.
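To make that loss concrete, here is a minimal Python sketch of an importer that silently discards anything it does not recognize. It is an illustration under stated assumptions, not how any particular product works: the sample record, the `KNOWN_TAGS` set, and the `lossy_import` function are all hypothetical. The one detail taken from the GEDCOM specification is that vendor-defined tags begin with an underscore.

```python
# Hypothetical sample: a GEDCOM-style record with one vendor-specific tag.
SAMPLE = """\
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
1 _NICK Jack
0 TRLR
"""

# Tags this imaginary importer understands (an assumption for illustration).
KNOWN_TAGS = {"INDI", "NAME", "BIRT", "DATE", "TRLR"}


def parse_line(line):
    """Split a GEDCOM line into (level, xref, tag, value)."""
    parts = line.split(" ", 2)
    level = int(parts[0])
    if parts[1].startswith("@"):
        # Record line: level, cross-reference id, then tag (and maybe a value).
        xref = parts[1]
        rest = parts[2].split(" ", 1)
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = ""
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def lossy_import(text, known_tags):
    """Return (kept_lines, dropped_lines) for a simplistic importer."""
    kept, dropped = [], []
    skip_below = None  # level of a dropped line; its sub-lines are dropped too
    for line in text.splitlines():
        if not line.strip():
            continue
        level, _xref, tag, _value = parse_line(line)
        if skip_below is not None and level > skip_below:
            dropped.append(line)
            continue
        skip_below = None
        if tag in known_tags:
            kept.append(line)
        else:
            dropped.append(line)
            skip_below = level
    return kept, dropped


kept, dropped = lossy_import(SAMPLE, KNOWN_TAGS)
print("kept:", len(kept), "dropped:", dropped)
```

In this sketch the nickname recorded under the custom `_NICK` tag never reaches the receiving program, while the originating file is untouched. That is the kind of quiet, one-directional loss described above.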

New features, unheard of and unplanned for, kept cropping up in new programs. Those new programs kept getting more and more integrated with online resources, and finally some of the programs broke entirely away from the single-program-on-my-computer scenario and instead began to use online databases and incorporate those databases into my local program.

If the problems caused by this integration had stopped there, I might have maintained my neutrality towards GEDCOM, but now there is an entirely different problem: GEDCOM uploads onto online family trees. The users of online family tree programs and the developers of those same programs are unanimous in their support of using GEDCOM to upload a file to the Web. But because it was so easy to export your data to a GEDCOM file and then almost immediately turn around and share another copy to be uploaded, the number of duplicate online family trees surged.

The argument that it is wasteful to require someone to re-enter their entire database merely to share that research online is reasonable. But that is not exactly what is happening. The vast majority of online family trees are merely copies of previously uploaded family trees. My experience is that this proliferation of family trees online makes finding those that contain original research difficult or nearly impossible.

Rather than sharing the benefit of additional research, the extensive field of copies has severely reduced the chance that I could find anything of value in the family tree programs. I have frequently heard the argument (and repeated it in this blog) that the benefit of online family trees is chiefly in giving a serious researcher leads that they may otherwise have missed. In today's world, with some of us literally buried in online family trees that are primarily copies, that advantage, if there ever was one, has vanished into a huge pile of duplicates.

There is literally no way that I can deal with all of the variations and duplicates online. Today, for example, I spent considerable time trying to untangle a number of duplicate online entries for a patron at the Mesa FamilySearch Library. My attempts were largely unsuccessful because the scope of the duplicate entries defied resolution. The original "ease of sharing" mantra of GEDCOM supporters has turned into a sorcerer's apprentice nightmare of duplicate files. How many people believe they are "doing their genealogy" when all they have done is copy a copy of a copy into an online family tree program? This is demonstrated by the very, very small number of online family trees that are at all supported by source citations, even when those citations are simple to obtain and include.

You might argue that it is too easy to blame GEDCOM for the problem. The real problem is much more complex. That may be true, but will rewriting GEDCOM or producing a substitute reverse the trend to create more and more copies of the same data? Maybe we should focus some serious attention on the issue of duplicates and the difficulty of stopping rampant copies, rather than facilitate more copies by developing even easier to use copying programs?
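As one illustration of what attention to duplicates could look like, here is a small Python sketch that measures how much of an uploaded tree already exists in another tree. Everything in it is an assumption made for the example: the `fingerprint` and `overlap_ratio` functions, the sample people, and the 0.5 threshold are hypothetical, and matching real people reliably takes far more than a name and a birth date.

```python
def fingerprint(name, birth_date):
    """Normalize a person to a comparable key (case and spacing ignored)."""
    clean_name = " ".join(name.replace("/", " ").lower().split())
    clean_date = " ".join(birth_date.lower().split())
    return (clean_name, clean_date)


def overlap_ratio(uploaded, existing):
    """Fraction of uploaded people whose fingerprint already exists."""
    existing_keys = {fingerprint(n, d) for n, d in existing}
    uploaded_keys = {fingerprint(n, d) for n, d in uploaded}
    if not uploaded_keys:
        return 0.0
    return len(uploaded_keys & existing_keys) / len(uploaded_keys)


# Hypothetical data: two of the three uploaded people are already online.
existing_tree = [("John /Smith/", "1 JAN 1850"), ("Mary /Jones/", "3 MAR 1852")]
upload = [
    ("john /smith/", "1 Jan 1850"),
    ("Mary /Jones/", "3 MAR 1852"),
    ("Ann /Lee/", "1900"),
]

THRESHOLD = 0.5  # arbitrary cut-off chosen only for this example
ratio = overlap_ratio(upload, existing_tree)
print(f"{ratio:.2f}", "flag for review" if ratio >= THRESHOLD else "accept")
```

A check like this cannot tell original research from a copy on its own, since two people can independently research the same family. At most it could prompt a closer look before yet another duplicate tree goes online.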

4 comments:

  1. The problem, James, isn't that GEDCOM makes it too easy to share data. That is good.

    The problem is that software has made it too easy to merge other people's data into one's own. That is bad.

    Genealogists should keep their own research separate from that of others. Programs should allow you to virtually assemble data from various sources for reporting only, and NOT allow merging of data sets together.

    Online datasets must keep individual data sets separate so that your research is your research and no one else's. And they should check GEDCOM uploads and disallow any that appear to have significant copies of other people's data in them, and force the uploader to remove the other data before accepting the upload.

    No, GEDCOM is not and has never been the problem. Data sharing is good.

    Programs that allow you to merge other people's data into yours are the problem.

    Unfortunately, the problem has advanced so far that I don't know if the patient will recover.

  2. I think that's a great idea, Louis. That's why I keep mine on my own website. But people still take and take and rarely (it's been years actually) ask for sources, or share anything back.

    The only way I can see using online trees is to contact people. If they can't answer me intelligently they're toast.

  3. James, you said, "You might argue that it is too easy to blame GEDCOM for the problem. The real problem is much more complex. That may be true, but will rewriting GEDCOM or producing a substitute reverse the trend to create more and more copies of the same data? Maybe we should focus some serious attention on the issue of duplicates and the difficulty of stopping rampant copies, rather than facilitate more copies by developing even easier to use copying programs?"

    Nothing will stop treebies from copying whatever is on the web or whatever else they run across, no matter how silly (three brothers, Indian Princess, settled in 1535 in Massachusetts).

    It does not matter what programs are available.

    Tree sites that intend evidentiary foundation, however, should not allow mass-uploading genealogical files, with whatever program.

    I am now seeing tree sites given as "sources" in FamilySearch - Family Tree. This site began as a mess of non-evidentiary data and in another 15 or 20 years might be fairly well cleaned up. Or might be equally swampy, given that users are free to add whatever they wish using a variety of easy-click programs.

    It is the evidentiary intentions of the persons making the entries, not the *program* that is the glitch.

  4. Thank you so much for this post, because now my sister knows I was not purposely leaving out data and causing her frustration with an uploaded TREE.

    DuSyl of
    DuSyl.blogspot.com
