Friday, February 27, 2015

GEDCOM or not to GEDCOM, that is the question

My apologies to Shakespeare, but there is a real issue with the now ancient GEDCOM standard. I liken it to those undersized spare tires that come with many modern cars: useful in an emergency but lethal if used too long or too much. It is in the same category as the venerable Personal Ancestral File program, which still has adherents and almost fanatical defenders. For me, of course, this journey down memory lane recalls the "good ole' times" when we were embroiled in the issues of genealogical data standards. It looks to me as though the partnerships being forged between the larger genealogy companies, and the accompanying agreements to share APIs back and forth, have predictably obviated the need for a separate genealogical standard. This is especially true given the ongoing discussion about the ability to move data from one online tree program to another.

At a very practical level, I consider the use of GEDCOM files to transfer data from one large family tree to another to be the point at which the family history industry moves decisively away from source-based reality into the never-never land of imaginary pedigrees. Uploading a huge unsourced family file into the FamilySearch.org Family Tree, for example, would be a disaster for the descendants of all those ancestors who would then be duplicated. Notwithstanding my fear of this eventuality, I still hear a constant background noise about the need to upload an entire file, and the GEDCOM file format is presently the only way to do so.

But using the GEDCOM format is like taking photos through a screen door. You get some of the details and lose others. Individual programs have addressed the need to move the entire data set from one computer to another, but the idea of moving an entire file from one program to another has languished. So here go the pros and cons of GEDCOM as I see them today (February, 2015).

Before I get to the list, I have a comment about large genealogical data files. I have seen files that contain well over 100,000 individuals and some that have grown much larger than that. I am certain that people with such huge files have either spent their entire lives adding people one by one or have copied huge amounts of data from other files. Do you realize that if you had 100,000 people in your file, it would take 100,000 minutes, or over 1,600 hours, just to look at each person for one minute? Enough said on that topic for this post.

Pros

  • GEDCOM is presently the only practical way to move a large genealogy database from one program to another. There are limited methods of transferring and synchronizing data between two programs, especially when those two programs are owned by the same company, such as an online family tree and the supporting desktop program, but there is no other way to move an entire file between two unrelated programs.
  • For basic data fields, GEDCOM does a very good job of preserving the existing file structure.
  • It is relatively easy to understand and export a GEDCOM file and then import the file into another supporting program. 
  • GEDCOM exports and imports are still supported by the majority of genealogical database programs on all computer platforms and operating systems.
  • GEDCOM has been a way to maintain reasonable data correspondence between different programs.

Cons

  • Depending on the program, a considerable amount of the existing file data may be lost in the transfer process, since there are fields and types of media that GEDCOM could support but that a given program usually does not. For example, source documentation in Personal Ancestral File does not transfer well into most other programs.
  • Using GEDCOM facilitates the transfer of large, unsupported, unsourced and inaccurate data files. Much of the proliferation of inadequately sourced online family trees is a result of exporting and importing GEDCOM files.
  • The need to support the GEDCOM standard has imposed arbitrary limits on the way genealogical information is stored and disseminated. 
  • Adding GEDCOM files to an existing family tree may create a large number of duplicates. For this reason, FamilySearch (the organization that originated GEDCOM) now requires uploads to be examined one person at a time, and the current implementation of the process in the FamilySearch.org Family Tree does not support notes, sources or multimedia.
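
To make the first con a little more concrete, here is a minimal Python sketch of how data can disappear during an import. It relies on the basic GEDCOM line layout (a level number, an optional cross-reference such as @I1@, a tag, and an optional value). Everything else is an assumption made up for illustration: the sample record, the custom `_PHOTO` tag, the `SUPPORTED_TAGS` set and the `simulate_import` function do not come from any real genealogy program. The sketch simply drops every line whose tag the imaginary importer does not recognize, along with anything subordinate to that line.

```python
# Hypothetical illustration only: no real genealogy program is modeled here.

SAMPLE_GEDCOM = """\
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 1 JAN 1850
2 SOUR @S1@
1 _PHOTO portrait.jpg
0 @S1@ SOUR
1 TITL Parish register
0 TRLR
"""

# Assumed subset of tags that the imaginary importing program understands.
SUPPORTED_TAGS = {"INDI", "NAME", "BIRT", "DATE", "TRLR"}


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.split(" ", 2)
    level = int(parts[0])
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line with a cross-reference, e.g. "0 @I1@ INDI"
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        xref = None
        tag = parts[1]
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def simulate_import(text, supported):
    """Return (kept, dropped) lines for an importer that knows only `supported` tags."""
    kept, dropped = [], []
    skip_below = None  # level of the unsupported line whose children are being skipped
    for line in text.splitlines():
        if not line.strip():
            continue
        level, _xref, tag, _value = parse_line(line)
        if skip_below is not None and level > skip_below:
            dropped.append(line)  # child of an unsupported line
            continue
        skip_below = None
        if tag in supported:
            kept.append(line)
        else:
            dropped.append(line)
            skip_below = level
    return kept, dropped


kept, dropped = simulate_import(SAMPLE_GEDCOM, SUPPORTED_TAGS)
print("Kept:", len(kept), "lines")
print("Lost in transfer:")
for line in dropped:
    print("   ", line)
```

In this made-up case the name and birth date survive, while the source citation, the source record itself and the photo reference are silently discarded, even though the original file contained them.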

These lists are not exhaustive; I intended them merely to indicate the nature of the problems. I am certain that as time passes, there will be ways to exchange data between two online family trees on unrelated websites, either directly or through the mediation of a third-party program.

4 comments:

  1. Thank you for posting this. I learned a lot. I also listed it in my NoteWorthy Reads post for this week (see http://jahcmft.blogspot.com/2015/02/noteworthy-reads-4.html).

  2. Re your cons:
    1. "a considerable amount of the existing file data may be lost" - this is true, but that is an indictment of poor programming, not the GEDCOM standard, and would apply to modern APIs, etc.

    2. "Using GEDCOM facilitates the transfer of large, unsupported, unsourced and inaccurate data files." Again, this is not an issue with the standard, this is how the standard is used.

    3. "The need to support the GEDCOM standard has imposed arbitrary limits" - any standard imposes limits, that's what standards do. More serious is that the GEDCOM standard was fossilized in the 1990s and these limits mean it doesn't do what lots of people want (e.g. it doesn't directly support the ability to record changing names of places). But again, any standard could get equally frozen - in this case, FamilySearch, driven by the LDS Church, deliberately let the GEDCOM standards wither, while retaining copyright to stop anyone else taking it forward past the 1990s limits.

    4. "Adding GEDCOM files to an existing family tree may create a large number of duplicates." I have a great deal of sympathy with this view. However, this is an argument against unthinking bulk uploading, it's not an argument against a data interchange standard as such.


    Replies
    1. The difference between bad programming and the way GEDCOM is used is lost on the average genealogy database user. My intention was to point out the results of the GEDCOM standard, not particularly to comment on the issue of some sort of standard. I would have to disagree with your characterization of what was done or not done by the LDS Church. The shift away from the GEDCOM standard was more a natural response to changing conditions than some kind of deliberate conspiracy.
