Some people eat, sleep and chew gum, I do genealogy and write...

Monday, November 1, 2010

Let's talk about GEDCOM -- what a pleasant subject!

[Warning: this post is likely to be full of technical jargon and acronyms.]

There is a scene in the Harry Potter movies where Harry rides the Knight Bus through London. The Bus magically changes shapes to drive between two other buses and avoid obstacles so it can continue driving fantastically fast. GEDCOM is NOT like the Knight Bus. I hesitate to bring up this delicate subject, but since The Ancestry Insider (AI) has kindly opened the door for me, I thought I might as well weigh in.

At the now not-so-recent Bloggers' Day at FamilySearch, I distinctly remember statements made about the current status of GEDCOM. At the time, I hesitated to get into the fray, not wishing to bog down a really interesting presentation. However, rather than rely on my own vague memory, here is a quote from AI:
There was no small dissatisfaction among attendees regarding GEDCOM’s deficiencies. It has not been updated since way back when FamilySearch barely gave source citations any attention. It does not support the best of breed citations supported by FamilySearch’s competitors. Instead, it relies on a single text field, cremating citations that are forced through it.
GEDCOM also does not support transfer of artifacts, images, and attached documents, all of which are misrouted to the great lost-luggage warehouse in the cloud.
When the only program any of us dealt with was an old version of Personal Ancestral File (PAF), this issue was non-existent. Since your data was in PAF and my data was also in PAF we all got along famously. I could give you a file and you got what I had, nothing more and nothing less. But, guess what, the world continued to change. PAF got to version and ossified back in about 2002 (in software years, sometime right after the Dark Ages). Here is living proof:

At this point, here comes the disclaimer. I am not a software engineer. In fact, I am not an engineer of any kind. But I am conversant with the issues and can keep up with the jargon. What I do know is that you cannot put a round peg in a square hole without modification. For years, I have been transferring files for people using GEDCOM. Frequently, the transfer creates a LST file. This file lists the data that didn't get properly imported into the target program. As a practical matter, we have mostly ignored those LST files in the past, because it was extremely difficult to explain to the owner of the file what had happened in a way that would not cause untold grief for them and for me.
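For the technically inclined, the idea behind such an exceptions file can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular program does its import: the set of supported tags, the sample data, and the function names are all made up for the example. The one thing taken from GEDCOM itself is the shape of a line: a level number, an optional @XREF@ identifier, a tag, and an optional value.

```python
import re

# One GEDCOM line: level number, optional @XREF@ identifier, tag, optional value.
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?:\s(.*))?$")

# Hypothetical list of tags that an imaginary target program understands.
SUPPORTED_TAGS = {"HEAD", "TRLR", "INDI", "NAME", "SEX", "BIRT", "DATE", "PLAC", "NOTE"}


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    match = LINE_RE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return int(level), xref, tag, value or ""


def split_import(gedcom_text, supported=SUPPORTED_TAGS):
    """Return (imported, rejected).

    A line whose tag is not supported is rejected together with every
    line nested beneath it, which is roughly the material that would
    end up in an exceptions list.
    """
    imported, rejected = [], []
    skip_below = None  # level of the unsupported line currently being skipped
    for raw in gedcom_text.splitlines():
        if not raw.strip():
            continue
        level, xref, tag, value = parse_line(raw)
        if skip_below is not None and level > skip_below:
            rejected.append(raw)  # child of an unsupported line
            continue
        skip_below = None
        if tag in supported:
            imported.append((level, xref, tag, value))
        else:
            rejected.append(raw)
            skip_below = level
    return imported, rejected


# Made-up sample: a source citation and a custom photo tag that the
# imaginary target program does not understand.
SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Doe/
1 BIRT
2 DATE 1 JAN 1900
2 SOUR @S1@
3 PAGE p. 12
1 _PHOTO portrait.jpg
0 TRLR"""

if __name__ == "__main__":
    imported, rejected = split_import(SAMPLE)
    print("Imported lines:", len(imported))
    print("Left behind:")
    for line in rejected:
        print("   ", line)
```

In this made-up sample, the person and the birth date come through, while the source citation and the photo reference are the parts left behind. That is the kind of loss the owner of the file rarely hears about.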

But now we fast forward to 2010. I just imported my friend's PAF file into a current software program and guess what? All of his sources (assuming he had any) are in his Notes. Obviously, this is not a GEDCOM problem, it is a PAF problem. But what would have happened if he (or she) had actually put some sources into PAF?  The results would not have been the opening chapter in Evidence Explained.

I will have to talk about the past and I may go back and review a lot of history, but probably in another post. Right now, I will try to state the issue as I understand it, in terms that can be used for further discussion. First, this is not just a FamilySearch problem. It is a data transfer problem that grows more complicated each day as genealogy programs are updated and new programs are introduced. It is important to remember that GEDCOM 1.0 was released in 1984, the year the Apple Macintosh computer was introduced. The difference is that GEDCOM stopped in 1996 at version 5.5 Standard, while the Macintosh is currently at dual quad core status. It is well known (by those who care) that the GEDCOM standard allows for far more tags than were implemented in PAF Version 5.2 and that many software programs have implemented non-GEDCOM extensions. So the world as it is means that most genealogy software programs are, even now, only barely compatible with each other, even when using whatever parts of GEDCOM will work.

I am reminded of the issues that accompanied entering data into PAF. Teaching a class on PAF involved a lot of instruction on how to enter data so that it would "conform" to the "Data Entry Standards for PAF Files." The fact that such data entry standards existed at all is an admission that the GEDCOM standard was extremely limited. My perception may be off base, but I see PAF and GEDCOM as inextricably connected. As long as FamilySearch supports PAF, it cannot develop any further extensions to GEDCOM. A new GEDCOM would cut off all the old users of PAF and be an admission that PAF was truly dead instead of just mostly dead.

This is my round one, more later.


  1. Wouldn't it be wonderful if GEDCOM 5.5 remained, but its code and design were released to the public domain/open source community under a name like "GEDCOM96"?

  2. My disclaimer: I am a software engineer. However, I have done very little (okay, no) development on genealogical software since the mid-90s.

    The last time I looked at a GEDCOM specification it did support structures for pictures, documents (and other binary objects). It also supported custom tags for those unique genealogical structures that companies would dream up/design. However, as you are probably alluding to, it was us software developers and designers that failed to properly implement GEDCOM.

    Having said that, it is too bad the standard has faded. Gee, what would happen if HTML 5.0 were the last one? Yeah, let's stop at this one.