Some people eat, sleep and chew gum, I do genealogy and write...

Friday, August 2, 2013

Is Moving Towards a Solution for Establishing Genealogical Data Communication Standards Possible?

In a recent post, I outlined just a few of the challenges confronting genealogists as a result of the insular nature of the various commercial genealogical database programs, both desktop and online. One early effort to address this was GEDCOM, first released as Version 1.0 back in 1984. There has been a lot of technological water under the bridge since 1984, and the series of GEDCOM releases ended with Version 5 in 1996. A few changes were proposed up until 2001, but a Version 6 was never released.

If I were to revert to my legal jargon, I would say that the cause of action is not ripe for adjudication. It appears to me that the programmers and technicians addressing the issue of file transfer parameters are jumping ahead of the underlying, unresolved issues that have yet to be discussed and decided by the genealogical community. In fact, such agreement may not even be possible given the disparity of opinions and the dynamics of the community. The conditions that allowed for a de facto standard back in 1984 simply do not exist today. At the time, FamilySearch was in a dominant position to propose a standard to a new and, at the time, barely online genealogical community. In 1981, there was no Internet. It was only in 1981 that the Internet protocol suite (TCP/IP) was standardized and, consequently, that the concept of a world-wide network of interconnected TCP/IP networks, called the Internet, was introduced. There were no major international commercial databases. Digitizing all of the world's records was not even a possibility. Additionally, the concept of sharing data between different types of computer programs was in its infancy. Commercializing the Internet did not become possible until 1995, when the NSFNET was decommissioned. To give a dramatic illustration of the changed conditions: in 1996 Infobase's parent company, Western Standard Publishing, purchased Ancestry, Inc. In other words, the company we now know as Ancestry.com, with its huge database of genealogical sources, did not even exist at the time the GEDCOM standard had its last update.

Now, all of this would seem to be very obvious. Today, we have a huge number of players in the commercial genealogical community, some of whom are multi-national and have a huge stake in the outcome of any imposed standard. Although a few of the now popular genealogical database programs have their origins back during the time of the introduction of GEDCOM, Personal Ancestral File was (and still is) one of the major programs, perhaps because it was released in 1983. Going back to 1984, when GEDCOM was introduced, we find very few of today's players. For example, the present company of was established in 1986, so at the time the program was developed, GEDCOM was already a fact of life. For an idea of when these early programs were developed see Early Genealogy Programs by Dick Eastman. If you read his article, you will see that there were very few genealogy database programs in 1984 and many of those that did exist at the time have long since disappeared. You might also remember that Apple's Macintosh computer was introduced in 1984 with its then revolutionary desktop metaphor (windows) interface.

Is it any wonder that GEDCOM does not fit well in today's online, high speed, complex genealogical computer market?

It is interesting to speculate about the influence GEDCOM had on the development of today's genealogical database programs. How different would they appear if GEDCOM had not been introduced so early? As it is and was, very few, if any, of the existing programs implemented all of the features allowed by the GEDCOM tags, and many programs (all?) chose to implement only a small subset of the GEDCOM standard. So, almost from the very beginning of genealogy software, there has never been a one-to-one correspondence between the record fields of the different genealogy programs. In fact, the developers of new programs tried to differentiate themselves from existing programs by adding features that were then unavailable. This type of marketing effort does not engender compatibility. There is absolutely no incentive in the software industry to make your programs work well with others.
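To make the lack of one-to-one correspondence concrete, here is a minimal sketch of what happens when a record passes between two programs that each support a different subset of GEDCOM tags. The tag names are real GEDCOM tags, but the two "programs" and their supported subsets are hypothetical, and the record is flattened to simple tag and value pairs rather than the nested structure a real GEDCOM file uses.

```python
# Hypothetical tag subsets for two imaginary programs (illustrative only;
# no real product is being described here).
PROGRAM_A_TAGS = {"NAME", "SEX", "BIRT", "DEAT", "OCCU", "EDUC"}
PROGRAM_B_TAGS = {"NAME", "SEX", "BIRT", "DEAT", "RELI", "NOTE"}


def transfer(record, exporter_tags, importer_tags):
    """Split a record into the fields that survive a transfer and those lost.

    A field survives only if the exporting program writes its tag AND the
    importing program reads it.
    """
    kept, dropped = [], []
    for tag, value in record:
        if tag in exporter_tags and tag in importer_tags:
            kept.append((tag, value))
        else:
            dropped.append((tag, value))
    return kept, dropped


# A simplified person record: (tag, value) pairs, assumed flat for brevity.
person = [
    ("NAME", "John /Doe/"),
    ("SEX", "M"),
    ("BIRT", "1 JAN 1850"),
    ("OCCU", "Blacksmith"),
    ("EDUC", "Apprenticeship"),
]

kept, dropped = transfer(person, PROGRAM_A_TAGS, PROGRAM_B_TAGS)
print("kept:   ", [tag for tag, _ in kept])
print("dropped:", [tag for tag, _ in dropped])
```

In this sketch the occupation and education fields never arrive in the second program, even though both programs could honestly claim to "support GEDCOM".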

To draw an analogy with the automobile industry: how many of the parts on your car are shared with all of the other cars in the industry? See what I mean? There are dozens of types of windshield wipers! The software industry is equally fragmented. It is only in certain very limited areas, such as connectors and storage devices, that there is even a small amount of uniformity.

Doesn't establishing a present-day, universal standard for data communications face an uphill battle against the natural tendencies of the marketplace? What incentive do all of the players in the community have to join together and make such a standard? Why is it in Ancestry.com's interest to easily share all of the data fields in Family Tree Maker with MyHeritage.com's Family Tree Builder? Doesn't that whole concept ignore the marketing factor of unique features? Does this concept explain what is needed to move towards a data communication standard? Perhaps not, but it certainly explains why such a standard may be next to impossible to establish.

If I wanted to design a genealogy database program today, why would I seek to provide a standard way to exchange the data from my program with all of the other programs on the market? Who would buy my program for that reason alone? Right now, there is a scramble among some of the software programs to take advantage of compatibility with Personal Ancestral File. You will see claims that this or that program works with PAF files. But even this marketing approach is far from universal. Most of the existing programs make no mention of PAF compatibility, even if they can import GEDCOM files.

In order to move towards some kind of universal data standard, we would have to return to the pre-database days of 1981 and before. Even if a core of programs were to adopt a standard, there will always be developers who want to differentiate their products and try to create a new standard. Think Apple and IBM, then Apple and Microsoft, then Apple, Microsoft and Google.


  1. "Why is it in Ancestry.com's interest to easily share all of the data fields in Family Tree Maker with MyHeritage.com's Family Tree Builder? Doesn't that whole concept ignore the marketing factor of unique features?"

    Surely the obvious answer to that is - you might be making wonderful features available in your new software but if I can't move my data over, I'm not going to buy your software.

    It seems to me that the answer is slightly more subtle - there's just enough compatibility in GEDCOM to allow people to move the bulk of their data, so they think they've got a good trade-off between compatibility and lock-in.

    So that then leaves us with the question - why have people spent money making their software FS Family Tree compatible?

    1. My experience is that file portability is only a very secondary concern with both the users and the developers, until it becomes a personal issue.

  2. I beg to disagree. Think of the credit card. There are many different banks and retailers offering cards all based on a single standard that allows the same functionality. What differentiates the products are the services provided. Standards can be defined for the functions to be performed (build a tree, enter a person, upload a picture) and for defining and exchanging the data required to perform these functions. Once documented, developers can announce they comply with the standard.

    The drive to standardization needs to come from the users, not the developers. If we insist on products and services that conform to the standards, developers will follow.

    A simple thing like not letting dates be entered in multiple formats would be a start!

    1. I agree that the drive for standardization needs to come from the users, but did you buy your last genealogical database program based on the ability to transfer data to some other unknown program?

    2. I don't agree that credit cards are a good example of standardization. Most of the restrictions on credit data come from government regulations, not a desire by credit card companies to share data with one another.

  3. "Why is it in Ancestry.com's interest to easily share all of the data fields in Family Tree Maker with MyHeritage.com's Family Tree Builder? Doesn't that whole concept ignore the marketing factor of unique features?"

    You are confusing features with user data. Ancestry and MyHeritage don't make money by preventing users from exporting their trees. They make money by selling features which augment user data to create a richer research experience, such as record matching and collaboration. They have no reason to make it difficult to move tree data around. In fact, it's in everyone's best interest for it to be trivial to move tree data around because it further enriches the research experience.

    1. The commercial database programs, whether online or desktop, have no reason to expend their resources making it easier to move data from one program to another. To the contrary, having a program that makes it difficult to move data convinces most users not to leave the program. Why should any developer spend resources enabling users to transport their data to a competing program?

    2. The commercial database programs may not make money by preventing users from exporting their family tree data, but they lose customers if users can migrate their data easily from program to program.

    3. My formal response:

  4. I am out on a limb in this argument. You see, I attach much greater significance in my own research to rich-text narrative, places, and events. So much so that existing formats would not be able to help me. STEMMA was originally a private research project to provide me with an appropriate model and supporting software. However, it has become evident through that research that a single model can be conceived to support all our different forms of micro-history, including genealogy, family history, one-name studies, one-place studies, and personal historians (as in APH).

    It would be nice to think that a new standard could look at this bigger picture but I suspect there would be too much resistance, and too many folks wanting to simply fix GEDCOM. I personally believe these routes are not mutually exclusive. There are many arguments in favour of fixing GEDCOM, and making its specification clearer. However, there will always be an argument for addressing the bigger scope of researchers who don't simply want a "family tree" or a "pedigree chart".