Wednesday, November 17, 2010

When is a standard not a standard? When it's GEDCOM?

Back on 22 January 2010, Tamura Jones published an extensive evaluation of file transfers between different genealogical database programs. The post, entitled "Torture Test Results," documents the results of testing the import and export functions using three "extreme GEDCOM files." I could not have run this test myself, since I do not own all of the programs tested, so I much appreciate the data. The most significant finding is summarized in the conclusions to the very long post: "Of all the applications tested, not one passes all the tests. Many fail all the tests." Before you consider changing genealogy programs, you may wish to study the results of Jones' test.

Absent from the test were any of the Apple Macintosh applications. Judging from the test results, however, it is unlikely that the Mac-compatible applications would have done any better.

It is significant that, in Tamura Jones' test, even Personal Ancestral File (PAF) failed to handle long names, because PAF has been wedded to GEDCOM almost from the beginning. Given the limited size of data storage devices in the early years, many users relied on GEDCOM's small text files to transfer large amounts of data. I am still teaching a class at the Mesa Regional Family History Center each week on file transfers using GEDCOM.

However, it is important to put the Torture Test in perspective: few users would ever run up against the extreme files used in the comparison. But one thing the Test does show is that, in part, the limitations experienced with GEDCOM transfers are not inherent in the standard but are choices made by the programmers and developers implementing it. In other words, the GEDCOM standard, in one of the test segments, would allow a very long name, but none of the programs functioned completely using a name of the length tested. Was that failure a result of the GEDCOM standard? The answer goes to a more fundamental issue, that of adhering to a standard and implementing the standard in a way that reflects the real world of genealogical data transfer.
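To make the long-name point concrete, here is a minimal Python sketch of a round-trip check. It is not Jones' test and not any real program's importer: the function names and the 60-character limit are hypothetical, chosen only to show how a limit built into an importer, rather than anything in the file itself, can lose data.

```python
def export_individual(xref, given, surname):
    """Write a minimal GEDCOM-style individual record as a list of lines."""
    return [f"0 @{xref}@ INDI", f"1 NAME {given} /{surname}/"]


def import_name(lines, max_len=None):
    """Read the NAME value back from a record.

    max_len models a hypothetical importer that silently truncates
    long values; None means the importer imposes no limit of its own.
    """
    for line in lines:
        parts = line.split(" ", 2)  # level, tag, rest of line
        if len(parts) == 3 and parts[1] == "NAME":
            value = parts[2]
            return value if max_len is None else value[:max_len]
    return None


def round_trips(given, surname, max_len=None):
    """True if the name survives export followed by import unchanged."""
    original = f"{given} /{surname}/"
    record = export_individual("I1", given, surname)
    return import_name(record, max_len) == original


long_given = "A" * 300
print(round_trips(long_given, "Smith"))              # importer with no limit
print(round_trips(long_given, "Smith", max_len=60))  # importer that truncates
```

The exported file is identical in both calls. Only the importer's own choice differs, which is the distinction the Test draws between the standard and its implementations.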

There is an even more fundamental issue: why would a programmer spend time and resources making sure his program was "compatible" with competing programs? If I want to build a railway car, I must make sure that it will run on the country's rail system. Likewise, if I want to build a website, I had better make sure that the majority of the world's browsers will display my site correctly. I can do both only by adhering to a standard. But in the case of GEDCOM, where is the incentive to make sure my program will share data with twenty or thirty other competing programs? Don't I want to make my own program so popular that my program becomes the "standard" that everyone else has to copy?

This is what happened, in effect, with PAF. The program was so pervasive that anyone wanting to lure away a prospective PAF user had to make sure that all of the potential customer's data would come over to the new program. Hence the incentive to be GEDCOM compatible. But realistically, why would Ancestry.com spend money and time making sure that Legacy Family Tree files would import correctly? Or files from any other program, for that matter? Isn't this the heart of the compatibility issue? There are plenty of products out there that do not adhere to any kind of standard. For example, have you tried to buy a new set of tires for your car or truck lately? In the computer world, some items have developed standards by default. That is, a way of solving a particular problem became a standard because it worked and many manufacturers adopted the solution. Take power cords, for example. In the beginning every device had a different power cord. Then one design became a de facto standard by reason of its use. But newer computers are experimenting with a whole new way of attaching power cables, and the "standard power cable" may become another abandoned standard.

Let's suppose that the Build a Better GEDCOM folks are very successful in developing a whopping good standard. Who will use it? Why will they adopt it? What is more likely is that the new GEDCOM standard will become a way to translate the different data sets in the individual programs in a consistent fashion. In other words, it is just like translating from English to Spanish: you don't try to convert all the English speakers to use terms that can easily be translated into Spanish. You take what is there and come up with a way to carry over all of the information that needs to be transferred. Then you take advantage of having the translation method to create a new class of translation programs that match up the data between the various programs. Programs, like languages, do not then have to be changed, but if the programmers do something really bizarre, they will end up isolating their programs from the mainstream of genealogy.
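The translation idea can be sketched in a few lines of Python. Everything here is made up for illustration: "program A" and "program B" are hypothetical, their record layouts are invented, and the neutral Person record stands in for whatever a new standard might define. The point is only that each program needs one reader and one writer to the common form, not a separate converter for every competitor.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    """Neutral record: the common form every program translates through."""
    given: str
    surname: str
    birth_year: Optional[int] = None


# Hypothetical program A stores {"name": "Surname, Given", "born": "1850"}.
def a_to_common(rec):
    surname, given = [part.strip() for part in rec["name"].split(",", 1)]
    born = rec.get("born")
    return Person(given, surname, int(born) if born else None)


def common_to_a(person):
    born = "" if person.birth_year is None else str(person.birth_year)
    return {"name": f"{person.surname}, {person.given}", "born": born}


# Hypothetical program B stores {"first": ..., "last": ..., "birth": 1850}.
def b_to_common(rec):
    return Person(rec["first"], rec["last"], rec.get("birth"))


def common_to_b(person):
    return {"first": person.given, "last": person.surname,
            "birth": person.birth_year}


READERS = {"A": a_to_common, "B": b_to_common}
WRITERS = {"A": common_to_a, "B": common_to_b}


def translate(rec, source, target):
    """Convert a record by passing it through the neutral Person form."""
    return WRITERS[target](READERS[source](rec))


record_a = {"name": "Smith, John", "born": "1850"}
print(translate(record_a, "A", "B"))
```

Adding a third program to this sketch means writing two more functions and registering them, while neither program A nor program B has to change.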

A closing note: I have no illusions that PAF is a viable program. It is dead, kept alive only by the conservative nature of Microsoft programming, as witnessed by the fact that the Mac version stopped working on Macs years ago. As soon as Microsoft gets with the program and changes its operating system to something that is UNIX compatible, these older programs will vanish like rain on a hot Mesa street in the summer.