Some people eat, sleep and chew gum, I do genealogy and write...

Friday, June 10, 2011

No man is an island, but genealogy programs are

One of the uncomfortable truths about current lineage-linked commercial genealogy programs is the lack of a common method for transferring data between programs. Even if you purchase two programs from the same developer, such as the Mac and PC versions of Family Tree Maker, they still don't talk to each other completely. There are a few bridges between programs that do a credible job of accepting the data, but I have yet to find a program that did not make a hash of some part of the data. Here is an example. Suppose I have been doing genealogy for some time on my trusty old PC, using the venerable Personal Ancestral File (PAF). You would think that, with its millions of users, every program by now would read and completely import a file from PAF, wouldn't you?

Guess what? Even transferring data between different versions of PAF by GEDCOM creates a list file of data left over from the transfer. The people at the Build a BetterGEDCOM Project are on a quest to create a more complete and modernized version of GEDCOM, but they haven't reached a consensus yet. If you would like to see what is wrong with the current version of GEDCOM, look at the compilation "What's Wrong With GEDCOM?"
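To see why a transfer leaves a list of leftovers behind, it helps to look at what a GEDCOM file actually is: plain text lines made up of a level number, an optional cross-reference ID, a tag, and a value. GEDCOM 5.5 lets programs invent their own tags as long as they begin with an underscore, and a receiving program that doesn't recognize a tag has nowhere to put it. Here is a minimal sketch of that effect. It is not how PAF or any particular program works; the KNOWN_TAGS set, the simplified sample file, and the custom tags _PHOTO and _MILT are all hypothetical, chosen only for illustration.

```python
# Minimal sketch: sort GEDCOM lines into ones a hypothetical importer
# understands and ones it sets aside, much like a leftover "list file".
# KNOWN_TAGS is an illustrative assumption, not any real program's list.
KNOWN_TAGS = {"HEAD", "TRLR", "INDI", "FAM", "NAME", "SEX", "BIRT", "DEAT",
              "DATE", "PLAC", "HUSB", "WIFE", "CHIL", "FAMS", "FAMC",
              "NOTE", "SOUR", "CONT", "CONC"}


def parse_line(line):
    """Split one GEDCOM line into (level, xref, tag, value)."""
    parts = line.strip().split(" ", 2)
    level = int(parts[0])
    xref = None
    if len(parts) > 1 and parts[1].startswith("@"):
        # Record line such as "0 @I1@ INDI": the tag follows the xref.
        xref = parts[1]
        rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
        tag = rest[0]
        value = rest[1] if len(rest) > 1 else ""
    else:
        tag = parts[1] if len(parts) > 1 else ""
        value = parts[2] if len(parts) > 2 else ""
    return level, xref, tag, value


def import_gedcom(text):
    """Return (kept, leftover): parsed known lines and raw unknown lines."""
    kept, leftover = [], []
    skip_level = None  # level of the last unrecognized tag
    for line in text.splitlines():
        if not line.strip():
            continue
        level, xref, tag, value = parse_line(line)
        # Anything nested under an unrecognized tag is lost along with it.
        if skip_level is not None and level > skip_level:
            leftover.append(line)
            continue
        skip_level = None
        if tag in KNOWN_TAGS:
            kept.append((level, xref, tag, value))
        else:
            leftover.append(line)
            skip_level = level
    return kept, leftover


# A deliberately simplified file with two hypothetical custom tags.
sample = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 1 JAN 1850
2 _PHOTO born.jpg
1 _MILT Civil War
2 DATE 1862
0 TRLR"""

kept, leftover = import_gedcom(sample)
for line in leftover:
    print("Not imported:", line)
```

Notice that the DATE under the unrecognized _MILT tag is lost too, even though DATE itself is a perfectly standard tag. That is a small version of what happens when one program's special features meet another program's importer.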

The problem is fairly simple and straightforward; the solution is complex and obscure. The problem is that each software developer has no interest whatsoever in making their program "compatible" with any other program. Doing so would mean, first, admitting that their program was, in fact, no better than the other program and, second, opening up an easy way for a competitor to steal customers. So every software developer in the world has to think of some unique way to handle the data and then tweak it in a way that makes it impossible or difficult to copy. If I am developing a program and I figure out a way to animate all the data and make it dance in a circle, do you think I want everyone in the genealogy software world to immediately be able to take advantage of my clear superiority and import all of my fancy work into their program? Not on your life!

Back to my example of PAF. Here you have a rather simple program by today's standards, one that has been openly abandoned by its owner, FamilySearch, which has stated that the program will not be updated. Yet you still have arguably millions of users of the program all over the world. You would think that everyone developing software for genealogy would come up with a way to import a file from PAF without losing one iota of data. In actuality, only a mere handful of programs will recognize and import a PAF file. I am not talking about a GEDCOM file; I am talking about a .paf file. But we still have a problem here: most PAF users kept their source information, if they kept it at all, in their notes. Sure, we can open a PAF file, but all of the sources are still in the notes. Wasn't it terrible that all those people who used PAF didn't know any better?

So suppose some very inventive programmer figures out a way to translate PAF note sources into usable sources with separate fields. Those translated sources would then be locked up in the new program and unavailable to any other program. Before I get comments about how well an individual program, Ancestral Quest, for example, imports PAF files: that only moves the issue down the table to the next program. Now my data is locked up in Ancestral Quest.

I think about this problem frequently because I teach a variety of computer programs on a rotating basis. I am constantly asked which of the programs is "best." I usually say something like, "All of them are really good and choosing a program is a matter of personal preference." What that really translates into is that I am still looking for the perfect program. But even if I found the perfect program, I would then be locked into its data structure because all of the other imperfect programs wouldn't read its files or transfer them without data loss.

I am coming close to a resolution of this problem, and the answer may not be what you would think. First, I keep my data in a variety of programs. Some have functions I like better than others, and I kind of spread the data around to make sure I can go any direction I want at any time. Second, I am leaning more and more towards putting my data online in a Wiki. Not just any Wiki; I have to look around for a while and decide the best route to take. Why a Wiki? Two words. Open source.


  1. Wiki? Interesting. Is this where entries are of well-defined object types? By "well-defined" I mean well-defined to the system. Typical wikis do have standard conventions (e.g. standard headings and data for an entry on a person), but the question is, "Does the wiki software know that the entry is about a person?" Wouldn't it be great if Wikipedia, for example, had well-defined object types? It could then support business logic to answer relationship queries: relationships not only of people, but of places, events, sources, etc., making a truly superior knowledge base.