Genealogy's Star: Who benefits from genealogical data standards?

Sunday, August 4, 2013

Who benefits from genealogical data standards?

In an insightful post on Justin's Genealogy Blog entitled "Everyone Benefits from Data Portability," Justin makes some interesting comments concerning standardization of the interchange of genealogical data. As I understand it, his argument is that everyone, including the developers of genealogical software, would benefit from a common information exchange standard. But he also admits the following:

I have never paid for a genealogy product precisely because the world I just described doesn't exist. The pain of keeping multiple trees in sync is greater than the benefit of features which products offer (at least for me). If it were trivial to keep all desktop products and online trees in sync, I would start buying.

So part of his argument, at least, is that some genealogists do not make purchases due to data incompatibility. I believe that this is the first time I have heard that particular view expressed.

I think that perhaps I have not been clear enough in my earlier posts as to exactly what I am talking about. To explain, I will resort to my usual use of hypothetical situations. But before getting into hypotheticals, let's review a little history and go back before the beginning of genealogical software development. In the early days of personal computers, there were dozens of manufacturers of different computer systems using a variety of computer processors. I remember both the Altair 8800b and the IMSAI 8080 computers, although I only owned an IMSAI (but never used it). The first personal computer I ever spent any time working on was the TRS-80 from Tandy Corporation (Radio Shack). At that time, the three big names in computers were Apple, Tandy and Commodore. In 1979, the TRS-80 had the largest available selection of software in the microcomputer market.

Now, in 1979 or even into the 1980s when I started using Apple II computers, there was not even a concept of data exchange. Any existing genealogy programs were rudimentary, text-based and not very useful. I was talking with a friend yesterday who related how her son wrote her a genealogy program back then, which, of course, only had capital letters and couldn't be printed. Just as there were a variety of computer platforms, such as Atari, Texas Instruments (TI) and, in 1981, the IBM PC, there were different operating systems and different programming languages. There was no way to connect two different computers together and no one was really concerned about doing that anyway.

Along came some genealogy programs and it was almost a miracle just to have your genealogical data in a computer file where you could search for duplicate names and find the information you had entered. Of course, I could have shared a file with someone else, had I known anyone who was interested and had the same computer brand and software program that I did. The point of this review is to show that there were no computer standards from the very beginning of the personal computer revolution. Computer programs were written for a specific computer with a specific operating system.

Was data file exchange an issue back then? Yes, it certainly was. Was it in the interests of the developers and manufacturers to make their data files compatible? Can you imagine Apple and IBM getting together to formulate a data standard?

Fast forward to the present. Exactly the same situation exists today. We have dozens of different computer manufacturers and still have different, incompatible operating systems. When was the last time you tried to open a data file or document and found that the file type could not be opened because you did not have that particular program on your computer? In a perfect world, with no economic competition, maybe someone could dictate absolute file compatibility. But even then, with the changes in technology and the development of new processors, data incompatibility is inevitable.

Can programs be written to "translate" the data from one program to another? Yes, sometimes, and with the cooperation of the various manufacturers. I can presently run Windows programs on my Macintosh computer, which runs OS X, by using a separate program. But even that level of integration does not make the data files compatible.

Now, a hypothetical. Suppose I am a developer of genealogical software. If I am going to spend my money and my time writing a program, I might like to make a profit. Do I start out with the idea of making my program as compatible as possible with every other program on the market? Not if I can help it. I make my program as unique as possible so that I can differentiate it from all of the others already being sold. Ultimately, I would hope that my program would become so popular that it became the de facto standard for genealogy programs.

But, you say, you are confusing operating systems, file formats and data. Yes, I am. At each level there is a challenge in exchanging information. Yes, people do write translator programs to move information from one program to another. For example, there are dozens of different file formats for images, such as .jpeg, .tiff, .png, .CR2, etc. Most imaging programs can read some of the more common file formats and you can see the photo or image, but there is no "standard." I use Camera Raw files from a Canon camera and store them as .dng files, that is, Adobe Digital Negative files. Very few of the popular programs can read my files. Is this a problem? Yes. Do I worry about the format and file type? Yes. Is there any movement to make a single image file standard? Likely, but it will probably not be effective.

Given the history of computers and given the history of computer programming, is it likely that all of the existing genealogical program developers will suddenly decide that everyone will benefit from a common standard? Not at all likely.

The next question is, would everyone benefit from a common data exchange standard, assuming it was possible to design one and it became universally adopted? Maybe and maybe not. Do we really want to stop program development and freeze it at some arbitrary level? Oh, but you say, standards can be revised and updated. If that happens, the standard is following the market, not being imposed on it.

Will data become easier to move from one program to another? Yes, certainly. GEDCOM, with all its present limitations, is a good example of a way to move information between programs without impinging on their own file structures. So, a standard in genealogy has to be semi-independent of the programs. It needs to be useful and relatively easy to use and apply, but it also has to be independent.
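
As a rough illustration of why GEDCOM can sit outside any one program's file structure: a GEDCOM file is plain text, one item per line, each with a level number, an optional cross-reference ID, a tag and an optional value. The Python sketch below is only my own illustration, not code from any genealogy product. The names GedcomLine, parse_line and names_in, and the sample record, are hypothetical, and the sketch ignores much of the specification (character encodings, CONC/CONT continuation lines and so on).

```python
import re
from typing import List, NamedTuple, Optional


class GedcomLine(NamedTuple):
    """One parsed GEDCOM line (hypothetical helper type for this sketch)."""
    level: int
    xref: Optional[str]
    tag: str
    value: str


# level, optional @XREF@, tag, optional value after a single space
LINE_RE = re.compile(r"^\s*(\d+)\s+(?:(@[^@]+@)\s+)?(\w+)(?: (.*))?$")


def parse_line(line: str) -> GedcomLine:
    """Split one GEDCOM line into its parts, or raise ValueError."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a GEDCOM line: {line!r}")
    level, xref, tag, value = match.groups()
    return GedcomLine(int(level), xref, tag, value or "")


def names_in(text: str) -> List[str]:
    """Collect the values of all NAME lines in a GEDCOM text."""
    parsed = [parse_line(raw) for raw in text.splitlines() if raw.strip()]
    return [item.value for item in parsed if item.tag == "NAME"]


# Invented sample data, not taken from any real tree.
SAMPLE = """0 HEAD
0 @I1@ INDI
1 NAME John /Smith/
1 BIRT
2 DATE 4 AUG 1813
0 TRLR
"""

if __name__ == "__main__":
    for name in names_in(SAMPLE):
        print(name)
```

Any program that can read and write lines like these can exchange the basic facts, whatever its own internal file format looks like.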

More on this, I am sure.

9 comments:

Adrian Bruce, August 4, 2013 at 9:20 AM
"Do we really want to stop program development and freeze it at some arbitrary level?"

But that wouldn't happen. "Software" contains (at least) two components - data and algorithms. Making the data exchangeable allows plenty of scope for the algorithms to be wholly different. The latter part gives opportunity for a Unique Selling Point. Further, most of us would be quite satisfied with interchanging a sub-set of the data, i.e. the genealogically relevant part. Other stuff, like research plans, could stay in one place. (OK - I'm skating over a lot of stuff there about how the research stuff points to the research subjects, i.e. the genealogy, but it's do-able.)

To take one example - I uploaded my tree as a GEDCOM into Ancestry once. The Ancestry hints have been surprisingly helpful (being flippant about it, the score is about Ancestry 20, I-can-do-it-all-myself 0). If I could transfer a GEDCOM without loss from my desktop software into FTM and vice versa, then I'd buy a copy of FTM solely to sync my tree with Ancestry. As it is, I have no intention of buying FTM and only update major people on my Ancestry tree.

The flaw of course in all this is what you alluded to earlier - the demand from most people is nothing like this sophisticated.

Celia Lewis, August 5, 2013 at 9:31 AM
So clear, James!

Tony Proctor, August 6, 2013 at 2:49 AM
The biggest question here, James, IMHO, is that of "scope". For people with a strong software background, like myself, it goes without saying that a representation can be conceived that would make our data transportable between different products, on different machines, and in different locales. This has happened in many industry sectors previously.

So, other than misguided fears that a vendor may be losing market share by opening up their data, what arguments are there against a comprehensive new standard?

Well, the extra effort involved could be one argument. GEDCOM is relatively simple, although its model is also rather simplistic. Where would we draw the line, though, with enhanced representations? There are many useful features that our software products could implement, but without a more comprehensive standard the associated data could not be exchanged with other products. Should we expect the scope of our products to remain fixed in a GEDCOM-inspired world for another decade?