
Friday, June 13, 2014

Whatever happened to the genealogical data standards issue?

I could answer this question in one word: APIs. The emphasis on transferring information between genealogy programs has been all but superseded by the large online genealogy database programs' efforts to integrate their data with FamilySearch. However, the issue of exchanging information between different individual genealogical database programs remains unaddressed and unsolved. A few individuals keep addressing the issue, such as Tony Proctor with his Parallax View blog, but the rest of the online genealogical community is strangely silent on the subject.

The main issue, adequately transferring complete genealogical data from one program to another, still remains. But it appears that, other than the ongoing GEDCOM X effort, little or no progress has been made since the meeting held in 2013 at RootsTech. Meanwhile, almost all of the currently available desktop-oriented genealogy database programs are still dependent on the old GEDCOM standard.

The effect of lacking a standard way to transfer data is that, as genealogy programs become more complex and the demand for recognizing more data fields continues to grow, more and more information is lost as data is transferred from program to program. This is especially true as more programs ignore the old GEDCOM standard or run into its limitations.
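To make the loss concrete, here is a minimal sketch of what happens when a record passes through an exchange format that recognizes fewer fields than the program that created it. The field names, the sample record, and the `export_record` helper are all made up for illustration; this is not real GEDCOM syntax and not any vendor's actual export code.

```python
# Hypothetical set of fields the exchange format can carry.
# These names are illustrative only, not real GEDCOM tags.
SUPPORTED_FIELDS = {"name", "birth_date", "death_date"}


def export_record(record, supported=SUPPORTED_FIELDS):
    """Split a record into the fields that survive export and the names of those that do not."""
    exported = {key: value for key, value in record.items() if key in supported}
    dropped = sorted(set(record) - set(supported))
    return exported, dropped


# A made-up record from a program that tracks more than the format supports.
person = {
    "name": "Jane Doe",
    "birth_date": "1 JAN 1850",
    "death_date": "2 FEB 1920",
    "dna_match_notes": "shared segment on chromosome 7",
    "source_confidence": "high",
}

exported, dropped = export_record(person)
print("Carried over:", sorted(exported))
print("Lost in transfer:", dropped)
```

Every field the receiving format does not know about simply disappears, and the more fields a program adds, the longer the "lost" list gets.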

Last year, in 2013, there was detectable movement in the Family History Information Standards Organisation (FHISO), but since 30 July 2013, almost one year ago now, nothing has been posted to its website. Likewise, its predecessor organization, BetterGEDCOM, has shown no activity.

Now there are many ways to approach this issue:

  1. Drop the issue and forget about standards.
  2. Write sarcastic blog posts.
  3. Start pushing for some kind of data transfer standards.
  4. Wake up and notice what is actually going on.

Well, it turns out the fourth option is the one to consider. What is happening is that the four very large genealogy database companies (VLGDCs) are actively working on the problem of sharing data using APIs (Application Programming Interfaces). FamilySearch has data sharing arrangements with each of the three other VLGDCs. I do not pretend to have the slightest idea how they are sharing data, but databases (collections) from FamilySearch.org are showing up on the sites of the three other companies: Ancestry.com, MyHeritage.com and, possibly, findmypast.com. In addition, at least with Ancestry.com presently, there is a limited amount of data sharing between family trees on the two programs. It seems inevitable that this sharing arrangement will at some point include images and other media files. Three of these large companies, Ancestry.com, FamilySearch.org and MyHeritage.com, can exchange data (i.e. synchronize files), to a greater or lesser extent, with a desktop genealogy database program. In the case of Ancestry.com and MyHeritage.com, it is their own proprietary programs. In the case of FamilySearch.org's Family Tree, there are several third-party programs that can move data (share data) back and forth with the online Family Tree.

There is presently no way to test how effective it would be to try to move a complete online tree from Ancestry.com to Family Tree and then to a desktop program. Likely this would be accomplished by moving separate data fields either one-by-one or in some kind of batch process.

The danger here is that FamilySearch.org's Family Tree may become over-saturated with the messy files on the other programs.

So while the genealogical community essentially ignores the issue, the VLGDCs are in the process of programming what will likely become a de facto standard. This may be accomplished by driving any unconnected software company out of business.

I am not asking anyone to do anything. I am just commenting on the evidence.


4 comments:

  1. For what it's worth, unless someone from one of the big companies spills the beans and tells us the technical truth, I very much doubt that FamilySearch use APIs to access Ancestry data (to take 2 names out of thin air). I suspect they simply copied the indexes of the chosen datasets across, manually writing code to drop items into their corresponding columns in the target databases. Once off. So there's no "live" communication - or, that's my guess.

    And if that's what happens, then the VLGDCs are not establishing a de facto standard at all - just a whole series of individual recipes that load (say) an Ancestry US census index into a FamilySearch US census index and would be utterly useless to load an Australian census index, apart from serving as inspiration for another individual recipe.

    Replies
    1. That's not the impression I got at the Innovator Summit. All they did was talk about APIs. From that, I guessed that they were writing code, not just exchanging indexes and data dumps.

  2. To me, and to many others, it's not just a matter of Data Exchange, James. In one of my own responses (http://parallax-viewpoint.blogspot.com/2014/06/bootstrapping-data-standard_11.html?google_comment_id=z12kfnti0zjcyl2rv04cc51bvrbkt1wi51c, although this wasn't taking me precisely to the right comment when I tried it) I summarised my other goals in an attempt to dismiss the "API issue" as any type of relevant solution.

    Replies
    1. But you have to admit that there hasn't been a lot of discussion going on. At least not in the public forum. I agree with your position but the whole issue is likely to be sidetracked by the emphasis on the data exchange.
