Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, August 7, 2013

What is a genealogical data model?

In the ongoing discussion about genealogical standards, there is a constant undercurrent of references to a genealogical data model. The idea that it may or may not be possible to develop such a "data model" raises interesting questions.

I will jump back to 1998, when GenTech published a Genealogical Data Model based more on the workflow of the research process than on the individual data-entry fields of particular programs. Quoting from the Model:

After some work on the Lexicon, the group recognized that it is difficult to define genealogical data out of context because of the various ways people interpret common genealogical terms.  The group decided that the effort would be better served by defining genealogical data in the context of a logical data model, which is a systems engineering methodology used to define data in an automated data processing system.

This Data Model is available on the National Genealogical Society website. As stated in the GenTech report, the Model was a logical data model, not a physical data model. So the question is this: when we talk about developing genealogical data transfer standards, are we talking about creating a logical model that generalizes the research process, or are we actually discussing how to create a correspondence between programs that have already been implemented? The model apparently received only one update, in 2000.
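
To make the logical versus physical distinction concrete, here is a minimal Python sketch. The entity names (Source, Assertion), the table layout, and the sample data are all hypothetical illustrations; none of them is taken from the GenTech model or from any existing program. The dataclasses describe what the data means, while the SQL schema is just one of many possible ways to store it.

```python
import sqlite3
from dataclasses import dataclass


# Logical model: what the data means, independent of any storage format.
# These entity names are hypothetical, not the GenTech definitions.
@dataclass(frozen=True)
class Source:
    source_id: int
    title: str


@dataclass(frozen=True)
class Assertion:
    assertion_id: int
    source_id: int  # each assertion points back to the source that supports it
    statement: str


# Physical model: one possible storage layout for the same concepts.
SCHEMA = """
CREATE TABLE source (source_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE assertion (
    assertion_id INTEGER PRIMARY KEY,
    source_id INTEGER NOT NULL REFERENCES source(source_id),
    statement TEXT NOT NULL
);
"""


def save(conn, source, assertions):
    """Write logical objects into the physical tables."""
    conn.execute(
        "INSERT INTO source VALUES (?, ?)", (source.source_id, source.title)
    )
    conn.executemany(
        "INSERT INTO assertion VALUES (?, ?, ?)",
        [(a.assertion_id, a.source_id, a.statement) for a in assertions],
    )


def load_assertions(conn, source_id):
    """Read rows back and rebuild the logical objects."""
    rows = conn.execute(
        "SELECT assertion_id, source_id, statement FROM assertion "
        "WHERE source_id = ? ORDER BY assertion_id",
        (source_id,),
    )
    return [Assertion(*row) for row in rows]


# Made-up sample data, stored and then restored through an in-memory database.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
census = Source(1, "Hypothetical census page")
claims = [
    Assertion(1, 1, "John is head of household"),
    Assertion(2, 1, "Mary is wife of John"),
]
save(conn, census, claims)
restored = load_assertions(conn, 1)
```

Two programs could share the logical half of this sketch and still disagree completely about the physical half, which is why the two discussions are easy to confuse.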

This particular model was sponsored, at the time, by the following:

  • GENTECH (Charter sponsor)
  • Federation of Genealogical Societies (FGS) (Charter sponsor)
  • New England Historic Genealogical Society (NEHGS)
  • National Genealogical Society (NGS)
  • American Society of Genealogists (ASG)
  • The Association of Professional Genealogists (APG)
  • The Board for Certification of Genealogists (BCG)

GenTech is a division of the National Genealogical Society that facilitates communication among persons interested in genealogy and technology.

I note that all of these organizations are still operating; however, I detect little participation from them in the current discussions about standards.

A different approach to creating a data model comes from Irish software architect Tony Proctor. On his website, Family History Data, he approaches the issue from a more practical standpoint, that is, the need to transfer data between different commercial genealogical database products.

Somewhere between the generalized concepts of the flow chart of genealogical research and the more specific questions of transferring data between various commercial programs, we need to focus on what we mean by a genealogical data model. Currently, we have the Family History Information Standards Organization (FHISO), the GEDCOM X project, the GenContent model, and any others I may not have run across yet. It seems like we are all talking about the same thing, but are we? If the idea here is to develop software, then perhaps we need to know what it is that genealogists do and how they wish to record their research. But if we are talking about exchanging data between existing genealogical database programs, aren't we well past the stage of conceptualizing the research process?

This issue brings up another question. Do the current commercially available genealogy programs adequately model the real world of genealogical data? What if my ancestors came from China or were Native Americans? Can I use a current program to adequately represent the differences between the naming practices used by my ancestors and those imposed on me by the fields available in the programs? My experience with Native American names would argue that the current crop of programs is highly culture-specific and that the available fields are not well adapted to alternative naming practices.
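
As a rough illustration of the difference, here is a minimal Python sketch of a name record that does not assume fixed "given name" and "surname" fields. Everything in it is hypothetical: the class names, the part labels, and the sample names are invented for the example and do not come from any existing program or standard.

```python
from dataclasses import dataclass, field


@dataclass
class NamePart:
    kind: str   # open-ended label, e.g. "given", "family", "clan", "personal"
    value: str


@dataclass
class Name:
    parts: list      # NamePart objects, in the order the name is actually written
    note: str = ""   # free text, e.g. when or why the name was used

    def display(self):
        """Join the parts in their recorded order, without reordering them."""
        return " ".join(p.value for p in self.parts)

    def part(self, kind):
        """Return every value with the given label (possibly none)."""
        return [p.value for p in self.parts if p.kind == kind]


@dataclass
class Person:
    # A person may carry several names over a lifetime.
    names: list = field(default_factory=list)


# Family name written first, as recorded; the program does not flip the order.
family_first = Name([NamePart("family", "Wang"), NamePart("given", "Wei")])

# A person with no family name at all, and a different name later in life.
# These placeholder values are made up for the example.
person = Person(names=[
    Name([NamePart("personal", "ChildhoodName")], note="used in childhood"),
    Name([NamePart("personal", "AdultName")], note="taken as an adult"),
])
```

The point of the design is that a missing surname is simply an absent part, not an empty required field, and that the written order of a name is preserved as data instead of being dictated by the form.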

Does a genealogical data model take into account variations in cultural practices? Isn't there a sort of one-size-fits-all mentality in the current programs? How do you adequately show kinship structures other than the Western European nuclear patriarchal family? Are we content with having an English-based, Western European genealogy? Do our models allow for cultural diversity? It may be a good idea to have your commercial genealogical program available in different languages, but do the different language versions reflect cultural differences, or are they merely direct translations of an English (or whatever) program?

Is there a difference between a robust model and an adequate model? Do we need a robust model?

More on this later. Of course. 


  1. You are spot on. I believe that the free market determines what will become available. Most of us want this or that, but that does not make it economical to develop. Beyond FamilySearch, is there anyone who is going to spend the time and money to create things with a negative return?

  2. I am actually English, James. I merely live in Ireland. :-)

    I have to remark on your reference to STEMMA, though, since it came about through an independent stance rather than any pragmatic approach to exchange between existing software products. It is primarily a research project into the development of a comprehensive multi-cultural data model, but the ultimate goal is to represent my own research data. This was too much for any existing product and so I was forced down this path. Various innovations resulting from the project have been submitted to FHISO's call for papers for consideration.

    1. Thanks for the clarification. I hope people will start looking at the different proposals.