Some people eat, sleep and chew gum, I do genealogy and write...

Friday, August 23, 2013

Is a unified family tree even possible?

Commentator Tony Proctor suggests that collaboration on "a single, global family tree" is a myth and may not work at all. A single, global family tree is by its nature collaborative. However, such a structure presupposes a common cultural and social interpretation of kinship structures. Tony's comment raises some significant issues concerning the viability of collaborative genealogy absent consistently defined metadata. Without a "Meta-genealogy" establishing a common terminology and an accurate description of the data sets, discussions about genealogical collaboration are like ships passing in the night. There is no sure way to determine if the various participants in the genealogical community are even speaking the same language. Simply labeling some online family tree program as "universal" or "one world" does not mean that it is automatically adaptable for international purposes.

A useful definition of metadata appears in Understanding Metadata, a publication of the National Information Standards Organization (NISO), which is accredited by the American National Standards Institute (ANSI):
Metadata is structured information that describes, explains, locates, or otherwise makes it easier to retrieve, use, or manage an information resource. Metadata is often called data about data or information about information.
Metadata can be either structural or descriptive. I believe that here we are talking about structural metadata, that is, the design and specifications of data structures, not the particular content of the individual instances of application data. In this sense, the description of the Meta-genealogy becomes the basis for standardization, not the other way around. If we wait for some agreement on standardization, we will never reach the objective of providing accommodating structural metadata.

I think a simple example helps to illustrate the difficulty of talking about standards without the underlying structural metadata. Nearly all of the currently available genealogical database programs use a naming structure that has spaces for a surname and a given or first name. Some of the programs resolve the issue of a middle name by labeling the first data entry blank as "First Names." In addition, many of the programs provide places for a prefix or title and a suffix or other name. So what do you do with a name such as:

Raul Ortega Rodriguez

Do you have a separate version of the program with an accommodation for Spanish surnames, or do you simply ignore the problem and provide a "one size fits all" solution based on English surname patterns? This example may be trivial, but the idea is to define a metaname field that can accommodate every instance of naming practices in every kinship structure. To see why this is necessary, simply ask this question about the name above: What is the surname? Which of the two names is the one a genealogist would need to search on? Is the answer both names?
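As a rough sketch of what such a metaname field might look like, here is a small Python example. Everything in it is an assumption made for illustration: the class names NamePart and PersonName, the role labels, and the reading of "Ortega" as the paternal surname and "Rodriguez" as the maternal surname. It is not taken from STEMMA or from any existing genealogy program. The point is only that a name stored as an ordered list of labeled parts does not force a single "surname" box on anyone.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class NamePart:
    """One piece of a name plus a free-form role label (hypothetical vocabulary)."""
    text: str
    role: str  # e.g. "given", "paternal-surname", "maternal-surname", "patronymic"


@dataclass
class PersonName:
    """A name as an ordered list of labeled parts rather than fixed first/last slots."""
    parts: List[NamePart] = field(default_factory=list)

    def full(self) -> str:
        # Join the parts in the order they were recorded.
        return " ".join(p.text for p in self.parts)

    def search_keys(self) -> List[str]:
        # Every part whose role ends in "surname" is a candidate search term.
        return [p.text for p in self.parts if p.role.endswith("surname")]


# Assumed interpretation of the example name, for illustration only.
raul = PersonName([
    NamePart("Raul", "given"),
    NamePart("Ortega", "paternal-surname"),
    NamePart("Rodriguez", "maternal-surname"),
])

print(raul.full())
print(raul.search_keys())
```

With this arrangement, the question "What is the surname?" no longer has to have a single answer; a search can use either surname or both.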

In this sense, a workable metadata approach to genealogy would be culturally neutral. A very interesting attempt at creating such a system is the STEMMA data model, which the commentator, Tony Proctor, is developing on his website.

However, I am not just talking about a system that allows developers to implement cross-cultural data transfers. I believe we need to develop a more fundamental way of looking at the larger worldwide context of creating a universal family tree structure: not by creating a centralized collection, but by allowing the relationships between the data components to be restructured through continuous adaptation to additional data. The closest structure that currently reflects this ability is wiki-based. But even wiki-based systems can bog down in culturally specific network connections.

The beginning of a discussion in this regard should consider some of what has already been accomplished with large online databases. For a review of the current status see the following:

Hillmann, Diane I., and Elaine L. Westbrooks. Metadata in Practice. Chicago: American Library Association, 2004.

You can see from the date of the book that these ideas are not necessarily new or innovative.

Another example of the need for metadata in the area of genealogy is the proliferation of online family trees allowing users to upload images. Unfortunately, other than linking the image to an individual or individuals in a family tree, there is no accompanying data about the provenance of the photo or image.
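To make the gap concrete, here is a minimal Python sketch of the kind of record that could travel with an uploaded image. The class name ImageProvenance and all of its field names are hypothetical, chosen only for illustration; no existing family tree site is known to use this structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ImageProvenance:
    """Hypothetical provenance record attached to an uploaded image."""
    image_id: str
    linked_person_ids: List[str] = field(default_factory=list)
    original_holder: Optional[str] = None  # who holds the physical original
    date_taken: Optional[str] = None       # free text, since dates are often approximate
    digitized_by: Optional[str] = None     # who scanned or photographed it
    source_note: Optional[str] = None      # how the uploader obtained it

    def missing_fields(self) -> List[str]:
        # List the provenance fields that were left empty.
        names = ["original_holder", "date_taken", "digitized_by", "source_note"]
        return [n for n in names if getattr(self, n) is None]


# An image linked to a person but carrying no provenance at all,
# which is the situation described above.
photo = ImageProvenance(image_id="img-001", linked_person_ids=["person-42"])
print(photo.missing_fields())
```

A record like this would let a program report what is unknown about an image instead of silently presenting it as if its origin were settled.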

I suggest that this type of development needs to begin on the international level in the genealogical community, and if it is already in progress, I suggest it needs a higher profile.


  1. I'm still not 100% certain of what you mean by meta-data in this context, James. I did a search through your previous posts but couldn't find an example. I thought it might be "extracted items of evidence", such as Age, Occupation, etc. (something that the STEMMA model calls 'Properties'), but I could be wrong.

    The handling of evidence is non-linear, by which I mean that such Properties cannot, by themselves, give a full picture and be used to form conclusions. They're helpful, yes, but the context of when, where, and who else was present in that event, and even their lives, all need to be considered together. This is one of the few arguments I have against the popular concept of a 'persona' in the handling of evidence. Anyway, I'm sure this will be discussed again soon.

    Re: Online collaboration, though, even with a fully-agreed set of meta-data concepts, I would argue that the "single tree" approach is just too naive to work. Disagreements cannot always be resolved beyond doubt and so different researchers will have different opinions. You can see how this causes arguments in something like Wikipedia so it's hardly surprising that it cannot work smoothly for genealogy. I'm not saying that online collaboration isn't possible, though. Only that the simplistic single-tree models currently in use cannot work.

    1. You are right about discussing this again soon. As I mentioned, there are two different approaches to the problem. The first addresses the need to record the background information about an entry, such as where, how, and why it was created. The second concerns the content of the data field and how it can be integrated into a larger data context, i.e. exchanging data between database systems. An example of the first concern (structural metadata) is setting down a standard way of establishing the context of the information to be gathered in the "name" category. I could go on and probably will.