Some people eat, sleep and chew gum, I do genealogy and write...

Monday, September 2, 2013

Can we even have standards for names?

I did a post recently showing the variations in the way places have been recorded in online family trees. On reflection, the basic question is whether a "standard," in the sense of a uniform way of representing genealogical data, is realistically obtainable. For example, suppose I have recorded the name of an ancestor the following way:

William George Ellis

And one of my relatives records the name in this fashion:

William George (Triplet) Ellis

Now, moving on to a completely different level of concern, what set of data exchange standards is going to see that these two names refer to the same person? These are actual examples from an online family tree program. In this case the source of the information is not important. Most experienced genealogists would immediately see that the word in parentheses is inappropriately inserted before the surname. Now, is a program going to account for every possible weird error in recording names? Will a merge program see these two entries as the same? The issue here is dealing with real-world data: entries made by people with widely differing levels of adherence to any sort of genealogical name standard.

I realize that I am writing about at least two different standards. One is the generally accepted way of entering names into family group records, whether on paper or in a computerized form. The other is a standard description of the category "name" used as the basis for exchanging information between programs.

The above entry containing the word "Triplet" also brings up an entirely separate issue. Do we discard the suggested information that this individual was a triplet? If it is not appropriate to insert or append the designation in the name field, what are we going to do with the information? Moving up to another level of concern, how will a computer program handle this information and where will it be recorded?
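To make the question concrete, here is a minimal Python sketch of one way a program might treat the two entries above. It is only an illustration, not how any particular family tree program works. The names `ParsedName`, `parse_name`, and `key` are hypothetical, and the sketch assumes that anything in parentheses is a note rather than part of the name, and that the last remaining word is the surname. Neither assumption is safe for real data.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Assumption: text wrapped in parentheses inside a name field is a note.
PAREN = re.compile(r"\(([^)]*)\)")


@dataclass
class ParsedName:
    given: list[str]
    surname: str
    notes: list[str] = field(default_factory=list)

    def key(self) -> tuple[str, ...]:
        """Case-insensitive comparison key that ignores the notes."""
        return tuple(part.casefold() for part in (*self.given, self.surname))


def parse_name(raw: str) -> ParsedName:
    """Split a raw name entry into given names, surname, and notes."""
    # Keep the parenthetical text instead of discarding it.
    notes = [n.strip() for n in PAREN.findall(raw) if n.strip()]
    # Remove the parentheticals, then split what is left on whitespace.
    parts = PAREN.sub(" ", raw).split()
    if not parts:
        raise ValueError("no name parts left after removing notes")
    # Assumption: given names first, surname last (Bennett's order).
    return ParsedName(given=parts[:-1], surname=parts[-1], notes=notes)


mine = parse_name("William George Ellis")
theirs = parse_name("William George (Triplet) Ellis")

print(mine.key() == theirs.key())  # True
print(theirs.notes)                # ['Triplet']
```

With this approach the two entries compare as the same person and the word "Triplet" is kept as a note rather than thrown away. The weakness is just as plain: the program cannot tell whether a word in parentheses is a note, a nickname, or a maiden name, so a human still has to decide what the note means.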

There is no doubt that this is a real issue. I refer to the following book:

Bennett, Archibald F. A Guide for Genealogical Research. [Salt Lake City]: Genealogical Society of the Church of Jesus Christ of Latter-Day Saints, 1951.

On page 18 of Bennett's book, it states:
All names entered on your family group record should be written in full, and in the same order as when the name is spoken--given names first, surname last. The use of nicknames and abbreviations should be avoided. If there are several given names for a member of the group, there is space on the sheet for the names to occupy two lines in the space provided for the names of one person. The maiden name of a wife should be used. 
Obviously, Bennett did not contemplate the kind of variation in the naming pattern illustrated above. There is no accounting for an "illegal" variation. But do we want to discard the information? How would the computer program know that the word inserted was not an additional name or variation on a name of the individual? Likely, this issue would be recorded as a variant of the name and the user allowed to select the most appropriate form of the name. This is the solution used by nearly all the online family tree programs. But the example is neither an alternate name nor a variant.

By defaulting a non-conforming name to a list of alternates, the problem is merely avoided and transferred to the human user of the program. If one of the goals of genealogical standardization is computerized data transfer, then the exceptions create their own category that has to be accounted for manually. But what happens if the exceptions are more numerous than the entries that conform to some pre-determined standard? If you review the entries contained in online family trees, you will see that, as a group, genealogists have not been too good at adhering to any sort of data entry standard, even the simple one proposed by Bennett in 1951.

In mathematics, by analogy, the above example might be called a pathological function. Pathological examples often have some undesirable or unusual properties that make them difficult to contain or explain within a theory. See Wikipedia: Pathological (mathematics). I certainly acknowledge that the insertion of the descriptive term into a name was inappropriate, but any overall theory of standardization, especially within a genealogical structure, must be robust enough to account for inexplicable entries and not simply reject them as non-standard. Placing the non-standard in a separate alternatives section sidesteps the issue of establishing a workable standard. If an individual user has to intervene each time the program transferring data comes to an unacceptable variation, we defeat the reason for having a program in the first place. Of course, as human individuals we could perform the merger of the data on an item-by-item basis.

There is a commonly quoted statement that the exception proves the rule. Unfortunately, this expression is widely misused and misquoted. A better way of stating it would be that an exception proves the existence of a rule. This is generally applied to rules that contain their own exceptions. For example: "This area is closed except on weekends." The rule that is proved is that the area is commonly closed. But in the present situation in genealogy, the exceptions indicate that there is no generally accepted or understood rule, at least in the sense that the genealogical community at large observes any type of consistent method of entering names online. In many cases, the exceptions far outnumber and smother the supposed rule.

Developing a standard method of exchanging genealogical data presupposes that the data being exchanged is somehow regular enough to adhere to some set of descriptive rules. In the case above, the exception proves that there is no accepted or known rule, not that the rule exists. It would only prove that the rule exists if there were an implicit, unspoken understanding that the first example above is "correct" and in conformity with the rule and the second one is not.

In the real world of genealogy, we have to take the data as it is and that does not lead to an easy solution to the problem of standardization at any level.

From a slightly different viewpoint, there is actually a significant movement in the genealogical community to abandon the historical rules, such as the standardized rules of name entry, under the guise of attracting new adherents to the practice of discovering their family history. So how do you deal with a movement to eliminate standardization in favor of popularity and attracting new, younger practitioners?
