Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, August 29, 2013

Are there really any genealogical standards?

Let me take a simple example of the reason for the question in the title of this post. On any given day, I can go to dozens, perhaps hundreds if I wanted to, of online family trees. In reviewing the entries and ignoring the differences in data, one thing is abundantly clear: there is a total lack of consistency in how the entries are made. Starting with the family trees that have no information about any particular included individual, you can see a huge number of variations in the way the data is entered.

For some examples, I will go back to my Great-grandfather Henry Martin Tanner, who, by the way, is extensively, I should say exhaustively, documented online. Here is a list of some of the variations in the birth date and place taken from online Public Member Family Trees. Please note that the correct spelling of the name of the town in California is San Bernardino.

11 Jun 1852, San Bernardino, San Bernardino Co., California
11 Jun 1852 San Bernadino, San Bernadino, CA
11 Jun 1852 San Bernadino, San Bernadino, California, United States
11 Jun 1852 San Bernadino, S Brno., CA
11 Jun 1852 SB, S, California, USA
11 Jun 1852 San Bernadino, S Brno., CA, USA
11 Jun 1852 San Bernadino, S Brno., Ca
11 Jun 1852 San Bernadino, San Bernadino CO, CA, USA
11 Jun 1852 San Bernadino, San Bernadino, California

Now, how do you cite the city, county, state and country in "standard" format? I suggest that this is the full entry and should be the standard:

11 June 1852, San Bernardino, Los Angeles, California, United States (or USA, whatever)

As I have noted many times in the past, San Bernardino County was not formed until 1853. SAN BERNARDINO created from LOS ANGELES. (Calif. Stats. 1853, 4th sess., ch. 78/pp. 119–123) See Newberry Atlas of Historical County Boundaries. But ignoring the mistakes, such as misspelling the name of the town and county, and the inaccuracies, such as the wrong county, it is clear that there is no standard way of putting the information into an online program. So here are the questions:

  • Do we care?
  • Is there any reason at all to care?
  • Should we be concerned about consistency?
  • Does it matter that the entries are different?
  • Since the correct information is freely available and there is no controversy, why should we even think about having a standard?
  • Isn't this a free country where we can all do what we want?
  • Some of these contributors may be just starting out and do we want to discourage them with a need for accuracy?

I suggest that we have a more basic issue than merely establishing data exchange standards. Do I really want to exchange data with someone who has some of these types of entries? Do I want to spend the rest of my life cleaning up someone else's citations, when I have enough of that to do with my own database? Wouldn't it be nice, as sort of a basic beginning, to decide whether the standard way of citing the country here in America is "United States" or "USA", or even whether adding it is necessary at all?

I could go on to dates and names but it would be too discouraging. I would guess that most of these entries come from people who have absolutely no awareness at all of anything approaching a standard way of entering data. They have probably never even heard of the concept. So, back to my question. Do we really want to try to exchange data when this type of problem exists? I can be as meticulous as all get out and then add a few of these entries into my database? Think again.

Now, I do not want to become the standards police of the genealogical community; the job doesn't pay enough. But the inconsistencies do raise some serious issues when you get into the larger discussion of standards for exchanging data. There are issues that are more insidious. Why would you want to exchange data when the standard for accuracy is so low as to be non-existent? I am speaking of the standard of citing the location with the jurisdictions that were in existence at the time the event occurred. Even if we ignore the differences in format, can we simply ignore the fact that none of these people even suspect that the county did not exist at the time of the event, and probably do not care?
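To make the idea of citing the jurisdiction in existence at the time of the event concrete, here is a minimal sketch in Python. The `COUNTY_HISTORY` table and the function names are hypothetical, invented for illustration only. The table works at year granularity, which is a simplification: it treats all of 1853 as San Bernardino County, even though the county was formed partway through that year. A real tool would draw on dated boundary data such as the Newberry Atlas.

```python
from bisect import bisect_right

# Hypothetical lookup table: for each (town, state), a list of
# (first_year, county) pairs sorted by year. Year granularity is an
# assumption made to keep the sketch short.
COUNTY_HISTORY = {
    ("San Bernardino", "California"): [
        (1850, "Los Angeles"),      # original county covering the town
        (1853, "San Bernardino"),   # created from Los Angeles in 1853
    ],
}


def county_at(town, state, year):
    """Return the county that covered the town in the given year."""
    history = COUNTY_HISTORY[(town, state)]
    first_years = [first for first, _ in history]
    # Index of the last entry whose first_year is <= year.
    idx = bisect_right(first_years, year) - 1
    if idx < 0:
        raise ValueError(f"No county on record for {town}, {state} in {year}")
    return history[idx][1]


def format_place(town, state, year, country="United States"):
    """Build a 'town, county, state, country' entry for the event year."""
    return ", ".join([town, county_at(town, state, year), state, country])


print(format_place("San Bernardino", "California", 1852))
print(format_place("San Bernardino", "California", 1860))
```

The point of the sketch is only that the county is looked up from the event date rather than typed from memory, so an 1852 birth and an 1860 census entry in the same town come out with different, and correct, counties.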

By the way, there are 83,115 family trees with a reference to Henry Martin Tanner, and in going through the entries for a couple of pages, I did not find one accurate entry. Oh, I could have gone on and on about the lack of sources, but one example is sufficient. This is the source cited for the entries:


Hasn't the genealogical community dug itself into a hole we can't get out of? Doesn't creating a better information exchange standard simply facilitate this whole mess?


  1. Mr Tanner,

    I think the answer, or at least part of the answer, is that people are in too much of a hurry and don't take the time to look at what they are doing.

    I was helping, or trying to help, a friend with her research. She was just beginning and I was trying to help her get started, especially in the areas that you have spoken / written about. She was looking at a "hint" on ancestry and, without spending any time looking at the source, she just wanted to merge that data into her file. She got mad at me for saying 'hold up' ARE you SURE that is your person. She looked at the same hint and said Yes. She slowed down, while I was watching her, and saw that the person was NOT her person, after looking at the details.

    Your place name example is "in the details" and the suppliers of the data that we are looking at aren't helping.

    Anytime I find a source, review it, determine it IS my person, I ALWAYS have clean up work to do. I have blogged about that myself. Yes, it slows us down, but in the end, our data is more accurate.

    I think the answer is "Speed". Let's build our tree NOW, in a hurry and don't worry about the details.

    Thank you for speaking out about this.


  2. I've worked on standards (national and international) in the financial community for decades. In my experience, the only incentive to standardize is a financial one. It either has to make money or save it. When it is only our own data, which we don't expect others to share, we don't need much of a standard.
    In genealogy, if we would value our time spent researching and documenting information we might see the benefits of standards, even if we don't plan to share it widely.

  3. Consistency is a desirable quality, but any placename "standard" has to cope with some realities that your example doesn't highlight.
    A placename documented in a family database should be based on the current evidence found and evaluated by the editor(s) of the database. If the research is on-going and/or the documentation does not give a complete and unambiguous identification of the place, then a full and, hopefully, historically accurate placename is not yet possible.
    Current conventions discourage abbreviations, but allow for missing components and permit variation, especially for internationalization.

  4. I loved this post. I suspect that most people started out by having some family documents or handwritten pedigrees written from memory by an ancestor. That info got recorded and the individual's interest grew. Years later, after having input much, a deeper understanding comes along and they realize that sources and notes and the like actually exist. Nobody starts out as a professional genealogist. Nobody learns all of this before making their first entry. I am grateful for any info any person records.

  5. There is a lot of discussion about the problems with standards in genealogy - whether citing sources or place names or even the names of individuals ... what I haven't seen are any proposed solutions.

    Perhaps instead of rehashing all the same old arguments ("look at how wrong THIS one is ..."), we come up with a plan for whatever standards are needed and a way to implement them that ALL of the software developers are willing to use. Oh wait ... someone IS already doing that ... FHISO. Clearly, more individuals, developers, and organizations need to be involved in those discussions - the more input, the better. There is an ongoing call for papers (see website) for this very topic.

    Bottom line is that until there ARE standards, nothing is going to change.

  6. Russ, many people do not hesitate to leave a mess to others to clean up. Unfortunately, it's part of today's culture. Thank you for your attempts at curbing the epidemic :-)

    In general, don't standardized place names in drop-down lists on Ancestry and FamilySearch perpetuate the problem and contribute to the failure to educate someone on the history of a locality? Do those who create the databases for these and other companies know this type of history? Is there no business incentive to be this accurate? And, as far as USA or United States, and like considerations, in my experience it all depends on the decision-makers of the moment.

  7. This place example is quite easy to solve, James. I commented elsewhere that the actual place-names used to identify each item in the place-hierarchy-path are largely irrelevant. This, in turn, was because every place can have alternative names/spellings, just like people.

    Before anyone rushes to counter this, just consider, for a moment, those cases (the majority as it happens) where you have positively identified a place named in some record. In principle, even if two people refer to it in different ways, they can still agree they're referring to the same place. What is missing is an independent way of nominating a place, i.e. other than by its place-name or place-hierarchy-path. Something analogous to a "place citation".

    Well, this is where a place-authority would come into play. Such an authority could allocate identifiers (e.g. UUIDs or URIs) to unambiguously refer to each registered place.
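The place-authority idea in this comment can be sketched roughly as follows. This is a toy illustration in Python, not any real service: the `PlaceAuthority` class and its methods are hypothetical names, the identifiers are random UUIDs generated locally, and name matching is limited to ignoring case and extra spaces.

```python
import uuid


class PlaceAuthority:
    """Toy registry: one identifier per place, many names per identifier."""

    def __init__(self):
        self._id_by_name = {}    # normalized name -> place identifier
        self._names_by_id = {}   # place identifier -> names as registered

    @staticmethod
    def _key(name):
        # Assumption: names match if they differ only in case or spacing.
        return " ".join(name.lower().split())

    def register(self, *names):
        """Register one place under all of its alternative names."""
        place_id = uuid.uuid4()
        self._names_by_id[place_id] = list(names)
        for name in names:
            self._id_by_name[self._key(name)] = place_id
        return place_id

    def resolve(self, name):
        """Return the identifier for a known name, or None if unknown."""
        return self._id_by_name.get(self._key(name))


authority = PlaceAuthority()
town_id = authority.register(
    "San Bernardino, Los Angeles, California",
    "San Bernardino, San Bernardino, California",
)

# Two different place-hierarchy-paths resolve to the same identifier.
same = (
    authority.resolve("San Bernardino, Los Angeles, California")
    == authority.resolve("san bernardino, san bernardino, california")
)
print(same)
```

With something like this, two researchers could enter the place differently and software could still tell that both records point at the same town, because it compares the identifiers and not the strings.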

  8. People will only follow a standard if it benefits them. So if they want to interface their data to some sort of mapping program and the names are unknown to the mapping software - or even worse, known but in the wrong place - then this will encourage them to get it right.

    Since Brno is a place in the Czech Republic, then 3 of those people will either get puzzled or will learn something.

    There are several points on the score of mapping. Firstly the mapping seems to be fairly fluid in its searching, so it will find stuff even if it's not exact.

    Secondly, generalised mapping stuff works, so far as I know, only on current placenames, so it's no encouragement here to use the contemporary county.

    What's needed is someone to produce fantastic looking maps showing contemporary boundaries and names that can be used to map your relatives on. Only then will anyone be interested in getting the contemporary names.

    But even then, it's no use if the rest of the software doesn't recognise that "San Bernardino, Los Angeles, California" is the same physical place as "San Bernardino, San Bernardino, California" and your chap hasn't gone walkabout.

    So, never mind the data entry - when's the software going to make it worthwhile to get the data entry right?

    Couple of points from one of the Small Islands on the eastern side of the Atlantic - acknowledging Bill Bryson's book title there...

    1. Is there a reason why you don't put "County" on the county name? "San Bernardino, Los Angeles, California" looks like a suburb of Los Angeles to me. "San Bernardino, Los Angeles County, California" would be more meaningful to me.

    2. Please put "USA" (or whatever) into the placename. Firstly omitting it is terribly parochial. (Nice to see you include it, James). Secondly, I just have this blind spot that I can never remember which side of the border Michigan and Manitoba fall, so clues would be helpful!

  9. Tongue in cheek... Try some of these translations of the original / true meanings of a number of place names in the USA.

    (thanks to MH Forsyth on