Some people eat, sleep and chew gum, I do genealogy and write...

Monday, December 28, 2015

How does geographic clustering work?

In "Solving a Complex Genealogical Challenge - Charles Parkinson -- Part Two," the second post in that series, I illustrated the principle of clustering geographic areas to determine the likelihood that a group of people are related. I believe this particular methodology needs further explanation.

As we go back in time, transportation issues become more and more important. Today, we can fly across whole continents in a matter of hours. In the 1800s and in earlier times, distances were measured in days, weeks and sometimes months of travel. My examples have been focused on the Family Tree because it is freely available to all registered users, but any pedigree in any program will suffice for examination and correction.

To gain an educated perspective on the time it took for travel at different dates in the past, it is important to read some basic history and realize when certain transportation improvements became generally available in the area where your ancestors lived. For example, here is a map showing the rates of travel in the United States in 1800.

This map is somewhat misleading. It assumes that the traveler would take the most rapid form of transportation available. In many cases, people had to walk or ride a horse and the travel times were limited by weather, terrain, major obstacles such as rivers and mountains and many other factors we tend to ignore today. Two people may have lived only a few miles apart and yet they may have never met or had any contact with each other because of a lake, river, canyon or mountain range.

When doing genealogical research, people have a tendency to ignore geography in deciding when two people are related. This becomes a huge problem when families have been compiled by focusing on names and ignoring geographic reality. This fact can be demonstrated over and over again by referring to any online family tree program. In any list of children, especially when there are a large number of children, it is very common to find some who lived well outside the reasonable area of consideration.

To illustrate the problem, I routinely put any places identified in a family tree on a map to see if they are reasonable. Here is an example that took me only a few seconds to find in the Family Tree.

Here we have a man named Thomas Richardson who is supposedly married to a woman named Anne Bennet. There are two marriage locations and dates for this couple: 1 October 1745 in Glatton, Huntingdonshire, England and 30 September 1767 in Saint Andrew, Enfield, London, England. This is an extreme example because obviously the marriage could not have occurred in both places. But the question is, where did these people live? If you examine each of the listed children, you will see that they were all listed as born in Glatton, Huntingdonshire, England. If these places are correct, then the marriage in London is highly unlikely. Why is this the case? The distance from Glatton to Enfield is about 67 miles. It might take 20 hours or more of walking, or two days on a horse, to go that far. Absent some other consideration, such as relatives in the area or the wife's birthplace, there is no reasonable way to account for a discrepancy of that distance in the mid-1700s.
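For readers who like to check this kind of arithmetic, here is a minimal Python sketch. The coordinates for Glatton and Enfield are approximate values I have assumed for illustration, and the walking speed of 3 miles per hour is also an assumption, not a historical measurement. The straight-line distance it computes will be shorter than the roughly 67 miles by road cited above.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle (straight-line) distance in miles between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def travel_hours(distance_miles, speed_mph):
    """Hours of continuous travel at a constant speed (no rest, no detours)."""
    return distance_miles / speed_mph


# Approximate coordinates (assumed for illustration only).
GLATTON = (52.46, -0.30)
ENFIELD = (51.65, -0.08)

WALKING_MPH = 3.0          # assumed average walking pace
ROAD_DISTANCE_MILES = 67   # the road distance cited in the post

straight_line = haversine_miles(*GLATTON, *ENFIELD)
print(f"Straight-line distance: {straight_line:.0f} miles")
print(f"Walking time for {ROAD_DISTANCE_MILES} miles: "
      f"{travel_hours(ROAD_DISTANCE_MILES, WALKING_MPH):.1f} hours")
```

At the assumed pace, 67 miles works out to roughly 22 hours of walking, which is consistent with the estimate of 20 hours or more.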

An extension of this principle is the idea of "clustering" all of the places listed in a pedigree to ascertain whether or not there are any anomalies. As you trace your families back in time, they will appear to cluster in an area. In many cases, one or more individuals will then break out of the cluster and may form a new cluster in another geographic area. In my own family, clusters have occurred in Denmark, England, New England, Utah, Australia, Arizona and many other places. For example, all four of my grandparents and all but one of my great-grandparents lived in close proximity to each other at some time in their lives. If you find someone who seems to have the same name as your ancestor and you cannot locate an event in their life that places them in or near the family's cluster, then you cannot conclude, on the basis of name alone, that the person is related.

It is also important to look at the naming patterns of the areas where your ancestors lived. In some countries at different times some names were extraordinarily common. To the extent that names are commonly used, the issue of geographic location becomes more vital and can extend down to identifying the exact house or farm where the people lived in order to be accurate.

I suggest mapping out all of the locations you encounter. It is always a matter of note when I am working with patrons at the Brigham Young University Family History Library that so few people are even vaguely aware of the locations of the places they are listing for events in their ancestors' lives.
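If you keep your locations in a spreadsheet or a program, the same mapping check can be roughed out in a few lines of Python. This is only a sketch under stated assumptions: the event list is a placeholder based on the Richardson example (the number of children and the coordinates are assumed), the cluster center is simply the median latitude and longitude, and the 20-mile threshold is an arbitrary cutoff you would adjust for the time and place.

```python
import math
from statistics import median

EARTH_RADIUS_MILES = 3958.8  # mean radius of the Earth


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def flag_outliers(events, max_miles):
    """Return labels of events farther than max_miles from the cluster center.

    The center is the median latitude and longitude, so a single
    stray location does not drag the center toward itself.
    """
    center_lat = median(lat for _, lat, _ in events)
    center_lon = median(lon for _, _, lon in events)
    return [label for label, lat, lon in events
            if haversine_miles(center_lat, center_lon, lat, lon) > max_miles]


# Placeholder events: (label, latitude, longitude). Coordinates are
# approximate and the two child entries stand in for the full list.
events = [
    ("Marriage 1745 (Glatton)", 52.46, -0.30),
    ("Child born (Glatton)", 52.46, -0.30),
    ("Child born (Glatton)", 52.46, -0.30),
    ("Marriage 1767 (Enfield)", 51.65, -0.08),
]

THRESHOLD_MILES = 20  # assumed cutoff; adjust for the era and terrain

flagged = flag_outliers(events, THRESHOLD_MILES)
print("Events outside the cluster:", flagged)
```

A flagged event is not proof of an error. It is only a prompt to look for a record that explains why the person was so far from the rest of the family.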

A last note. Indicating that an ancestor was from England or New York or whatever is really an excuse for avoiding research. A general geographic location does not identify an individual.


  1. I find no point in getting peeved with these unlikely juxtapositions. One must explore the life-path, since laborers in particular could have moved between birth and death.

    I do find humor in the outright-impossible, such as a tree showing a middle-1600s birth in one English place and baptism **the very next day** 300 miles away. Marriages at age 4 or before birth, Census enumerations 20 years after death, and suchlike, are getting tiresome rather than humorous, though.

  2. Thanks for the great post! This is such a key point. I don't know how many times I have had to undo changes in Family Tree because someone created a family or extended lines simply by linking together people with the same name, who lived at about the same time. In most cases it seems they paid no attention to geography. I just wish we could somehow get more FT users focused on the geography issue.