Monday, October 20, 2014

Genealogical Noise

At some levels, genealogy is all about acquiring and processing information. Whether you are searching online, sitting in a church or library reading old parish registers, or involved in some other activity, you are either gathering, organizing, evaluating or recording information. But if you attempt to define "information" in some type of formal way, you soon find out that the standard definitions are slippery and circular. You end up with a hodgepodge of words that include data, facts, intelligence, knowledge, etc., all of which seem to be defined by the same set of words. For example, here is one definition of information:

facts provided or learned about something or someone

Now here is a definition of the word "fact":

a piece of information used as evidence

Notwithstanding our collective inability to adequately define exactly what we are seeking, as genealogists, we diligently pursue our task of searching out facts or information about our family. Back in 1948, an engineer/mathematician/cryptographer named Claude Shannon published an article in the Bell System Technical Journal entitled, "A Mathematical Theory of Communication." In this article and later publications, Shannon developed what is known as "Information Theory" and laid the foundation for digital computer design. One of the concepts to come out of these early developments is the idea of a "signal-to-noise ratio" or a way to analyze the amount of meaningful information (the signal) as compared to the background noise (the unwanted signal). 

As genealogists, we are almost continually overwhelmed with unwanted signals, or noise: false or irrelevant information and data that far outweigh the useful information we find. Some researchers are almost paralyzed by the amount of unwanted data they receive. The difficulty comes not only from the complexity of the data but also from the sheer amount of data or information received. This is not necessarily a new situation. For example, if you are searching a microfilm for specific information about an ancestor, you have to process a lot of "noise," or unwanted data, before finding the one or two facts you are searching for. But that kind of "noise" was relatively easy to handle; today's levels of noise have become overwhelming. 

Some of the most common complaints I receive involve this unwanted genealogical background noise. Some of the newer programs have increased rather than decreased the amount of noise by providing automated systems that generate additional information when the receiver of the information, in this case the user of the program, has no idea of its relevancy or how to process it. One example of this problem, which I received today, was a notice that someone I did not know was having a birthday. In other instances, I get an overwhelming number of suggested family tree connections with people categorized as potential "relatives." In other programs, a search for information about an ancestor will produce a number of matching family trees, sometimes a large number, with repetitious entries of obviously wrong information.

All of these instances constitute genealogical noise, i.e., unwanted signals that obscure the valuable information I really want to find. Several comments on my recent posts expressed this frustration as it applied to the Family Tree program. Unwanted information, in the form of apparently random changes in the data, was viewed as a threat to the integrity of the researcher's own perception of the "facts." In this way, the changes were not viewed as attempts to reconcile differing opinions about the actual data, but merely as noise that destroyed or obscured the researcher's "own" data, which in every case was assumed to be correct. 

It is inevitable that the amount of genealogical noise will continue to increase. It will become increasingly difficult to filter out the meaningful information from the unwanted signals. All you have to do to experience this phenomenon is to watch the stream of information on a social networking website for a few minutes, and you will see what I mean about unwanted information. 

What many, if not most, of the genealogists who are overwhelmed with genealogical noise do not have is an efficient filtering system. Rather than continuing to view noise as an obstruction, they need to understand that noise, in one form or another, is always present in any system of communication. The genealogist facing an annoying or overwhelming amount of noise needs to think of ways to diminish the amount of noise or to create mechanisms for handling it through a system of filters, either mental or physical. For example, if I do not want to receive notifications of the birthdays of remote relatives, I can usually turn that function off by editing my preferences or settings in the program. I can also develop ways to ignore any such messages. 

In my own case, I would be almost paralyzed if I had not developed adequate methods of filtering out unwanted signals or information. On a normal day, I can get well over 100 email messages and hundreds of other notifications. I have worked hard at managing that flow of information, especially since more than 90% of it is unwanted noise. The challenge is to filter out the unwanted noise without destroying the signal altogether. 

In the case of a program such as FamilySearch Family Tree, the noise comes from the nature of the program itself. Most genealogists are not at all used to the idea of instant and pervasive collaboration. They are so used to working by themselves that information supplied by other users of the program in the form of "changes" is viewed as a threat to the integrity of the data rather than as a normal function of the program. The researcher immediately jumps to the conclusion that "they are ruining my data" instead of viewing the changes for what they are: differing approaches to the same set of information. If another contributor changes "your information," you view this as a threat rather than merely another opinion about the data. The revelation that there are people out there who disagree with your own opinion is unsettling and threatening. Noise is not viewed as an inevitable component of the system, but as a personal threat. 

Many genealogists get into a situation where they are battling the noise rather than controlling it. For example, they view unsourced and unreliable family trees as a threat rather than simply ignoring them. With a shared family tree, such as FamilySearch Family Tree, these genealogists become frustrated, angry and despondent, and finally condemn the system rather than learning to work within a high-noise situation. 

We all need to understand that we live in a world with a high noise content. We either learn to manage the noise or become incapacitated by it. I think I will need to return to this topic in the future because there is a lot to say about noise. 


  1. This is a very interesting problem. Statistician Nate Silver's recent book The Signal and The Noise: Why So Many Predictions Fail - But Some Don't examines some ways that people have successfully separated the signal from the noise in a variety of different fields. It provides some insights for genealogists facing this issue.

  2. Thanks for the comment and the reference. It sounds like a book I should read.