Some people eat, sleep and chew gum, I do genealogy and write...

Monday, March 30, 2015

What is the probability of finding your ancestors?

This is an interesting question: what is the probability that you will or will not find your ancestors? Each time I look at a weather forecast, I wonder what they mean when they say there is a certain percentage chance of rain. I wonder because I think there is a very loose definition of "rain" where prediction is concerned. If it rains one drop, does that constitute a validation of the prediction? It seems to me that rain, even one drop, is an either/or proposition. If it rains one drop, then the chance of rain went from zero to 100%. If it does not rain, the chance of rain stays at zero. Weather prediction has not advanced to the point where anyone can foretell the falling of that one drop of rain.

Now general percentages of the probability of rain seem to work fine for places like Mesa, Arizona and Provo, Utah that have relatively little rain, but what about a place like Panama City, Panama, where it rains three times a day in the wet season and only once or twice a day in the dry season? What about some places in Hawaii where it rains over 400 inches a year? Even in Colon, Panama on the north coast, the rainfall is over 130 inches a year. When something, like rain, is almost constant, what is the use of predicting a probability of rain? Isn't the probability of at least a drop of rain almost 100% every day?

So what do we mean when we talk about probability? If we define probability as the extent to which an event is likely to occur, measured by the ratio of the favorable cases to the whole number of cases possible, then what is the probability that I will find an ancestor? Any ancestor or a specific ancestor? This question revolves around the definition and application of the term "favorable cases." If you are trying to predict the percentage probability of rain or no rain, you look to forecasts based on present conditions plus historical records. When such and such happens, we know we had rain in the past so many times and so we can predict the probability of rain in the future because we are historically right a certain percentage of the time. The more information you can gather about the existing conditions and the more historical events you can review, the greater your accuracy. This is the reason why computers have had such an impact on weather prediction. They can analyze so much information so quickly.

If you think about genealogy in the abstract, you would probably come to the conclusion that the more ancestors you had already found and the more you know about the existing historical records, the higher the probability that you would find more ancestors. You could bolster that conclusion with the argument that the number of ancestors (just considering direct line ancestors) doubles with each generation. So the probability of finding your own parents (two people) would be rather low, but as you add members of your ancestral family, the chances of finding one more ancestor improve in direct ratio to the number already found (i.e. the favorable cases). Just as with weather forecasting, you must also include information about the existing records and what they might contain.
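To put rough numbers on the doubling argument, here is a minimal Python sketch. The function names are hypothetical, and the counts are positions on a pedigree chart rather than distinct people, since the same person can fill more than one position when cousins marry.

```python
def direct_ancestors(generation: int) -> int:
    """Direct-line ancestor positions in one generation (1 = parents)."""
    if generation < 1:
        raise ValueError("generation must be 1 or greater")
    # Each person has two parents, so the count doubles every generation.
    return 2 ** generation


def cumulative_ancestors(generations: int) -> int:
    """Total direct-line positions from the parents back through `generations`."""
    return sum(direct_ancestors(g) for g in range(1, generations + 1))


# Print generation number, positions in that generation, and running total.
for g in range(1, 6):
    print(g, direct_ancestors(g), cumulative_ancestors(g))
```

Ten generations back, the chart already has 1,024 positions in that single generation, which is why the pool of people still to be found grows so quickly.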

What about the probability of finding one specific ancestor? Here the issue is that the number of "favorable cases" has no bearing on the discovery of a specific ancestor. Therefore the chances of finding a specific ancestor would seem to be rather low and may approach zero in some cases. This is analogous to trying to predict the fall of one rain drop.

With physical phenomena, such as rain or snow etc., we assume patterns and cycles as measured by keeping a record of past events. We have no such ability in genealogy. Just because I find a family member in Denmark says nothing at all about my ability to find another family member in England or Germany. This is true based upon an assumption that an individual researcher likely knows the records of one particular place more thoroughly than some other place. So the favorable case scenario has no predictive force in genealogy. You could argue that your ability to find ancestors, any ancestors, increases with your experience. However, that would mean that I could never predict the probability of finding an ancestor, even any ancestor, without knowing the exact qualifications of the genealogist. Therefore, in any case I could postulate, my ability to predict the outcome of a genealogical search would be about zero.

What if I factored in a score for my objective evaluation of the ability of the genealogical researcher? What if this score were based on some sort of objective evaluation, such as the past record of this researcher in finding ancestors? But I would still guess that the ability of the researcher would have little or no effect on the probability of finding a specific ancestor, although it likely has a bearing on the researcher's ability to find any ancestor.

How did I come to these conclusions? I am not a statistician, nor am I an accomplished mathematician, but I have studied a great deal of strategy. One of the areas in my university studies where I had the most class hours was military history, hence my interest in strategy. Genealogical research has a lot in common with a zero-sum game or the minimax theorem. My strategy in genealogical research is to minimize my losses, i.e. the time I spend looking for ancestors without finding any, while at the same time maximizing my gains, that is, the time I spend finding additional ancestors. I always assume the probability of finding any additional ancestor to be 100%. I also evaluate the probability of finding a specific ancestor as being very low, with the exact probability dependent on the genealogical time/record decay curve. This is a steadily sloping curve starting at 100% and ending close to zero over a 550-year time span starting with the present and ending sometime in the 1500s.
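As a rough illustration of the time/record decay curve, the following Python sketch assumes the curve is a straight line from 100% at the present down to a small floor at 550 years back. The straight-line shape, the function name `record_decay`, and the 1% floor are assumptions made for illustration only; they are not measured figures for how records actually survive.

```python
SPAN_YEARS = 550  # time span named in the post: the present back to the 1500s


def record_decay(years_back: float, floor: float = 0.01) -> float:
    """Assumed linear decay: 1.0 at the present, `floor` at SPAN_YEARS or earlier."""
    if years_back < 0:
        raise ValueError("years_back must not be negative")
    # Fraction of the span covered, capped at 1.0 beyond the end of the span.
    fraction = min(years_back / SPAN_YEARS, 1.0)
    return 1.0 - (1.0 - floor) * fraction


# Show the assumed value at a few points along the span.
for years in (0, 100, 275, 550):
    print(years, round(record_decay(years), 3))
```

Under these assumptions the halfway point, 275 years back, sits at roughly 50%. A real curve would differ by country and record type, so the sketch only shows the general idea.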

Based upon this type of evaluation, I spend my time focused on those family lines that have the greatest number of records available and also, according to my evaluation, the greatest probability that I will find more related ancestors. The main difference between genealogy and military strategy (there are many differences) is that, in a sense, I do not have an opponent. But from my perspective, I do have an opponent: the mass of historical data I must become acquainted with and sift through. My opponent is also time. As I get older, my ability to spend the time necessary decreases with the loss of physical abilities. So I am fighting a game against time and records. This is one reason for my immersion in technology. I see technology as a way to level the playing field. I can use computers and my ability to search for records to leverage more time out of the system.

It is inevitable. I will find more ancestors. Will I solve all of the issues left and questions that need answers? No. The probability of my answering all of the questions given my finite lifespan is close to zero, if not zero. Because I have been doing research for such a long time, I have thousands of ancestral lines to choose from. So the probability of adding at least one more name, previously unknown, is virtually 100%. In addition, I can redefine my goals. This means that rather than concentrate solely on my direct line ancestors, I can work on what would be called collateral lines and their descendants. If the idea is to increase the number of documented relationships, then any unknown relative, no matter how related, becomes an objective.

How long will that take? I would guess that I can add one more name in a relatively short time of doing research depending on my choice of ancestral lines. But presently, I have a more fundamental challenge. Much of the information I presently have in my possession is not organized and is the result of collecting information for over 30 years. I can make a choice. I can try to extend existing lines or I can try to document and cite the information already in my family tree. In other words, have I extended my supply lines to the point where I may lose the entire battle? Do I need to consolidate? I have chosen to document my tree rather than spend time adding more names in the most recent past. But I am now in a position to begin to move both back in time and sideways in time and add additional families. As I continue my documentation, I will find additional family members. That again has a 100% probability.

What are the factors that determine an increased probability of finding ancestors and other relatives? First of all, you must be able to move past the initial state of finding relatives. If you are an orphan or foundling and do not know your parents, your ability to move beyond the starting state has a very low probability. However, with the addition of each relative or ancestor, the probability of adding more relatives increases. When the number reaches a certain point, i.e. when you have a sufficient base of ancestors and other relatives, the probability of adding one more individual approaches and eventually reaches 100%. At this point, the main limitation is the previously mentioned time/record decay curve.


1 comment:

  1. Probability of precipitation is defined as being for more than a defined amount. In Canada it's "The chance that measurable precipitation (0.2 mm of rain or 0.2 cm of snow) will fall on any random point of the forecast region during the forecast period."
