Some people eat, sleep and chew gum, I do genealogy and write...

Friday, June 13, 2014

The Thimble Game Search

Movies commonly depict the thimble game (aka the thimblerig) as happening in the 1800s wild west. The trick, of course, is that the pea is not under any of the thimbles. It is what is known as a non-zero sum game. In fact, it is the exact opposite of a zero-sum game. I am very used to solving zero-sum games with the minimax theorem. The problem with any non-zero sum game is that winning depends on your ability to accurately determine the odds. Gambling has never had the slightest attraction to me for the simple reason that it is a non-zero sum game where the odds are easily determined to be in favor of the house. Thanks to John von Neumann, I have never had the slightest interest in gambling.

Now we are back to the common question with my blog posts: what can this conceivably have to do with genealogy?

Searching for sources to support an ancestral line is a non-zero sum game. In fact, in almost all cases the odds of finding a source are heavily in favor of the "house" or, in this case, the researcher. The best strategy here is maximin, that is, one that maximizes one's own minimum payoff. To be more exact, maximin is recursive and applies a heuristic evaluation function. Here is a brief explanation of the function from Wikipedia that will illustrate how this applies to genealogy:
"The algorithm can be thought of as exploring the nodes of a game tree. The effective branching factor of the tree is the average number of children of each node (i.e., the average number of legal moves in a position). The number of nodes to be explored usually increases exponentially with the number of plies (it is less than exponential if evaluating forced moves or repeated positions). The number of nodes to be explored for the analysis of a game is therefore approximately the branching factor raised to the power of the number of plies."
The analogy to a family tree is obvious. What may not be so obvious is the underlying strategy. Most genealogical researchers view their investigation as focusing on a single individual at a time. Therefore, the odds of finding the exact information sought are heavily against the researcher. The proper strategy in conducting genealogical research is to view the "family tree" as a set of legal moves that increases with the number of plies or generations. In short, the analysis of the family tree involves exploring all of the possible nodes, that is, all of the individuals in the tree, with equal attention. Of course, the number of nodes, as noted in the quote, expands exponentially. So from a practical standpoint, there is a limit to the analysis. But the procedure is always the same, the only limitation being the ability of the researcher to consider multiple nodes at the same time, in a manner similar to a chess master.
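
To put rough numbers on that growth, here is a minimal sketch in Python. It is only an illustration under a simplifying assumption: a pure pedigree in which every person has exactly two parents (a branching factor of 2), ignoring siblings, collateral lines and pedigree collapse. The function names are hypothetical and are not taken from the Wikipedia article.

```python
def ancestors_in_generation(plies: int, branching_factor: int = 2) -> int:
    """Nodes at a given depth: branching factor raised to the number of plies.

    Assumption: every node has exactly `branching_factor` children
    (two parents per person in a simple pedigree).
    """
    return branching_factor ** plies


def total_nodes(plies: int, branching_factor: int = 2) -> int:
    """All nodes from the starting person (ply 0) through the given ply."""
    return sum(branching_factor ** g for g in range(plies + 1))


if __name__ == "__main__":
    # Print how quickly the tree grows as generations are added.
    for generation in (1, 2, 3, 5, 10):
        print(
            generation,
            ancestors_in_generation(generation),
            total_nodes(generation),
        )
```

Even with a branching factor of only 2, the count roughly doubles with each generation, which is why there is a practical limit to how many nodes a researcher can consider at once.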

By doing this, the sequence of inquiry "floats" to make allowances for the sources discovered. This is a non-zero sum game as long as the researcher does not become enamored with seeking a single node or individual in the family tree.

The practical application of this is rather simply explained. In doing genealogical research, focusing on a single individual or single event is self-defeating. There is always a measurable probability that the exact information you are seeking is not available. However, if you consider the family, even several generations of it, as a whole, you will dramatically increase the probability that the information you are seeking will be found. Some search engines have begun to exploit this principle, in part, by searching multiple generations of families at the same time to determine the applicability of any specific source. This is exactly why many of the models of game theory analysis take the form of pedigree-like family trees. If we view the probability of finding information, any information, as a function of the number of nodes (individuals in the family tree) we examine and evaluate, then we will begin to see that the chances of finding that individual increase with the depth of the search. We need to emulate the wide search engine strategy of using all of the information already obtained about the entire family to suggest the direction of the research, rather than establishing "goals" for our research that will be frustrated because the exact information is unavailable.
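
One hedged way to picture "probability as a function of the number of nodes" is the sketch below. It rests on an assumption that is mine for illustration only: each individual examined has the same, independent chance of yielding a useful source. Real records are not independent (a burned courthouse affects a whole family at once), so treat the numbers as a rough guide rather than a model of actual research. The function name and the sample probability are hypothetical.

```python
def chance_of_any_find(p_single: float, nodes: int) -> float:
    """Chance that at least one of `nodes` individuals yields a useful source.

    Assumption (illustrative only): each node independently yields
    something useful with the same probability `p_single`.
    """
    if not 0.0 <= p_single <= 1.0:
        raise ValueError("p_single must be between 0 and 1")
    if nodes < 0:
        raise ValueError("nodes must not be negative")
    # Complement rule: 1 minus the chance that every node comes up empty.
    return 1.0 - (1.0 - p_single) ** nodes


if __name__ == "__main__":
    # Hypothetical 20% chance per individual examined.
    for nodes in (1, 3, 7, 15):
        print(nodes, round(chance_of_any_find(0.2, nodes), 3))
```

Under these assumptions the chance of coming away with something useful climbs quickly as more of the family is examined, while the chance for any one person stays fixed.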

How do I translate this into a hypothetical example? Like this. Suppose you are "looking for the birth date of your great-grandfather." You have just placed the search into a win/lose situation. If you find the birth date, you win. If you do not find the birth date, you lose. On the other hand, if you were to follow my analysis, you would be looking for information about the entire "family" or nodes on the tree. You may or may not incidentally find the information about the birth of the great-grandfather, but the chances are much greater of finding that piece of information or many, many others that may obviate the need to have an exact birth date, such as a statement giving the age of the individual at the time of death, which can come from almost anyone in the entire family.

So now you see that this really does have a lot to do with genealogy. Looking for a narrow, specified subset of information lowers the probability that any useful information will be found. Unfortunately, most researchers are caught up in research logs and formulating strategies for finding a specific individual rather than looking at the largest number of nodes possible at any given moment and modifying the search as information is obtained. Putting genealogy into a win/lose type of game is too much like gambling for me.

1 comment:

  1. Hi James
    Aren't you saying that if I want to find info on Joseph Edgar Leslie Trotter, I should start with Trotter and refine from there? Which is a fundamental of Google searching anyway.