Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, February 25, 2016

Genealogy and Complexity

There are two major types of complexity: complexity of numbers and complexity of concepts. There are also different areas of study that are labeled with the term "complexity." In computer science, "complexity theory" usually refers to computational complexity theory, which classifies computational problems according to their difficulty and studies the algorithms needed to solve them. Another type of complexity theory, sometimes called complexity strategy, is used in strategic management and organizational studies.

Genealogical research is a complex activity on both levels of complexity. On one hand, genealogists are faced with the reality of an exponential growth in the number of their potential ancestors. Although intermarriage among any person's ancestors (pedigree collapse) means that the theoretical number is never actually reached, the number of ancestors and their potential descendants can be very large, much larger than any one person can comprehend. At the other end of the complexity spectrum, genealogists also face complexity in the number and types of records used to research relationships. This is not just a complexity of the number of records, but also the difficulty in obtaining, reading, evaluating and extracting information from records that are almost chaotic in nature.
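
To put rough numbers on that growth, here is a minimal sketch in Python. It assumes every ancestor is a distinct person, which pedigree collapse guarantees is never quite true, so these are theoretical upper bounds only. The function names are made up for this illustration.

```python
def ancestors_in_generation(n: int) -> int:
    """Theoretical number of ancestors n generations back (parents are n = 1).

    Assumes no pedigree collapse, so this is an upper bound.
    """
    if n < 0:
        raise ValueError("generation must be non-negative")
    return 2 ** n


def cumulative_ancestors(n: int) -> int:
    """Theoretical total number of ancestors in generations 1 through n."""
    return sum(ancestors_in_generation(g) for g in range(1, n + 1))


if __name__ == "__main__":
    # Print generation, ancestors in that generation, and running total.
    for g in (1, 5, 10, 20, 30):
        print(g, ancestors_in_generation(g), cumulative_ancestors(g))
```

Ten generations back, the theoretical count is already 1,024 ancestors in that generation alone. At thirty generations it passes one billion, which is why the theoretical number can never actually be reached.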

Historically, very few genealogical researchers reached a level of involvement with their research sufficient to become embroiled in complexity issues. However, using today's online tools coupled with the vast increase in the availability of digitized records, a genealogist can easily be catapulted into a complexity unimaginable just a few years ago. Traditional genealogical organizational systems based on color coding, filing systems, notebooks and research logs fail to address the basic issues and, in most cases, simply add more complexity. In my experience, most researchers avoid confronting the overall complexity of genealogical research by ignoring the problem and focusing on a tiny part of their pedigree. You will hear comments such as "I am working on such and such a line" that indicate that the researcher is avoiding the complexity issues.

Even if a researcher focuses on a single line, the complexity is still there. Every generation on that line is also subject to the ever-present geometric increase, complicated by the natural decline in the availability of records as we move further into the past. Ultimately, the researcher is faced with the inevitable "end of line" or so-called "brick wall" problems when the difficulty in finding records exceeds the researcher's resources. While all this is happening, the researcher is being bombarded with messages from various sources extolling the virtues of this or that software product or system that will "solve" the problems and make them go away. The problems are also supposed to be partially solved by collaborating with other members of the researcher's family who are, for the most part, entirely clueless about the problems faced by the researcher and usually uninterested in solving them.

Let me give one example at this point. Many of the newer strategies from the large, online genealogical database programs involve automated systems that supply source records suggested by complex algorithms. These systems are directed at the computational complexity of handling billions of records and billions of people in the largest of these online resources. The effect of these programs is to move more people faster into the realm of complexity. Instead of starting with a few manageable ancestors, the neophyte is faced with hundreds of potential ancestors long before they have the mental tools to handle the complexity of both record evaluation and sheer numbers. At the same time, they are being told that genealogical research or even genealogy is no longer necessary. The reality of the online record hinting programs is that they break down in exactly the same places that competent researchers have reached in their approaches to the problems. I do not find "record hints" for those people I am really most interested in researching.

The main advantage of the record hinting technologies is to add more names or complexity to the more recent ancestral lines. For example, I have literally thousands of potential record hints from several large online database companies. If I focus on one of my "end of line" situations developed over years of research, I almost immediately find that there are no more record hints for the people I am researching. The algorithms have no more resources than those I have already been able to research. The computers cannot go out into the world and find records that their owners have not already incorporated into their databases. If the records are not there, they are not there and all the computer power in the world cannot solve the problem and I am back to my traditional methods of doing research.

In other words, the complexity of the ultimate genealogical research problems still exceeds any existing algorithmic solutions. What the programs do is increase the numerical complexity faced by the researcher by automatically adding records that the researcher would have traditionally ignored or avoided.

Meanwhile, to add to the complexity, we have an additional wave of people who are not really interested in or educated in genealogy per se, but now have access to tools that supply a seemingly endless number of "sources" and "names" to add to the monumentally large online family tree structures. We also have an emphasis on quasi-scientific "advances" such as DNA research that add yet another level of complexity by indicating previously unknown potential relationships. Those who request DNA tests do so, in almost all cases, without any of the research tools they need to pursue the information they receive from the tests.

What is needed? First and foremost, those who are attempting to resolve this complexity should try to avoid marginalizing the genealogists who have the education and sophistication to do this research, and avoid preventing them from doing it. There is nothing wrong at all with encouraging someone without any involvement in "family history" to begin accumulating a pedigree, but this should not be done by denigrating the efforts of those who created the original organizational structure. The ease with which most people can discover their first few generations of ancestry should not be held up as the norm, because of the complexity that this initial research generates as those same people attempt to move back in each succeeding generation.

Genealogy has moved into the mainstream of those major areas that confront different levels of complexity. The noise level in the entire genealogical system is rising to almost impossible levels. To paraphrase John Donne, no one individual is an island in genealogy. We are all ultimately related and this level of complexity cannot be ignored with impunity.

If genealogy were being analyzed by systems engineers in the same way as any other data transmission system, the engineers would see an unacceptable level of noise interfering with the operation of the system. Right now, the record hints, the emphasis on the ease of adding sources and records, the DNA tests, the invitation to neophytes to work on complex problems without understanding the issues, and many other factors have raised the noise level of the system to the point where it is nearing a breakdown. We need to address the organization of the entire system from a complexity standpoint and stop treating it as if we were all working on separate, independent, simple, easy and fun issues.

[Closing note: I do not know why I try to address these issues when nobody seems to be interested in pursuing the real issues and addressing potential solutions. Anybody out there?]


  1. Well, I'm glad you ponder these issues - someone has to. Very few of us do.

    I think that the record hints have reduced complexity for many searchers - "lookee here, they found my John Smith." I'll add that to his profile. The problem is, of course, is it the right John Smith?

    It seems to me that they have made it easier and faster to get back to the "brick wall" and "end of line" ancestors that nobody can find in records without a lot of research being done. Your point is that since it's easier and faster, the searcher doesn't learn the guiding principles of genealogy research. In the end, a certain percentage of searchers continue to the next level and become competent researchers. It's always been that way, but now we separate the wheat from the chaff much faster.

    1. I have always wondered if I am wheat or chaff :-)

  2. James, I love your image of computers "going out into the world." In a way, the bearers of portable devices do this, at least bearing simplified programs into brick-and-mortar repositories. FS in a way is doing this in the effort to digitize selected records databases, but I do not have a sense of where the emphasis is. Since FS is concentrating on 20th-century databases (4 generations back from most of the currently living), I doubt that this effort will bear fruit for my own research needs.

    The aggregators are finding it most easy to handle what already possesses of GSU microfilms and what FS has already indexed. Their overlap is tremendous, and few are venturing up into the hills for the stray sheep existing in the aforesaid records repositories (in the US, state archives, historical societies, libraries' manuscript collections, etc.).

    The aggregators' folly is relying on the GSU/FHL holdings, which are rather biased away from complexity and toward the quickie databases that supply readily indexed and fairly recent vital data in the simplest form (the current effort to index late 20th-century obituaries comes to mind). Compared with the commercial power of the aggregators, with few exceptions the excellent researchers-teachers (and yes, some bloggers such as thou) are rather left in the dust -- seeking complexity in research approaches as well as in data-handling methods, rather than simple-quick answers to the simplest genealogical questions. Not that the simplest questions always find quick answers, as descendants of rather recent immigrants often find out.

    I am not working on how to integrate complexity parameters with the mainstream of genealogical efforts. FHISO seemed promising, but may be stalled out: in the myriad reports from RootsTech2016 I saw not a single mention of that organization, and very few questioning the family-story orientation of the event. You have touched on the latter point, and much has been said. Much more discussion is yet to be had.

    1. Thanks for your comments as always. Some day there may be a dialogue on these issues.

    2. +Geolover, it is true that FHISO did not have an exhibit at RootsTech this year -- unlike previous years -- but they are definitely not gone, and there were actually four Board members present there. There have been some team changes recently, and when they have addressed some very important administration paperwork then the organisational structure will be resized to something more appropriate to the number of potential volunteers, and work will resume in earnest. There are some updates on the web site, with some more important ones due very soon.

  3. I'm here, James ;-)

    While it is true that the number of potential ancestors increases geometrically (an "exponential" increase is technically different but I don't want to get pedantic), it is seriously moderated by the availability of relevant information for the older generations. I personally believe that the majority of researchers do not investigate the available information to any significant extent, with many being content with mere trees. But then does a wider focus on family history, or even micro-history, really add much to the overall complexity?

    In software, there is something called 'computational complexity theory' that looks at the time and space (memory or disk) required to solve a given problem. If you asked genealogical developers how this relates to genealogy then I would wager that most would cite the problem of merging persons (or "personas") across different sources, including GEDCOM contributions -- hardly representative of genealogy in my opinion. However, it is possible for a family history contribution from a given researcher (compiled offline) to be uploaded to some global tree (or other globally indexed system) in "user time", meaning during their interaction with the system rather than overnight, or bi-monthly, in some massively CPU-intensive meeting of a technological challenge. IOW, maybe much of the supposed complexity is a myth.