Some people eat, sleep and chew gum, I do genealogy and write...

Monday, September 28, 2015

Dealing with VLPs (Very Large Pedigrees)

There is a tipping point in genealogical research that occurs when the individual records in your personal database hit a certain number. This number varies from person to person, but is usually around 5,000 or so. I first encountered the very large pedigree issue when I inherited over 30 years' worth of records from my great-grandmother. My first reaction was, who are these people? Today, with an additional 20 years of accumulation and having gone into hyper-speed with online family trees, I am still asking the same question.

Roughly speaking, I see the following transition points as I work with genealogists around the world:
  • Very Small Pedigree -- 0 to 100 names
  • Workable Small Pedigree -- 100 to 1000 names
  • Developing Large Pedigree -- 1000 to 5000 names 
  • Very Large Pedigree -- 5000 to 10,000 names
  • Extremely Large Pedigree -- Any pedigree over 10,000 names
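
For anyone who keeps a running count of names, the tiers above can be sketched as a small Python function. This is only an illustration: the name `pedigree_tier` is made up here, and because the ranges in the list touch at their edges, the code assumes each tier includes its lower bound and that the Extremely Large tier begins above 10,000.

```python
def pedigree_tier(name_count: int) -> str:
    """Return the rough pedigree-size tier for a given number of names.

    Boundary handling is an assumption: each tier includes its lower
    bound, and "Extremely Large" starts above 10,000 names.
    """
    if name_count < 0:
        raise ValueError("name_count cannot be negative")
    if name_count < 100:
        return "Very Small Pedigree"
    if name_count < 1000:
        return "Workable Small Pedigree"
    if name_count < 5000:
        return "Developing Large Pedigree"
    if name_count <= 10000:
        return "Very Large Pedigree"
    return "Extremely Large Pedigree"
```

The tipping point described above falls at the step from the Developing Large tier to the Very Large tier.
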
I have seen files that had well over 100,000 names. At this level, the file reaches critical mass and begins to implode. Usually as a result of the natural process of pedigree collapse, the duplicate and unrelated entries outnumber the valid relationships. The data problems and inconsistencies begin to move like viruses through the data.

The fact is, anyone can create a file with over 100,000 names in a matter of hours of consistent work online. In my case, I could do this easily by using programs I have on my computer for downloading generations of names from the Family Tree. This can be done by disregarding accuracy and any demonstrable relationship. For the life of me I cannot understand why I would want to do this.

One of the first questions I always get when I mention genealogy is "How far back have you gone with your genealogy?" The second question is "How many names do you have in your file?" These are asked as if genealogy were some sort of competition. As a matter of fact, I don't know the answer to either one of those questions and I am not going to take the time to find out. A much more appropriate question would be "How many individuals have you sourced and verified?" But even that question seems to beg the point of what I do anyway.

Going back to my great-grandmother and her genealogical efforts, after thirty years of accumulation she had recorded most of her lines three separate times. That much paper made it impossible for her to see that she had done the research previously. Duplication of effort became her biggest problem and it is likely that she didn't even know this had happened. 

One of my persistent themes in teaching classes is that researchers should verify every link in every family line. Online family trees make this possible, but the common practice of ignoring sources suggested by these programs makes the activity difficult. If I go to the Family Tree, an accumulation of all of my family's genealogy for the past 100 years or so, I have yet to find a line that does not end with some ridiculous, unsupported assertion with no sources. My Tanner line ends with Francis Tanner, b. 1708, d. 1777. After that, the information is garbled and lacking in sources. Some lines go back further, but inevitably they end. Interestingly, in the Family Tree, all of these lines continue back into the dim past. That same Tanner line continues back to a Matthew Tanner, b. abt 1510 in Wiltshire, England and d. 1565. The interesting fact in this supposed line is that the line goes from a William Tanner, b. abt 1608 in Kent, England to Wiltshire, without any supporting connection.

Oh well, that is a constant background to all I do. But the point here is that size really does matter. The question is, how do we deal with our data when we hit the Developing Large Pedigree stage or if we inherit a VLP?

This question is the same as the one that asks how we eat an elephant (not that anyone is going to try this anytime soon). The answer is one bite at a time. We need to have the intestinal fortitude to "prune the tree." Cut off those parts that really don't have any demonstrable relationship and focus on the real issue of sourcing the information we already have. New individuals will be a natural result of careful, systematic research. I would much rather see researchers verify the Tanner line on the Family Tree with valid information and forget about adding more names from English parish registers gathered willy-nilly from different parishes.

I will likely come back to this subject, as I have quite a few times in the past.


  1. My solution to my 5,000-name database in RootsMagic is to create groups. I like to find old family group pictures, identify the people and then create a group to work on. I find this rewarding, and it helps me focus on a small part of my database. It is especially neat when I have the picture hanging over my computer desk.

  2. This goes to the heart of one of the major problems with Familysearch: junk being added to the tree by the ignorant, the lazy or the just plain stupid. Ignorance can be cured of course with proper education of those users. Laziness or stupidity are much harder to counter.

    I don't know how extensive the problems are with families in later periods, but I do believe that there is a strong case for restricting people's ability to create or edit persons in the Familysearch tree with dates earlier than 1800 until after they have proven themselves. This ties in with the persons of unusual size problem as well, since until we stop the addition of nonsense we will never be able to get rid of it in the tree.

    My own research has taken me to a tree of about 3500 individuals. I recognise the issues raised about tree size, and if I didn't have a decent Genealogy database program I think that I would be in serious danger of repeating research.

    At least with an offline tree, problems with incoherence and nonsense only affect one person's research. Nevertheless, the problems caused by incoherence and nonsense in trees at places like Ancestry are just as bad as with the Familysearch tree, and possibly worse. They may be worse because there is little to no prospect of ever being able to get rid of the nonsense in those trees, unlike the Familysearch one, where central control at least makes that a serious possibility. I try to properly source all of my tree, but there is one bit of it where I fear I have not been quite so diligent in doing that. When I do properly source it, the conclusions do tend to agree with the existing structure, but it's still a serious problem.

    1. Interesting comments. I am not so sure that I agree with you that there are very many ignorant, lazy or stupid people working on the Family Tree. My experience is that there are some, but most of the people are well meaning but just not experienced researchers.

  3. Not what I was implying. The thing is that it doesn't take many people adding junk to trees for a lot of junk to accumulate. Look at the problem of inveterate name collectors who are careless in their research. Since they aim to get as many names in as possible, two or three of them can quickly produce a great deal of nonsense.

    As for the well-meaning but just not experienced researchers, I would put them in the ignorant category. They are the people who can be taught how to do things properly. That's why I think that they should initially be restricted from editing earlier parts of the tree. Bad edits in that part of the tree affect more people, as those further back in history tend to have more descendants. Also, since sources get sparser and less reliable the further back you go, doing correct research gets harder as well.

    1. Who decides which contributors are ignorant? :-)

  4. Familysearch. They run the place, so they make the decision. Also, it needs to be borne in mind that we need to think in terms of statistics to address this problem for Familysearch. Yes, there will be people who are new to Familysearch who are fantastic researchers and always source things, but they will be a vanishingly small minority of new users of the service.

    So if the majority of people registering for Familysearch are likely to be ignorant of proper research techniques, it makes a great deal of sense to restrict the changes those people can make to the older bits of the family tree until they have demonstrated that they are competent. Changes to that part of the tree will affect more people's lines, and doing correct research for that period is inherently more difficult.

    How they can demonstrate that competence is a more interesting question. A certain number of edits made before they are let loose should certainly be one criterion. Perhaps also a test on some of the more advanced features of Familysearch. If resources allow, getting their first edits of older tree members automatically referred for manual review would also probably be helpful.

    The first criterion weeds out trolls, who will hopefully get bored before getting to the test. The test helps to show whether people know about features that are useful for more advanced editing, and again it helps to weed out more trolls. The final criterion is a last backstop against trolls, and it also allows those who are still making dodgy changes to be spotted and removed from access to older tree member editing. Those who get picked up at manual review would be offered further tutorials on how to do proper editing. The catch with the last step is, of course, resources to do the manual reviews.

    Beyond gating access to the older people somewhat, I am of the opinion that certain people should be locked down such that any changes to them need to be approved by Familysearch personnel. The sort of people I mean are Mayflower passengers, early royalty and such like. In other words, people for whom extensive research has already been done and the editing of whose profiles will affect the ancestries of tens of thousands of users. Locking them down will also allow the persons of unusual size problem to begin to be tackled, as a lot of those persons fall into the lockdown category.