Roughly speaking, I see the following transition points as I work with genealogists around the world:
- Very Small Pedigree -- 0 to 100 names
- Workable Small Pedigree -- 100 to 1000 names
- Developing Large Pedigree -- 1000 to 5000 names
- Very Large Pedigree -- 5000 to 10,000 names
- Extremely Large Pedigree -- Any pedigree over 10,000 names
I have seen files that had well over 100,000 names. At this level, the file reaches critical mass and begins to implode. Usually as a result of the natural process of pedigree collapse, the duplicate and unrelated entries outnumber the valid relationships. The data problems and inconsistencies begin to spread through the file like viruses.
The fact is that anyone can create a file with over 100,000 names in a matter of hours of consistent work online. In my case, I could do this easily by using programs I have on my computer for downloading generations of names from the FamilySearch.org Family Tree. This can be done by disregarding accuracy and any demonstrable relationship. For the life of me, I cannot understand why I would want to do this.
One of the first questions I always get when I mention genealogy is "How far back have you gone with your genealogy?" The second question is "How many names do you have in your file?" These are asked as if genealogy were some sort of competition. As a matter of fact, I don't know the answer to either one of those questions and I am not going to take the time to find out. A much more appropriate question would be "How many individuals have you sourced and verified?" But even that question seems to beg the point of what I do anyway.
Going back to my great-grandmother and her genealogical efforts, after thirty years of accumulation she had recorded most of her lines three separate times. That much paper made it impossible for her to see that she had done the research previously. Duplication of effort became her biggest problem and it is likely that she didn't even know this had happened.
One of my persistent themes in teaching classes is that the researchers should verify every link in every family line. Online family trees make this possible, but the common practice of ignoring sources suggested by these programs makes the activity difficult. If I go to the FamilySearch.org Family Tree, an accumulation of all of my family's genealogy for the past 100 years or so, I have yet to find a line that does not end with some ridiculous and unsupported factual assertion with no sources. My Tanner line ends with Francis Tanner, b. 1708, d. 1777. After that, the information is garbled and lacking in sources. Some lines go back further, but inevitably they end. Interestingly, in the FamilySearch.org Family Tree, all of these lines continue back into the dim past. That same Tanner line continues back to a Matthew Tanner, b. abt 1510 in Wiltshire, England and d. 1565. The interesting fact in this supposed line is that the line goes from a William Tanner, b. abt 1608 in Kent, England to Wiltshire, without any supporting connection.
Oh well, that is a constant background to all I do. But the point here is that size really does matter. The question is, how do we deal with our data when we hit the Developing Large Pedigree stage or if we inherit a Very Large Pedigree?
This question is the same as the one that asks how we eat an elephant (not that anyone is going to try this anytime soon). The answer is one bite at a time. We need to have the intestinal fortitude to "prune the tree": to cut off those parts that really don't have any demonstrable relationship and focus on the real issue of sourcing the information we already have. New individuals will be a natural result of careful, systematic research. I would much rather see researchers verify the Tanner line on the Family Tree with valid information and forget about adding more names gathered willy-nilly from English parish registers in different parishes.
I will likely come back to this subject, as I have quite a few times in the past.