Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, May 8, 2013

Thoughts on Very Large Genealogy Files (VLGF)

There seems to be a large measure of pride in having a very large genealogy file (VLGF), just as if there were a prize for the largest file. I routinely talk to people who claim to have thousands upon thousands of names in their files. Recently, one lady mentioned having 30,000 names. Think about this for a minute. If you spent only 1 minute looking at each name, that would be 500 hours of looking: forty-hour weeks of full-time employment for twelve and a half weeks. And that is spending just one minute on each name, without a break. I have never looked at one of these VLGFs that did not have significant problems with very obvious mistakes and inconsistencies.
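The arithmetic is easy to check. Here is a minimal sketch in Python, assuming one minute per name and a forty-hour work week as stated above; the function name `review_time` is purely illustrative:

```python
def review_time(names, minutes_per_name=1, hours_per_week=40):
    """Return (hours, weeks) needed to look at every name once."""
    hours = names * minutes_per_name / 60   # total minutes converted to hours
    weeks = hours / hours_per_week          # hours converted to full-time weeks
    return hours, weeks

hours, weeks = review_time(30_000)
print(f"{hours:.0f} hours, or {weeks:.1f} full-time weeks")
# prints: 500 hours, or 12.5 full-time weeks
```

Changing `minutes_per_name` to something more realistic for actually checking a record only makes the picture worse.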

There are those that accumulate a large number of names, usually through a form of record extraction, where they are following certain surnames or families in a small geographic area and simply collecting the names from parish records or something similar. The goal here is not to understand the people, document their lives or provide accurate sources, but merely to gather the names from the records. There may be valid reasons for doing this type of activity and, in fact, name extraction, as it is called, is used by anyone making an index of the people living in a certain area. But those who have huge files are not always involved in private name extraction. Today, accumulating a huge genealogy file is a whole lot easier.

All you have to do to accumulate thousands of names is copy the work already done by others. You don't have to worry about whether or not it is correct because, of course, it is correct, or the person would not have put their file online. In fact, as genealogists, we are encouraged to do a "survey" of what others have previously accumulated on our family in order to avoid duplication. We dutifully copy all their work and thereby increase the duplication. In addition, we find that we have common ancestors. Many online genealogical databases automatically inform you of connections (i.e., duplicate records) in others' family trees. Then, of course, if they happen to have more names than you have, you can just copy what they have and increase the size of your own file!

We are treating our genealogy files and ancestors as if they were friends on Facebook or contacts on LinkedIn. Rather than focusing on the people as individuals, we treat them like baseball cards: names to be traded. We become upset because we are missing a name and are thereby missing the "full set" of cards.

I just spoke to a woman who had published a surname book of her findings and her statement to me was, "I didn't include any source material, because that would have been too much work." So what is the point? Why are we accumulating the data without attribution or sources? We enjoy the lovely and sometimes inspirational stories about our ancestors and repeat them, with embellishments, without providing even a modicum of source material. How is that genealogy? Isn't it really historical fiction?

Some of the online family tree providers advertise the large number of "names" in their files, without regard for the number of duplicates. For example, one provider presently claims 2,147,463,647 records in its Public Member Trees. How many of those are duplicates? In fact, that provider has made a significant effort to show you all the duplicates. If you put your own file online, it will give you green leaves, one for each copy of a name in your file it finds duplicated on someone else's tree.

There are those who argue that this accumulation of huge files and duplicates is a benefit, because, who knows? Somewhere in that pile there may be a hint or something helping me to find more ancestors. This would be true if we were all doing our own work and trying to solve genealogical issues, but the multiple copies make it virtually impossible to distinguish between the wheat and the chaff. In our case today, it is mostly chaff, and what little wheat there is is usually useless.

Maybe we need to begin celebrating finding a source as much as, or more than, we do when we find a name? Maybe we need to begin asking those with large files to identify their sources? I have been working for over a year now, putting sources into the Family Tree program. Interestingly, over the same time period, unless I go back many generations, I find no one else who has added so much as a reference to anyone in my set of ancestors, except those being worked on by people associated with FamilySearch itself. Aren't we just running around in circles without sources?


  1. I just have a LGF in FTM and hear you on all of this. I recently started updating information in FamilySearch and trying to tie my family together with existing information. I found it all fairly easy. I tried the source thing and found it difficult, but today decided to give it another try. It is still difficult. I guess I am spoiled with where you find the source and it adds it. Is there an easier way? I have been finding the source, adding to my sources and then attaching it. I see I can have multiple windows open, and although that is easier, it still seems very time-consuming.

    624 221 878 entries in the Family Trees
    420 888 376 entries in the Library
    93 208 785 entries in the Archive & Indexes
    665 380 entries in the Pictures
    1 639 713 entries in other documents

  3. Great article. I am one of those that have all my sources in my paper files but not in my Legacy. I am going to re-think this. Thanks for the article.

  4. I agree with you on the futility of very large files without sources. I think that Family Tree on Family Search has the potential to make adding sources a really desirable thing. I like being able to add direct links to Census records and easily create custom source records. Similar to what you suggest, I now find myself looking at my ancestors to see which ones have the most sources and the most detail. Wikitree is similar (except that it is cumbersome with respect to citing Census records).

  5. Great post, really great. Thanks