Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, November 1, 2011

Thoughts on really, really big genealogy files

Initial disclaimer: If you are struggling to find your grandparents or great-grandparents, you may become terminally discouraged reading this post. It is a fact that some genealogists inherit huge files from their relatives. Some have even done all the work themselves. When I say large files, I mean more than 15,000 individuals, and more likely in the neighborhood of 20,000 or more. I realize that some people cannot imagine being related to that many people. Personally, I am convinced that when you get over, say, 10,000 you are probably not related to all the people in your database. But since I fall into the higher end of that category, I thought it necessary to address some of the issues of the larger files.

As a side note, I think I will scream the next time someone asks me "How many names do you have in your file?" Genealogy is not a competitive sport. I happen to come from several very large and genealogically active families. You may not. Don't compare totals; it is not productive.

There is really no practical upper limit to how many people you can put into one file. Do the math: you have thousands and thousands of relatives. Just because you aren't acquainted with all those thousands does not mean that they don't exist. Most people focus on their surname line to the exclusion of many collateral lines. For example, you may know your mother's family, but have never met your father's. Another issue is that, by and large, the children of remote ancestors and those children's spouses are ignored. I still have many lines that show only one child in a family. This is not only unlikely but misleading. Even if you claim to be from a long line of families with only one child, that cannot be the case for every one of your ancestors. If all of your ancestors only had one child, you would not be alive today.
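For anyone who wants to actually "do the math," here is a rough sketch of the arithmetic in Python. It counts direct ancestors only and assumes that no cousins ever married each other (no pedigree collapse), so the figures are an upper bound on distinct direct ancestors. The function names are made up for this illustration.

```python
def ancestors_in_generation(n: int) -> int:
    """Direct ancestors n generations back (1 = parents, 2 = grandparents).

    Assumes every ancestor is a distinct person, which overstates the
    count once cousins have married somewhere in the tree.
    """
    return 2 ** n


def total_ancestors(generations: int) -> int:
    """All direct ancestors from your parents back through `generations`."""
    return sum(ancestors_in_generation(n) for n in range(1, generations + 1))


if __name__ == "__main__":
    # Print the per-generation and cumulative counts for a few depths.
    for g in (5, 10, 13):
        print(g, ancestors_in_generation(g), total_ancestors(g))
```

Ten generations back already gives 1,024 ancestors in that generation alone and 2,046 in total. Siblings, their spouses, and their descendants all come on top of that, which is where the real bulk of a large file comes from.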

Some people find a surname book written by their ancestors and think that all their genealogy is "done." Would you like to bet? Anyway, one book does not a genealogy make, as they say. I happen to have about six or eight of these books and will probably die before I get all of the names verified and transcribed into my own files. By the way, you could put all of the documentation from all eight books on a single sheet of paper and have room left over.

Back to the large file folks. Here are some admissions, which I am assuming are pretty generally applicable to almost everyone with the same type of huge genealogy file. To start out, most of the names have been "copied," which means they were gathered by someone else. Either the names came from existing files or they were transcribed from online or other published sources. I commonly find that people claiming to have thousands of names have copied the inhabitants of whole towns and villages. In my case, my Great-grandmother practiced the name extraction method of relationship. She would copy down any person in a certain geographic area with the same surname. I am personally certain that many of these people are not my relatives, but I can't prove that they are and I can't prove that they are not. Since I don't have the luxury of talking to my Great-grandmother, they now reside in my files indefinitely.

One rule that is almost certain is that the larger the file, the smaller the proportion of individuals that have been verified. I have thousands of names in my file, acquired from years of finding family group records and looking at other files, that have not been verified with any sources even though I have spent thirty years doing genealogy. At last count I have over 70,000 documents, but if I had documentation for 20,000 people, the number would be astronomically larger.

There is a subset of people with huge files. These are the people who have nothing more than a computer file with names. I meet people all the time who have received a file from their relatives with tens of thousands of names and they couldn't tell you their own grandparents' names from memory. Fortunately, I am not in that category, but move a few dozen generations back in the past and I am pretty vague about my ancestors without looking at the file. I could belong to dozens of heritage organizations: Sons of Utah Pioneers, Civil War organizations, Revolutionary War organizations, all sorts of stuff. When you get right down to it, having that many names is an oppressive burden if you are obsessive about documentation.

So what do I do about this huge pile of names? Tune in again for another installment of how to deal with large numbers. On the other hand, if you only have a few names and are struggling to find your family at all, feel lucky: you don't have to worry about what to do next.


  1. I was one of the 'lucky' ones who inherited a great deal of family genealogy. My Grandmother, her sister, her aunt, and her aunt's daughter were all avid genealogists. I wound up with all their work, some 34,000+ people. I've spent the last 4 years trying to actually add documentation to all this. Notes such as "Georgia Trip", "NARA", "Mississippi Vacation", etc. don't make it much easier than if they had no notation at all. ;)

  2. I have a file with over 75,000. But it's not worth getting jealous because it is a file with all the Mayflower families I'm interconnected with genealogically. This file grows and grows because the more I look, the more people I find who are interconnected with these families. The information is all from the Mayflower Society, at least the first six or seven generations; the rest is my own research. I also have done several surname studies, all part of this same database. These are all the descendants of five or six immigrants in the Great Migration (not Mayflower families, but almost as early). These studies tend to have many, many people involved, and as I work on the files I drop some branches and add others. They are all in one big file because of all the intermarriages with the first bunch of Mayflower passengers. Just like you said, heritage societies and lineage organizations are one reason for big files, but if your family is part of one of these spider webs it is fun to merge files and see what happens. Every time we have a new wedding in my family in New England, I often only have to research a few generations before I can connect the new in-law to this big database. I also maintain two databases of just my own lineage, with allied lines limited to three or four generations, but due to intermarriages I again find that just one more generation will form another interconnection. The downsides to these large files are slow connections, files too big to store on CDs for backup, and too many names to search through for finding names. I have had to upgrade my computer twice in two years just to handle the larger files, at much expense, especially for storing them in the cloud.

  3. I think I belong to both groups. I began with a dubious printed family book (The Stricklers of Pennsylvania) and soon acquired some names from generous, previously unknown cousins. Little or no documentation on any of them. My numbers aren't as large as yours, but I find 3377 mostly unverified names daunting.
    On the other hand, these have always seemed to me to be "jumping-off points." I followed the general injunction to "begin with yourself and search backwards." I had to skip the "interview older members" portion of that advice, because when I started my searching my husband and I WERE the oldest members of the families, and we knew nothing.
    So I started with myself and am working backwards. At this point I "nearly" have GPS for ONE person: me. But the journey is fun. I'm looking forward to your next thoughts on all those undocumented names in my files.
    And, by the way, I just added a 740-page ebook (more undocumented names!) that lists one of my lines and one collateral line and connects them. Another jumping-off place. The author states up front that there is no documentation and why not. It has some interesting ways of mingling history (world, national, and local as appropriate) with the descendant charts. I think it well worth the ebook price, but it adds more names and not more knowledge.

  4. Thanks for another great posting about a common problem. I try to keep my genealogy lines separate at the 4 grandparent level, which helps to keep the numbers down somewhat. Do I really need to have them all entwined in one massive tree? I decided I don't. The sources and citations are ongoing research tasks, leading to a few Eureka moments now and then, scaring others around me. Cheers.