Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, September 4, 2014

The Limits of Online Genealogical Research

I frequently hear comments from genealogists about the limits of online searching. It seems axiomatic that "all of the world's records have yet to be digitized." If all of the records had been digitized, then why would we keep hearing about huge blocks of records being added almost daily to the online collections? But the number of online entities adding digital records every day is truly staggering. It seems to me that there is a double perception here: on one hand we see millions upon millions of records going online, while at the same time we tend to perceive that the number of records left to digitize does not diminish. In fact, judging by the perception of genealogists, the number of records left to digitize seems to be growing rather than decreasing.

Is there any way to get a feeling for how many records are left out there and what has already been made available online? I think that the question is made to be more complicated than it really is. It is too easy to point at a specific repository of paper records and say something like, "Well, those records haven't been digitized, and therefore there are huge numbers of records left un-digitized." It is also easy to argue that the "undeveloped" countries have limited or nonexistent digitization projects. But I would like to point out a different view of the subject altogether.

First, Western European based genealogists are primarily talking about records in the United States and other countries settled primarily by Europeans. Secondly, they are also talking about records that have value to genealogical research, not just piles of paper. When was the last time you saw a report in the genealogical blogging community about an international conference on Chinese genealogy? How much do you know about digitizing efforts in China and Japan? What do you know about digitizing efforts in India? Are there any Russian genealogical records being digitized? If you know the answer to these questions, then you might have a valid opinion about the status of worldwide digitization projects. Otherwise, you are probably looking at a relatively small segment of the entire world's records.

But what if we concentrate on one country, say, the United States? What is the status of the effort to digitize records in the United States? How many genealogically significant records are there and how many of those records have been digitized? Of course, one problem with asking such a question involves defining what is and what is not "genealogically significant." If you take a very expansive view of genealogy, almost any scrap of paper in existence in the world is genealogically significant. But that is probably an unrealistic viewpoint.

One way to begin to quantify this issue without getting bogged down in vague and unsupported generalities is to look at specific types of records and see where we are with digitization efforts. For example, what about land and property records in the United States? By focusing on one type of record, you can get a good feel for the number of records that are online and the number of records that are waiting to be digitized. What I see when I look at a particular type of record across the entire United States is that there is a marked disparity in the number of records available in digital format from jurisdiction to jurisdiction. What makes an assessment of land records difficult is that most of these records are maintained at the county level, and whether or not online digital copies of those records are available varies from county to county.

So, after spending years now looking for collections of records online, what is my own personal perception? I keep bumping into the limits of online records. There are some notable examples of this limitation, such as the small percentage of U.S. records that have been digitized from the U.S. National Archives and the further limited digital holdings of the Library of Congress. As I see it, there is still an ocean of paper-based records out there and presently, the digitization efforts are only nibbling away at the edges. It is presently much easier to find out who has the records by searching online, but then it is still necessary to go to the repositories and actually do the research.

Will this change? Yes. But there is still a long, long way to go before we can say that even half of the world's records have been digitized.

Now, before you get despondent over this issue, you must realize that many of the records that genealogists have used extensively over the years, such as the census records, are available online. It is reasonably possible to build a four or even five generation pedigree, in some cases, entirely from online sources. But that is done by artificially ignoring entire classes of records.


  1. One of the biggest problems with doing all one's research online is that the online researcher fails to realise what records are actually archived that could help the research.

    It is all very well suggesting that many of the records genealogists have used extensively over the years are online, but does it help?
    Here in England we have access to census up to 1911 but we do not have online access to birth and death registers over 100 years old.
    In fact most websites suggest contacting the GRO for copies of birth and death entries but fail to point out the GRO only holds transcripts of such records and not originals.

    No, I am afraid it will be a good few years before accurate family history research can be done online by those researching people who lived in England & Wales.

    1. I certainly agree. As time passes, more of these limitations on availability will be resolved, but some may never be.

  2. A summing of the US Census Population records from 1790 through 1940 available at FamilySearch gives 683,314,366 records. A separate summing of the US Census Population numbers from 1950 through 2010 at gives 1,598,370,520 records to be indexed and available in the future. Thus the total number of US Census Records from 1790 through 2010 is about 2.3 billion. Wow!

    1. Yes, wow, and all of the U.S. Census records have been separately indexed many times. But the number of the rest of the records dwarfs the number of census records.