Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, July 21, 2020

How Many Records in the World Have Been Digitized? A Cautionary Tale

When I began digitizing records quite a few years ago, I also began noticing the claims the large online databases made about the huge numbers of documents and records being digitized and added to their collections. As the numbers increased, it seemed that most of the records we needed for genealogical research were now online. As the number of digitized records continues to climb into the billions upon billions, you might get the impression that we have made a significant dent in the world's paper records, but it turns out that the whole process has just barely started.

Why is this the case? One example is the year my wife and I spent digitizing records for FamilySearch at the Maryland State Archives in Annapolis, Maryland. Our team of four cameras likely digitized close to 20 million probate records. This may seem like a significant number, but the number left to digitize was many orders of magnitude larger. In fact, the digitized records from that one archive are still a very small percentage of the entire collection.

While living in Annapolis, we had a number of opportunities to visit both the Library of Congress and the National Archives. I was able to discuss the number of digitized records available from the Library of Congress with a knowledgeable staff member. Again, the records and books that have been digitized are a vanishingly small percentage of the total number of documents. You can also see the number of digitized records on the National Archives and Records Administration (NARA or National Archives) website. To make sense of those figures, you have to work through the difference between digital preservation and the digitizing of paper documents. To the National Archives, digital preservation means the preservation of documents in digital format.

No one really knows how many documents are stored by the National Archives, but here is a relatively recent estimate from the article entitled "About the National Archives of the United States":
NARA keeps only those Federal records that are judged to have continuing value—about 2 to 5 percent of those generated in any given year. By now, they add up to a formidable number, diverse in form as well as in content. There are approximately 10 billion pages of textual records; 12 million maps, charts, and architectural and engineering drawings; 25 million still photographs and graphics; 24 million aerial photographs; 300,000 reels of motion picture film; 400,000 video and sound recordings; and 133 terabytes of electronic data. All of these materials are preserved because they are important to the workings of Government, have long-term research worth, or provide information of value to citizens.
The number of textual records that have been digitized is vanishingly small compared to the number of records in the National Archives' collections, despite a number of Digitization Partnerships. Obviously, only some of the documents in the National Archives are genealogically important or even interesting, but there are huge collections of interest to genealogists that are still on paper.

When I visit archives, libraries, and historical societies, I often try to determine how much of the collection is digitized and available online. I am usually disappointed to learn that some large, extremely valuable collections are "waiting" to be digitized due to budget constraints or other issues. 

In addition, just because documents are digitized does not mean that they are online or available to the public. Use restrictions and paywalls are common. For example, some of the "partners" listed by the U.S. National Archives are commercial websites with subscription requirements. If you start searching for digital records on the National Archives and Records Administration (NARA) website, you will soon be linked out to another website. The number of documents actually available on the National Archives website is very small.

The summary of all this is simple. Dedicated genealogists who are involved in extensive research will inevitably end up exhausting online resources and will likely have to travel to specific archives, libraries, historical societies, county courthouses, and many other locations to look at microfilm or paper records. The number of digitized records has decreased my personal need to travel but not eliminated it.


  1. Thanks for writing this! It can't be said enough: it's not all on the internet.

  2. Agree, totally!
    So many people believe it’s all been ‘done’.
    Just like everyone that tells me they have their family history, all done by their Aunt/Grandfather/Mother.

  3. Sadly many people do believe that everything is available on the internet.

    If only that were true