Some people eat, sleep and chew gum, I do genealogy and write...

Saturday, January 21, 2012

What is left to digitize?

Now, if you take this question seriously you have a problem with perception. Although digitization projects capture a lot of news space, the number of records left undigitized in repositories around the world is still overwhelmingly large. Let me give an example. The State of Washington has an online digital archive with 104,978,112 records, of which 34,736,265 (about a third) are searchable. Compare that to most or nearly all of the other states' online archive collections. It is evident that we still have a really, really long way to go before even half of the records in the world are digitized. Another example is to do a search for some book or record and see if it has been digitized and is available in an ebook format. You might be surprised either way. If you haven't been keeping up with digitizing, you might be surprised at the number of items available online. On the other hand, you can readily see that even commonly available books have yet to be entirely digitized.

The first lesson from this very cursory review of the state of digitization is that, as genealogists, we cannot expect to find all of the information we need for our family research online. The next lesson is one of patience. Many records now locked up in repositories will be liberated online, but don't hold your breath.

What are the largest repositories of online digital records? This is a very difficult question to answer. Many of the larger online services do not provide information about the number of documents or files they have in their databases. Even when figures are available, they are often ambiguous because they refer to individuals, or records, or files, or documents, or whatever, and there is no way to compare the figures from one repository to another. Here is a link to one interesting article called "How Much Information Is There In the World?" by Michael Lesk.

My guess is that FamilySearch, with its millions of records going online every day, day after day, has the most images of original source documents. certainly has a huge collection of resources, most of which are indexes without images of the original documents. There are a large number of other large collections, and the online content increases daily.

But in searching online, I find some significant real-world (paper) collections that have barely started to be digitized and are still mostly available only through a physical onsite visit:

Library of Congress (LOC)
Although the LOC has an active digitization program, they have a really long way to go before even a significant percentage of the collection is available in digitized format. The LOC has an extensive and rapidly growing online collection of digitized newspapers and many, many other documents, but the main book collection of the library has not yet been significantly digitized.

U.S. National Archives (NARA)
The U.S. National Archives has only the barest beginnings of an online digital collection. The NARA has been relying on third parties, such as the now defunct (now owned by as to digitize records. Here is a quote from the NARA website, "The National Archives web site has very few actual records online. The National Archives has a very large amount of records that are useful for genealogy, but most of these records are not online." (emphasis in original).

FamilySearch's Family History Library (FHL)
It may seem a contradiction to mention FamilySearch as having the largest collection of online source documents and at the same time list it as a site needing additional digitization. Although there is a huge ongoing effort to digitize the holdings of the FHL, only a very small percentage of the records have currently made their way online. For example, the Historical Record Collections contain around 1,000 collections, but the Granite Vault contains 2.4 million rolls of microfilm. Even if every online collection represented 100 microfilm rolls (which they do not), the number left to be digitized would be enormous. If the FHL gets busy digitizing their book collection and making the books available online, all of the available books could end up online in the not too distant future.
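
To put rough numbers on that comparison, here is a back-of-the-envelope sketch in Python. The figures are the ones quoted above; the rate of 100 rolls per collection is the deliberately generous assumption from the paragraph, not a measured value, and the function name is just an illustration.

```python
# Figures quoted in the post; ROLLS_PER_COLLECTION is the post's generous
# "what if" assumption, not a real measurement.
ONLINE_COLLECTIONS = 1_000
ROLLS_PER_COLLECTION = 100
VAULT_ROLLS = 2_400_000


def rolls_remaining(collections, rolls_per_collection, total_rolls):
    """Return (rolls still undigitized, share of rolls already online)."""
    digitized = collections * rolls_per_collection
    remaining = max(total_rolls - digitized, 0)
    return remaining, digitized / total_rolls


remaining, share_online = rolls_remaining(
    ONLINE_COLLECTIONS, ROLLS_PER_COLLECTION, VAULT_ROLLS
)
print(f"{remaining:,} rolls left; about {share_online:.1%} online")
```

Even under that generous assumption, roughly 2.3 million rolls would remain, with only about 4 percent of the vault online.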

Various State and National Archives
I mentioned the Washington State Archive collection of online digitized documents, but only to highlight how far we have to go before even a significant percentage of state and local documents are available online. The issue of state and local documents also highlights the problem of using access to government documents as a revenue source. Charging for copies of government documents is not new, but it has become a national policy in some countries. For example, England charges for copies of original vital record documents, even those over 100 years old. See National Archives.

Major University and Public Libraries
The availability of digital copies of the holdings of major universities and public libraries is slowly changing. Most of the larger libraries have digital collections, but the vast majority of their holdings are still locked up in vaults and storage areas. Most libraries recognize the need to provide digital access, but we have a long way to go before most of the university libraries' offerings are online. One issue with availability in this area is the need to "belong" to the library either by being a student or teacher or by virtue of living in the library district. Universities tend to make their collections available to their own students but not to the public at large.

I think you can see, even with this limited review, that we have a long way to go before we can claim that most of the documents in the world are available online. This is a dynamic area of change, however, and you should keep up with what is going online so that you know which resources are available.


  1. And that's only in the USA! There are many other countries trying to keep up with the digitisation process. I live in Australia, and the National Archives of Australia, along with the State archives, the many University Archives and State Libraries and museums are often constricted by finances; often resorting to volunteers or gaining the funding through philanthropic organisations. And that doesn't even take into account the many records held by local authorities and societies.

    Through the 1990s I worked on a project cataloguing photographs at my State Library. The project to digitise and catalogue this fabulous resource only became a reality through the benefactors of one such philanthropic organisation. So often it is resources and $$ that are holding many of these projects back.

  2. The greatest challenge with the larger Repositories (i.e., not your local small Historical Society that has no funds to digitize) is the lack of qualified individuals to do the work in conjunction with a lack of funding. The reality is you can't simply slap a 17th or 18th century document on a flatbed scanner (or worse, even think of putting it through an automatic feeder). Archival material needs to be handled by Archives professionals. Sadly, too many Archives aren't funded to provide regular services to researchers, much less to spend the time it takes to digitize material.

    As Linda Ottery said, it takes a philanthropic act of kindness to ensure that there is funding. Perhaps, if all of us who are genealogists, historians, family researchers, etc. did a little estate planning and added a small donation to a local repository in our Wills, maybe 'someday' would come a lot sooner?