Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, October 19, 2016

How many books have been digitized?

I am always interested in the disparity between our perception of reality and what is actually happening in the world around us. Genealogists are no exception to this perceptual myopia. A genealogist's main activity is discovering records about ancestors and relatives, so you might expect genealogists to take an active interest in finding every valuable record they can. One reality is that many of these valuable records of our collective past are in published books. As I have noted in previous posts, from my own observations, very few genealogists are aware of the trove of genealogical books available to them, even when they are sitting in a large library.

If some lucky genealogist happens to stumble across a book, such as a surname book written about an ancestor, they are then likely to become convinced that the book is absolutely true and start adding everything in it to their family tree, but that is another issue.

One of the main limitations with books has always been finding them. Libraries are a wonderful place to explore the world of books, but you do have to go to the library and spend time looking. Many of us have extensive experience with local public libraries. Unfortunately, very few of these local institutions have many books that are helpful for genealogical research. If your family happens to be from the area where the library is located, the library may have some extremely valuable items for research, but generally a smaller local library will have a few of the more "popular" books on genealogy and little else.

Please do not misunderstand me: local libraries usually have specific research opportunities in the form of locally donated books. They may also have other donated or accumulated items of interest, including newspaper collections and local memorabilia. But they have limited space and limited book collections.

Larger libraries with huge book collections and vast research opportunities are located primarily in large cities or at larger colleges and universities. Because they are destination research centers, their use is largely limited to the serious researcher.

Now we come to the impact of digitization. For all practical purposes, anyone with a connection to the internet can now access millions upon millions of books, including an overwhelming number of genealogically relevant items. The main challenge with this monumental digitization effort is that the digital books, also called ebooks, are scattered all over the internet on thousands of different websites. Determining whether a particular book can be accessed on the internet in digital format can be a daunting research task.

Copyright law in the United States and elsewhere imposes a very strange limitation on research. I can go into a physical library and look at any book that is available, regardless of its copyright status. I do not even have to know the copyright status of a book to check it out of the library. But digital books are viewed as a threat to the publishing industry, so copyrighted digital material is highly regulated, as you can see any time you rent a video and have to read the "FBI Warning." So I can find a particular copyright-protected book online and see that a digital copy is available, but only under some very restricted circumstances can I actually read that digital copy, even though I could visit the library and read the physical copy without the same limitation.

This restriction is slowly being eroded by digital libraries such as, but as yet, these online lending library arrangements contain very, very few books of research interest. Fortunately for genealogists, many of the books we find valuable for research purposes are in the public domain, so they are more generally available online.

So how many books have now been digitized, and where are they? That is the question. Only through exceptionally diligent online research can anyone find relevant digital books that are freely available. Some websites even restrict the use of public domain digital material as if they held ownership rights. For example, Brigham Young University has a huge online collection of digital books numbering in the millions of volumes, but access to the collection is limited to students, faculty and some staff members. The books cannot be consulted even on a limited basis by anyone outside that group. The irony of this situation is that if the BYU library were part of the HathiTrust, the public domain portion of its collection would be freely available online to anyone who was interested. Even more interesting, many much smaller and less important university libraries are active participants in the HathiTrust. See the HathiTrust Partnership Community.

My example of the Brigham Young University Library is just one of many examples of the spotty availability of digital books. One institution may make a given book freely available while the same book is classified in a restricted section of another website.

The question of numbers is nearly impossible to answer. For example, Google Books has millions of digital books online but does not publish the total number anywhere that I can discover. Some websites provide a number, but the way individual items are counted differs dramatically from website to website, so an accurate count is impossible. All I can really say is that there are millions upon millions of books available online and that perhaps millions of those are of genealogical interest. I can also say that, as genealogists, we need to remember to include detailed online book searches in all our general research efforts. The days of relying solely on local and larger libraries for this material are over. We still need to go to libraries for the yet-to-be-digitized items, but we can access so much online now that we should focus our initial efforts on online sources.


  1. I took a look at BYU's listing of electronic books at .

    All of the electronic book libraries listed in the "Ebook Collections" category are purchased from a number of different ebook vendors. These collections include books that are still in copyright. To purchase these collections, the university must sign a license agreement limiting the use of the content to members of its community: its students, staff, and faculty. Opening these collections to those unaffiliated with the university would be a license violation, and it would not be long before the vendors revoked the university's privileges to use them.

    The same is true of the ebook libraries listed in the "Reference Ebooks Collection."

    The "Historical Ebooks Collections" are books that are not under copyright; however, for most of these collections, the books in them were microfilmed years ago by a commercial enterprise that has now placed them online. As with the earlier categories, the library must purchase these collections and sign a license agreement limiting use to its own community.

    Only the last category of books, "Free Ebook Collections Around the World," is open access and freely available to everyone.

    Please don't blame the library for not opening these collections to those outside the university. If it wants to offer these resources to its students and employees, it must agree to the terms of the license, which is a legal contract.

    I am a librarian and I deal with this every day. Librarians want to make resources available to as wide an audience as possible. We, too, feel that it is unfair that anyone can go to a library and use a print book, but access to an electronic copy of the same title may be restricted because of a license. But we have to comply with the license terms.

    1. Thank you so much for your insight. The core of the problem is that ebooks are being treated as something legally different from paper books. In a sense, the companies and others who are hiding behind the copyright claim are unilaterally extending copyright protection even further than the courts have so far. I am not blaming the university, merely pointing out an example of the contradictory nature of what is happening in many other similar situations on the internet.

    2. An interesting article related to this post can be found at . It calls on the new Librarian of Congress to review copyright laws with an eye to making more research open access.

    3. Great article, Cindy. Thanks for sharing the link.

      Melissa Finlay

  2. Great article. Digitization is a pressing need of our time. All books need to be digitized in order to preserve them forever!