Some people eat, sleep and chew gum, I do genealogy and write...

Tuesday, July 5, 2016

Where is all the genealogy? - Part Two: The Hathi Trust

It was interesting to read the comments about The Internet Archive from my last post. It seems that some people have difficulty navigating the website. From my perspective, and particularly compared to almost any U.S. government website, it is really quite simple to use. But I plan on doing a webinar on the website once we settle down to business again in August.

Meanwhile, I will move on to The Hathi Trust Digital Library. This is a massive online collection of digitized material from various libraries, mainly those of U.S. universities. Out-of-copyright books and other items are open to the public, but copyrighted material can be accessed only through the participating organizations. As of the date of this post, the website had 14,618,130 total volumes, of which 5,615,642 (38%) were in the public domain. The website includes extensive instructions for searching and using the data.

For genealogists, the crucial issue is that you can search the full text of all of the books and other items on the website in any of dozens of languages.

The website can be frustrating for users who do not have access through a participating organization, such as from a computer on its campus, because so many of the items are only partially searchable. Those in the public domain, however, are available to all users. Here is a screenshot of a search for my great-grandfather, Henry Martin Tanner.

The book about John Tanner, my third great-grandfather, is available only for limited search. This does give you an idea of whether the book could be useful, but it only gives you a "snippet" of the information. One fact that I find interesting is that this same book, which is still technically under copyright, is completely available on and

It appears to me that many of the large websites, such as The Hathi Trust, simply consider any book published after 1923 to be still under copyright, when the reality is that many books published from 1923 to 1989 have lost their copyright protection through failure to provide a notice or to properly register. So, unfortunately, many items have "restrictions" imposed by the websites that exceed the actual protection of the copyright laws. For more specific information see "Copyright Term and the Public Domain in the United States, 1 January 2016." In the case of the John Tanner book above, there was no copyright notice on the book and a copyright claim never existed from the time of publication.

The real tragedy of this copyright conundrum is that a huge number of books are out of print and can no longer be purchased, so the whole concept of copyright protection is frustratingly inapplicable. The authors of these out-of-print books are not going to benefit from the sale of their works, and in these cases the copyright laws merely make it difficult to find and use the books. It is also tragic that every time I write about online book collections I have to get into the copyright issue.

Why would I use The Hathi Trust website if many of the books come from Google's digital collections or other sources? That question raises a lot of issues. Primarily, not all search engines are created equal. When you are looking for specific information hidden away in one or more books out of millions, it helps to do a variety of searches and always assume that the information you are seeking is there someplace to be found.

As with all large online collections, you don't know what is there unless you search carefully and completely. Remember to always search generally online with a Google search for any item that does not appear to be immediately available. Learn the lesson from the John Tanner book. Someone else may have done their homework and put the book online.
