Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, May 28, 2014

Online Digital Newspaper Collections by State -- The Introduction

Newspapers are one of the most crucial types of genealogically important records, but they are also one of the types of records most likely to be lost. By their nature, newspapers were printed on cheap, highly acidic paper that begins to deteriorate almost immediately. In Arizona, we received a lot of local newspapers delivered free to our driveway. If we did not retrieve them quickly, within a matter of hours in the Arizona sun they had already begun to turn yellow because of the chemicals used to produce the paper. Even if carefully preserved, newsprint becomes brittle and yellow with age.

Because many newspaper repositories around the world recognize the temporary nature of newsprint, there have been massive efforts to preserve the existing paper copies for some time now. Many countries around the world have already begun the process of preserving their "newspaper heritage" through digitization projects. Making a legible copy of a newspaper page obviously requires specialized (read: expensive) equipment, and handling the brittle pages is also a concern. In the United States, the digitization efforts have taken an interesting turn. Many older newspapers were initially preserved through microfilm copies, so additional specialized equipment is needed to make digitized copies of the already photographed rolls of microfilm. Initially, commercial enterprises began the process of microfilming copies of newspapers and providing the microfilmed records to libraries and other repositories. Some of these same companies have now digitized these records and provided them online. For this reason, some of the larger digital collections are held by private, commercially operated companies.

There is another very important issue in the digitization effort: the existence of copyright claims. Here is one area where copyright acts as an extreme hindrance to preservation. The tremendous length of copyright protection exceeds the life expectancy of the media. In other words, the newspapers will disappear before the copyright to their content expires. This presents a legal nightmare for prospective digitization projects. In addition, many newspapers were published for only a short period of time, and the companies or individuals that published them are long since dead or defunct. These and other factors make overcoming the obstacles imperative and dramatically point out the inadequacies of some of our social and legal structures.

There are quite a few national efforts in the United States to make digital copies of the newspapers available online. Most of these are subscription-only services and, in some cases, it is difficult to determine the exact coverage of their collections. Since some newspapers were published for only a short time, or the publishing companies were sold or went out of business, determining which copies are available and where they are located is a problem. Small local libraries or historical societies may have the only existing copies of some newspapers. In some cases, these local collections have been consolidated in state libraries and historical societies. There are also likely some valuable collections in private hands. Presently, new digitization projects are announced regularly.

Newspapers are an extremely valuable source of information for genealogists. Much of the history of our ancestors in America can only be found by reading the newspapers of their time. Most historians, genealogists included, have viewed newspapers as "secondary" sources. Even today, newspaper reporting is maligned as inaccurate and sensational. Those criticisms are well deserved. But in many cases vital information about our ancestors may only have been preserved in the local or national newspapers. No reasonably exhaustive search is complete without searching newspaper archives.

I am aware of only one attempt that has been made to list, in one database, all of the newspapers ever published in the United States. This is the US Newspaper Directory, 1690-Present on the Library of Congress website. It is interesting that I find this resource seldom mentioned or used by genealogists. The Library of Congress has listed virtually every newspaper known to exist in a public collection. The list is limited, however, by the lack of adequate comparable lists from commercial websites. In some cases, the copies of newspapers listed in state and local repositories may also be available on one of the large national commercial websites.

The limitation of the Library of Congress list is that it is overwhelming in most cases. Any search by state will produce hundreds of newspapers published over the time periods included in the directory. When I show this list to genealogical researchers for a specific county, they are usually surprised and overwhelmed by the number of newspapers published and the effort it would take to try to locate and read them all. But this one fact alone indicates how much of our ancestors' history is being ignored by the failure to comprehensively search existing sources.

Another attempt at listing just the online collections is available from Wikipedia. The article is entitled Wikipedia:List of online newspaper archives. When I considered doing a list of such resources, similar to the list I did recently for Online Digital Map Collections by State, I thought seriously about whether such a list was necessary given the two lists I have so far referenced. Wouldn't any such list of digital newspapers just be a copy of what is already in the other two lists? However, both lists lack an explanation and overview. When I started the list of maps, my wife asked me why I didn't put out the list serially, state by state. At the time, I decided to go ahead and make an entire list, even if it could be criticized as incomplete, because I was unaware of any such list.

Interestingly, both the Library of Congress list and the one on Wikipedia ignore the commercially available online archives. Some of the links in the Wikipedia article are to commercial websites, but neither list includes a list of those online repositories. What I finally decided to do was to create my own list, which I could then incorporate into the FamilySearch Research Wiki and include as many online sources as I could find. But to do this properly, I need the time to compile the list one or two states at a time. Once the project is complete, then I can publish the entire list in one post for an easy reference.

So I begin. I am aware of only one attempt at compiling a significant number of digital newspaper images for free access. That is the Library of Congress's Chronicling America, Historic American Newspapers. At the time this post is being written, the Library of Congress has 7,705,905 digitized pages online. It appears to me that the collection grows at the rate of about 500,000 pages every six months. This collection can be searched by every word in the newspapers. When I tell genealogists about this online source, they immediately ask whether or not that includes obituaries. I usually joke that no, they cut them out before doing the digitization. Of course, the obituaries are included. Once again, however, it is important to understand the role that copyright plays in all of this effort to digitize content.

Imaging (digitizing) the newspapers is only the first step in the process. Next, the content of the newspaper must be indexed. This is usually accomplished by using optical character recognition (OCR) software. By its nature, OCR software is not perfect because the original copies may not be readable or the text may be missing. Mistakes in the original are also duplicated by accurate OCR. So any index to newspapers will be limited by the ability of the programs to read the text. As with any original document that has been indexed either by people or programs, it is important to view the original documents. This is one huge limitation of the commercially available newspaper collections. They usually have no convenient way to see the entire edition of the newspaper or read the papers as a whole from day to day. Their search engines often produce a single page from the newspaper with no apparent way of moving either to the previous page or to successive pages of the same edition. This makes the researcher entirely dependent on the indexes.
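
To make the dependence on the index concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the page labels, the sample text, the surname and the function names are made up for illustration, and no real newspaper website is claimed to work this way. It builds a simple word index from OCR text and shows how one misread letter hides a page from an exact search. The approximate search uses the standard library's difflib as one possible way to widen the net.

```python
import re
from collections import defaultdict
from difflib import get_close_matches


def build_index(pages):
    """Map each lowercased word to the set of page labels whose OCR text contains it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(page_id)
    return index


def exact_search(index, term):
    """Return pages where the index holds the term exactly as typed."""
    return sorted(index.get(term.lower(), set()))


def fuzzy_search(index, term, cutoff=0.8):
    """Return pages for indexed words that are merely similar to the term."""
    similar = get_close_matches(term.lower(), list(index), n=5, cutoff=cutoff)
    return sorted({page for word in similar for page in index[word]})


# Hypothetical OCR output: on the second page the "e" in "Tanner"
# was misread as "c", a typical character-level OCR mistake.
pages = {
    "1902-03-14 p.3": "Died at his home, John Tanner, aged 71 years.",
    "1902-03-15 p.1": "Funeral of John Tanncr will be held Sunday.",
}

index = build_index(pages)
print(exact_search(index, "Tanner"))
print(fuzzy_search(index, "Tanner"))
```

With this sample data the exact search finds only the first page, while the approximate search also finds the page with the misread name. Neither search is a substitute for paging through the original images.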

OK, this is enough of an introduction. I will now have to look forward to getting down to work and producing the list. Stay tuned. Oh, by the way, this was started in the middle of my vacation so don't expect too much for a while. My next post will be on the effect of copyright on digital newspapers online. I think this is an important consideration and needs to be discussed before jumping into the whole subject of making a list of digital online content.

6 comments:

  1. http://fultonhistory.com claims "Over 26,800,000 Old New York State Historical Newspaper Pages". I love that site and it is not just New York state anymore.

    1. It is very important to note that this is a "labor of love" done by ONE man by himself for the most part. He scans on a system he built himself, and has one of the largest, if not the largest, collection of searchable, digitized newspapers in the US, if not the world. His name is Tom Tryniski and we all owe him a great debt of gratitude for his work in making so many newspapers from central New York State available for free to all of us who do genealogical research. Hats off to Tom Tryniski!
  2. Here's a good newspaper list to reference from Kenneth Marks on The Ancestor Hunt: http://www.theancestorhunt.com/newspapers.html

    1. Thanks. I am just starting my project.
  3. The free Genealogy Search Engine (http://www.genealogyintime.com/tools/genealogy-search-engine.html) simultaneously searches dozens of newspaper archives held by universities across the United States. It also searches the massive Google Newspaper archive.
  4. Dear Sir/Madam,
    I represent the ALANIS Software company. I think that you may be interested in the solution we made for newspaper digitization and segmentation. We have algorithms that do automatic filtering, analyze images and find articles with the correct reading order. The accuracy of the article reconstruction algorithm is very high. We also offer tools for manual correction of automatic results and for book digitization.
    We have started a youtube channel, where we will upload screen capture videos of our software and our other news. Please check our first clip demonstrating our image processing tools for book scans (text dewarping, finger shots masking and other).
    http://youtu.be/Zzm-YPUjZT8
    A scope of the tasks that we can solve is quite large. We will be glad to cooperate in case of your interest.
    Kind regards,

    Marat Gabitov
    ALANIS Software
    Tel: +7 383 335 62 01
    e-mail: marat@alanis-software.com
    skype: maratgg555
    www.alanis-software.com