Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, October 8, 2015

The Library of Congress, Chronicling America Newspaper Project reaches 10 Million Pages

Here is a quote from the press release dated 7 October 2015,
Online Resource of Historic Newspapers Posts 10 Millionth Page 
Chronicling America, a free, online searchable database of historic U.S. newspapers, has posted its 10 millionth page. 
Launched by the Library of Congress and the National Endowment for the Humanities (NEH) in 2007, Chronicling America provides enhanced and permanent access to historically significant newspapers published in the United States between 1836 and 1922. It is part of the National Digital Newspaper Program (NDNP), a joint effort between the two agencies and partners in 40 states and territories. 
"Chronicling America is one of the great online treasures, a remarkable window into our history and a testament to the power of collaborative efforts among cultural institutions nationwide. The Library of Congress is proud to work alongside NEH and all our partner institutions to make this vision a growing reality," said Mark Sweeney, Associate Librarian for Library Services. "In the coming years, we look forward to adding newspapers from the remaining states and territories as new partners join the program."
To be exact, the number today is 10,001,037 pages. These pages are completely searchable by every word, the images are very good, and the pages can be downloaded.

This particular collection is not large compared to some of the commercially available online digital newspaper collections, but the content is very useful. I frequently find my ancestors in the Library of Congress collection. I also find newspapers and articles that do not turn up in some of the much larger commercial collections.

Here is the link to the collection: Chronicling America, Historic American Newspapers.

Searching and Standard Place Names

There is a direct connection between identifying a place where your ancestors lived and finding pertinent records about their lives. This seemingly simple statement turns out to be the basis for a staggering amount of organizational effort on the part of librarians, archivists and, now, programmers working for online genealogical database companies. What is involved is keeping track of information with some sort of system and then finding that same information again.

When I was practicing law, we generated a huge quantity of paper documents. For example, let's suppose I got a new client who was involved in litigation. I had to "open a file," that is, create a place to store documents and other items associated with that case. In this example, the "file" was a physical manila file folder. Each file was assigned a number, usually consisting of a year and an accession number (a number assigned to each new file in the order they were created), e.g., 2001-1, 2001-2, etc. We had a completely manual file system. At this point, we had a choice: we could organize the files by file number or alphabetically by the surname of the client. We could also separate files into topics, such as probate, litigation, corporate and so forth. A single file could be one piece of paper or a room full of boxes. A single civil litigation case could generate tens of thousands of pages of documents.

As an aside, while I was working at the Arizona State University Law Library as a Reference Librarian, there was a whole room in the library devoted to one lawsuit, Arizona v. California, 373 U.S. 546 (1963).

Notwithstanding our legal file organization, we spent a huge amount of time looking for files and even more time looking for specific documents. Now multiply that organizational problem by millions and trillions of documents spread across the entire world. Presently, we have over seven billion people generating billions of documents every day, including this blog post.

Genealogists focus on finding people in historical records. To put that into proper perspective, we need to remember that first, we find the records and THEN we can find the people. Most genealogical researchers (nearly all) come to me and say something like, "I am looking for my (Great-great-great-grandfather or whoever)." What they should be saying (and doing) is that they are looking for documents, records, or other similar items that contain information about that ancestor. Remember this: I didn't file or organize the information they are looking for. In fact, the documents may never have been properly "filed" or organized. The documents may have been figuratively scattered to the wind.

By the way, looking for a needle in a haystack is really simple compared to finding documents spread across the world as long as I know there is a needle and I have the right haystack.

Genealogists focus on names, dates and places as primary identifiers. They also rely on a system of indexing, that extracts certain information from a document and thus expands the number of search terms available. In doing this, they further rely on the following:

  • The accuracy of the original record
  • The accuracy of the person reading the original record
  • The accuracy of the person recording the information extracted from the original record
  • The ability of the program to identify and distinguish the record from other similar records 
and on and on and on.
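To make the dependence on indexing accuracy concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the three records, their ids, and the `build_index` helper are all hypothetical, and no real genealogy website is claimed to work this way. It shows how a single transcription error can make a record unreachable by a name search even though the record is sitting in the collection.

```python
from collections import defaultdict

# Hypothetical records: (record id, name as the indexer typed it, year, place)
records = [
    (1, "John Jones", 1850, "Tandridge, Surrey, England"),
    (2, "Jhon Jones", 1850, "Tandridge, Surrey, England"),  # transcription error
    (3, "John Jones", 1851, "Leeds, Yorkshire, England"),
]

def build_index(rows):
    """Map each lowercased name to the set of record ids filed under it."""
    index = defaultdict(set)
    for record_id, name, _year, _place in rows:
        index[name.lower()].add(record_id)
    return index

index = build_index(records)

# A search by the correct spelling never sees record 2,
# because that record was filed under "jhon jones".
found = sorted(index.get("john jones", set()))
print(found)
```

The record is not gone. It is simply filed where nobody will think to look, just like the paper in the wrong manila folder.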

In my law example, a record could be "lost" if it was simply filed in the wrong file folder. In our present time of computerized filing systems, a record can be lost in hundreds of different ways. 

For all their sophistication, today's computerized search systems, with their search engines, still rely on someone categorizing a document in a specific way by name, date, place, topic and so forth. We are all left to try and figure out how the documents we want are organized and what we have to do to find them. This process may be as simple as entering a name into Google or as complicated as searching page by page through an old, hardly readable microfilm when we do not know if the record we are looking for is there at all. Computers and computer programs are wonders to behold, but they are no better than their basic filing systems.

We now have a movement among those who program computers to implement a user generated filing system involving place names. As I said at the beginning of this post, places are crucial to finding pertinent documents. Many people who have lived on the earth have names and dates so similar that telling them apart is extremely difficult. Adding a place name to the search very often solves the problem. But what happens if the place name is wrong, inaccurate or out-of-date? Hmm. Now we have a real problem. 

Here is an example of what I am talking about. Let's suppose we have an ancestor designated as follows:

John Jones, b. abt 1850, deceased, England.

Would you be able to find this person, just from that information? Like many genealogists, you probably would find a John Jones who was born about 1850 in England. In fact, if I were to do a search in using this information I would get 15,997,760 possible results starting with entries in the U.S. Federal Census. Remember, my John Jones lived in England. 

Most genealogists, rather than recognize that their search was hopeless, would assume that's search engine was "broken." They would then try to be more specific. Here is the conundrum. You would need more information about the person before you could search more specifically, when more information about the person is exactly what you are looking for (i.e., chicken and egg). 

So now, when entering a person into a program such as or, we are given suggested places. Eventually, genealogical researchers hear about a basic rule that place names need to be recorded as they were at the time the event occurred. But what if the search program does not recognize the place, even assuming that we know the place? There you go, that is exactly the problem with programs that suggest "standardized" place names. 

Granted the programmers would like you to conform to their system of organization. They want the following about John Jones:

John Jones, b. 18 April 1850, d. 29 January 1910, Tandridge, Surrey, England, United Kingdom. 

But what about the problem when the place name has changed? What if the place name has changed several times? The first issue is where are the records? The second issue is how were the records categorized in the first instance? Here is a question that hinges on this issue. 

Where are the records located that were generated during the existence of the Arizona Territory?

Arizona Territory was in existence between 1863 and 1912 when Arizona became a state. As I work into this example, think Europe during its history and the boundaries and names of all the countries. 
Before 1863 the land in the Arizona Territory was either part of Spain, Mexico or the New Mexico Territory. When a genealogist can answer a question such as this out of his or her head, they are called an expert. The real issue here is not the location of the records, but how those records, now scattered across the world, are characterized. If one repository files the records under Arizona Territory and then another simply puts them in a pile called "Arizona" and then a third classifies them as Southwestern History, how do you find the records? 

I have written about this topic many times before. Computer programmers working on search engines for genealogical records should be complimented on their efforts to organize huge piles of records. But when they do so and limit their searches by imposing their own organization on the records, things start to fall apart. If I want to enter a place that is not in their list of "standard" places, then my entry is considered to be non-standard. 

Let me go back to my search for John Jones in England. What if by pure chance, I happen to know exactly where John Jones lived? What if I search for John Jones in Tandridge? (By the way, I made up the name and dates, I am searching for a person who does not exist but the place does exist). 

When I enter "Tandridge, Surrey, England" into's search engine (search fields) I automatically get a suggested entry for Tandridge, Surrey, England. What do I get? One result for a person named Wm Jones in Tandridge, Surrey, England. What about Remember, this is a fictitious name. I get 11,637 results and most of these are John Joneses. 

So, I mark the place in as "exact" and redo the search, and I get no records. 

So, I have gone from over 15 million records to no records just by being specific in the place where the event occurred. Doesn't this fact suggest something about place names? Here is another example. Let's suppose I search for my Grandfather, Leroy Parkinson Tanner. I can search in either or and come up with records about him specifically by just searching on his name. Yes, that is all I need. His name is distinctive enough to be found in either program. 

But what if all I knew were his last name and a place? What if I search for Tanner in St. Joseph, Apache, Arizona Territory, the place where he was actually born, as it was known at the time? I get 19,470 Tanners, but none of them are my grandfather. Why? Let me try again, this time using another geographic location: St. Joseph, Navajo, Arizona. 

Hmm. I get a lot of his brothers and sisters (there were 17 children in the family), but not my grandfather. Let me do another search for Tanner in Joseph City, Navajo, Arizona. I get the same list of Tanner relatives but no grandfather. So, now I go and look to see how his birthplace is entered in the Family Tree program. He is, in fact, listed as born in St. Joseph, Apache, Arizona, United States. So now, I do a search using this place. He finally shows up in the search. But the place name is not technically correct or standardized. It should be St. Joseph, Apache, Arizona Territory, United States.

Guess what, now the search engine cannot find him. So, with a non-standard and wrong place, he can be found, but with the correct and standardized place he cannot be found. 

This exercise illustrates the basic challenge of all computerized search engines. Even if I conform to the parameters set by the programmers and use the "standardized place," that still does not guarantee that my person can be found. Why is this? Because in this case, most entries referring to birth in original records did not differentiate between "Arizona" and "Arizona Territory." The significance of this distinction was simply unknown and unused at the time. 
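The difference between an "exact" place match and a more forgiving one can be sketched in a few lines of Python. This is a hypothetical illustration, not a description of how any particular website's search engine works; the `normalize_place` rules (drop the word "Territory" and the country) are assumptions I chose just for this example.

```python
def normalize_place(place):
    """Lowercase, split on commas, drop ' territory' and the country name."""
    parts = [p.strip().lower() for p in place.split(",")]
    parts = [p.replace(" territory", "") for p in parts]
    return tuple(p for p in parts if p and p != "united states")

def exact_match(a, b):
    """Character-for-character comparison, like marking a place 'exact'."""
    return a == b

def loose_match(a, b):
    """Compare places only after both have been normalized."""
    return normalize_place(a) == normalize_place(b)

# The place as entered in the tree, and the technically correct form.
recorded = "St. Joseph, Apache, Arizona, United States"
standardized = "St. Joseph, Apache, Arizona Territory, United States"

print(exact_match(recorded, standardized))
print(loose_match(recorded, standardized))
```

With the exact comparison the two strings are simply different, so the "correct" place finds nothing. Only a comparison that knows "Arizona" and "Arizona Territory" can refer to the same place brings the record back.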

What does this mean to those searching for their ancestors? You really do need to know the exact place some event in your ancestor's life occurred. But you also need to know how to search for that place. If you have the name, date or place too different from the way it was recorded in the original documents, your chances of finding that ancestor are dramatically decreased. For this reason, we tell people to work from the known to the unknown. What does this mean for "standardized" dates and places? It means that you cannot assume because your place name is "standardized" that the program's search engine will find the right person. Remember, I immediately found Leroy Parkinson Tanner by name, but could not find him with his last name, Tanner, even when I had the exact location standardized where he was born. It might help to know that St. Joseph, Apache, Arizona had fewer than 1,000 residents and a whole bunch of them were Tanners. 

The final conclusion is that searching is an art, not a science. It takes practice and persistence. Keep looking and keep thinking. 

Wednesday, October 7, 2015

Deflecting Web Attacks

I continue to experience an increase in unwanted spam emails, bogus comments on blog posts and pernicious suggested links on Google+. Since I review all of these suggested comments and connections, I can eliminate virtually all of them, but I suspect that some people are not quite so efficient in eliminating this terrible background noise of web use.

I routinely receive more than 100 emails a day, sometimes over 200. It is rather easy to spool through them and delete the ones I do not care to read. The trick is knowing which ones to delete. Much of my email is simply periodic announcements and ads from companies. Those are pretty easy to glance at and delete. Occasionally, I have to read an entire message.

Because I am on Google+, I get several offers each day for connections. I would guess about half of these are from outside of the United States and most of these involve obvious or not so obvious objectionable websites and connections. I ignore these or in extreme cases, immediately delete them. One very good indicator of a bogus connection with seriously objectionable consequences is that the person connecting to your account has no other contacts or followers. Google+ seems to be a ripe field for the pornographic industry, terrorist organizations and other types of marginal to really bad online predators.

The last category of unwelcome spam that seems to continue to increase is bogus comments on older blog posts. The statement of the commentator is usually couched in really poor English and is highly complimentary of the post without referring at all to the topic. I recently got a comment from a plumbing company about a genealogy post that had a reference to "plumbing the depths" of some subject. These comments all get erased. You should never open or add a comment or follower without checking first. All content should be reviewed. Do not open email messages that appear to be suspect. This is especially true if someone you know, but have had little contact with, suddenly appears to be sending you a link to a website or a Dropbox folder. Do not click on the links. If you have a question, call the person on the telephone directly or send them an entirely separate email message about the suspect attachment.

There is another category of email that is even more dangerous. These are emails that appear to come from companies you deal with, asking you to verify information or send a reply with some personal information. Do not click on these emails. Call the company first to verify that the email is valid. You will almost always be told that it is not. The most common of these messages come from someone imitating PayPal. I get several of these a month. Most of them inform me that my account will be closed if I do not respond. I haven't responded yet and my account has not been closed.

There is a really dangerous world out there and the price of freedom is eternal vigilance.

Is there a Microsoft Surface in your genealogical future?

Sometimes it is really hard to tell if the competition between Apple and the rest of the computer/mobile device community is really "heating up" or staying about the same. Regularly, new product announcements are stylized as "Apple killers" or whatever and Apple just keeps gaining on the rest of the industry and making tons of money. The real challenge is not between Apple and Microsoft, but between both and Google. Android operating system usage is way ahead of either Apple or Microsoft.

As genealogists we are enticed by the new devices being introduced. Of course, there is the component of the genealogical community that is still using Windows XP or some ancient Apple system, but if you are considering an update, you are going to see some major upgrades in hardware over the next year.

Microsoft has started off their offerings with a new Surface Pro 4 seen above (excuse the background noise to the intro video, just mute the sound, they don't say anything anyway). Here is a quote about the new product from MacRumors, an Apple Mac blog.
Overall, the Surface Pro 4 is 30 percent faster than the Surface Pro 3 and 50 percent faster than the MacBook Air, with 16GB of RAM and 1TB of storage. The company says it compromised "nothing" in the new iteration, maintaining a thin body while offering significantly better performance. Microsoft also introduced a new Type Cover, with a larger trackpad, backlit keyboard, and an integrated fingerprint reader for users on the older Surface Pro 3 who don't have the camera authentication of Microsoft Hello on the Surface Pro 4. The Surface Pro 4 will be available on October 26, starting at $899.
Microsoft is also introducing a new Microsoft Surface Book, their competition to the MacBook Pro. The background music is better in this video.

Here is what MacRumors has to say about the Surface Book:
When comparing the Surface Book directly with the MacBook Pro, Microsoft stated that its new laptop is two times faster than Apple's device. The Surface Book will also let customers remove the screen, turning it into a temporary Surface Pro 4-esque tablet. The Surface Book is priced at $1,499 and will launch on October 26, along with the Surface Pro 4. The new Lumia 950 and 950 XL, Microsoft Band 2, Surface Pro 4, and Surface Book will also be available for customers to pre-order beginning tomorrow, October 7.
For me, the issue is not the device as much as it is the operating system. I see the handwriting on the wall: I will have to upgrade my desktop computer some time, and my laptop. But for now, I am happy with both and I will stay with Apple.

Tuesday, October 6, 2015

More on Libraries, Genealogy and Research in General

What do you think of when I write the word "library?" Most of my early library experience was in the then modestly sized, Phoenix Public Library. I spent many days during the hot summers, riding back and forth from the library on my bike and reading the seven books at a time checkout limit. As I grew older, I spent considerable time in bookstores. Over the years, my parents acquired hundreds, then thousands of books on a wide variety of subjects. After leaving home and getting married, as we started our family, we also acquired a sizable book collection into the thousands of books. While my children were growing up, we visited the library regularly. Early on, it was the Scottsdale, Arizona Public Library and then the Mesa, Arizona Public Library.

As I progressed in school, I remember the smaller libraries in my schools. I spent even more time reading. By the time I was attending the University of Utah, my library life began to change. I spent more time researching than reading for enjoyment. I would choose a topic and read everything I could find on that topic. This continued for years. During my time as an Intelligence Analyst for the United States Army, I spent two years of intense research and reading. This reading continued in law school and afterwards.

During my time at the University of Utah, I worked in the J. Willard Marriott Library as a bibliographer. After my active duty in the Army, I also worked as a Reference Librarian at the Arizona State University Law Library for nearly three years.

Year after year, I frequented libraries in the Salt River Valley. Most recently, we frequently visited both the Mesa Public Library and the Maricopa County Library branch in Gilbert, Arizona.

Genealogy became a predominant topic during the last thirty plus years and I spent considerable time in the Family History Library in Salt Lake City, Utah. For the past almost twelve years, I worked at the Mesa FamilySearch Library (previously Mesa Regional Family History Center). I now spend many, many hours, sometimes more than eight hours a day, in the huge Brigham Young University, Harold B. Lee Library.

I think I have a perspective about libraries and books that comes from extensive experience.

During the past forty years, my library experience has been changing. I began my work with computers over forty years ago at the University of Utah. Now, most of my research and all of my writing is done on a computer. I now read books on an iPad or an iPhone. The Internet has almost completely replaced my research in libraries. But lately, I have found something interesting. I am finding that the Internet is not all-knowing. My visits to the books in the libraries have become more frequent than they were in the immediate past. Access to the BYU Library and the Provo, Utah Public Library has taken me back to the stacks.

At the core of this interest in libraries and books is the desire to learn. This is not a superficial interest. It is a life-long pursuit. Now with that lengthy introduction, I have some observations.

There is a background of discussion among those who frequent libraries and particularly among those employed by libraries, concerning the future of the whole concept of a library. The question involves their survival in their present form and their ultimate survival at all. Can libraries survive the onslaught of digitization and mobile reading devices? Will Google ultimately end up destroying libraries altogether?

Genealogists find themselves in an interesting position. Most of us are older. Some of us find ourselves in libraries for the first time as we gain an interest in researching our families. Some, like me, come from a strong research background. But we also find that much of the information we need has now moved onto the Internet in digital format. Many genealogists, particularly those just starting out, find that they do not need to visit a library at all. They are already overwhelmed with the amount of information available online. In fact, many younger genealogists have probably never visited a library for the purpose of doing genealogical research.

Many of the discussions about the survival of libraries focus on funding issues. It is a situation where those who make the funding decisions have never visited a library and do not see a need for one. After all, isn't everything we need to know online now anyway?

I find that my present experience is mixed. With some I am helping, everything they need is online. Others find themselves in libraries rather quickly. One reason I moved from the Mesa Public Library and began using the Maricopa County Library was rather simple. Mesa stopped funding their library and the selection of "new" books was extremely curtailed. The Maricopa County Library seemed to acquire newer books regularly.

In this example, what happens to libraries, and the decisions made by those who operate them, becomes self-fulfilling. There is a decrease in the perceived need for libraries and then funding is cut and the library becomes even less current and so forth in a cycle of destruction. In the case of a university library, there is a different perspective. Libraries are seen as a "status" symbol. Funding for the library is not viewed as a budget item. In a large research university like BYU, the library is a vibrant, growing entity. Thousands of students a day come to learn, study and even take classes in the library. Strangely, very, very few genealogists see the advantages of a university library. Most are completely unaware of the resources of these libraries.

The movement of information to the Internet is inexorable. My own experience is a microcosm of the entire issue. Just as my use of the Internet has increased dramatically, so, for a time, did my use of libraries. But now, I am seeing the value of the resources in both a university and a public library. Today, I will visit both and I will probably use books from both. Interestingly, neither of the two books I will be reading is freely available online.

Here is the key. Libraries give patrons free access to their books, even those under copyright. It is this factor that keeps them in business. If I go online and look for a current book, I will find it for sale. If I want to pay a fee, I might be able to download a digital copy. If I do not want to "own" the book, I can usually find it in one of the libraries. In Utah Valley, I can also go to the Orem Public Library, if I find a book there and no place else.

This is the key. I do not need, nor do I want, to own all the books. My shelves are full. Until the libraries find a way to make their collections available to those who read on electronic devices, they will continue to lose patrons and funding. Here is the dilemma. If libraries make their collections available online, including copyrighted material, who will come to the library physically? I think my own experience is the answer. I do research. The fact that the latest books are available online makes no difference to me at all. If every book ever written were available online, that might make a difference. But we aren't nearly there yet. As long as I have to pay to look at a new, copyrighted book and do not have access to old books outside of the library, I will continue to go to the libraries.

There is a lot more to this issue besides what I have already written. Stay tuned or not, depending on whether you care about libraries or not. Whatever.

Monday, October 5, 2015

Solving a Mystery with Military Records

This morning, I noticed that United States Headstone Applications for U.S. Military Veterans, 1925-1949, a record collection in the Historical Record Collections, had been updated on 2 October 2015. Today. That reminded me of the fact that I had solved a particularly difficult research problem with a military headstone. A search in this collection quickly produced the record that solved the mystery.

With a further click, I could see the original application record.

Along the back of this document, in pencil, was a summary of my Grandfather's military record.

When I found this original record, I had been looking for information about his military service for some time.

My Grandfather, Leroy Parkinson Tanner, had served in both the Mexican Border War and World War I. For a long time, I could not find his military record. I found the name of his unit from the Veteran Headstone, marking his grave in St. Johns, Arizona. But I still could not find either his name or his unit in any of the World War I records.

The solution came in two stages. First, I found my father had a copy of his discharge papers. Then I found this record on By the way, the record on is in color and it makes the record much easier to read. Here is a screenshot from I don't know what has against putting their records on in color.

If you find a record on one of the websites, always look to see if the same records, in different scans, are on other websites.

It turned out that my Grandfather enlisted in a National Guard Unit that was mobilized for the War. This was a small mystery, but it was solved with a very specific military record, a Veterans Headstone. Had I found the application, before I found the discharge papers, I would have been able to figure out the mystery from these documents.

This points out a very important fact about military records. Just like any other records, there are a lot of them and it pays to keep looking and keep digging. I found this document because it showed up on an search for my Grandfather. But it did not show up as one of the first records or even one on the first page. But now, I could find the record on or on with a quick search. We have a most marvelous tool for research with these digitized records. Let's use them.

Sunday, October 4, 2015

Genealogists Using the Cloud -- Pitfalls and Promises

I received a suggestion from my Australian friend, Wayne, who reportedly lives somewhere in the Australian Outback. At his request, I will write some of my thoughts on the topic of Cloud Storage for backing up our genealogical data and all other data for that matter.

For me, the practical reality of storing my data online is the simple fact that my data files exceed the capacity of almost all practical online storage plans or companies. This is merely a cost analysis decision. Here are a few of the more popular programs and the advertised cost of storage. I now have very close to 4 Terabytes. An additional factor is the time it takes to transfer these massive files from one storage device to another and the time it takes to move that much information online.

Note that some of these companies have special introductory pricing. I am not particularly interested in having a "free" account for only a short period of time. I am also ignoring those "backup solutions" that involve a separate hardware server. This could be an alternative, but, once again, cost is a factor. Right now, a Seagate 8 TB Hard Drive is selling for $237.49 on and the price will most likely come down in the near future. I presently use multiple backup hard drives and regularly store data offsite. Hard drives seem to last about two to three years. Remember that I am talking about a lot of data. You may have a lower cost, and the option of online storage to supplement your own local storage may be more attractive. I do find that determining exactly how much a certain level of storage will cost is very difficult in almost all the programs. Also, bear in mind that the prices may change at any time.

The crucial issue is what happens to your data if you fail to renew your subscription for any reason? This is the hardest question of all to answer from the websites. This could be one of the most important issues. Another issue is ownership and control of the data. You will likely find, by reading the "fine print" that all you get is a license to access your own data. Remember, you must also have and maintain Internet service to use this type of system.

Another important factor. You must still maintain enough local capacity to host all your own data. These services back up data on a particular computer and/or hard drive. You still have to have the computer or the hard drive.

Now, on to the list:

  • -- Starting Plan is $59.00 a year for an individual computer that does not include external hard drives. In some plans, any file over 4 GB must be manually added to the backup. The size limitations are directed at backing up specific devices. The starting cost for my setup begins at $269.99 a year.
  • -- Based on 1 TB of storage, a 24 month contract, the cost is $104.28. There are additional charges for external hard drives and for files over 5 GB. 
  • -- $5.00 per month, per computer, claims no file size limit and no data limit. As with all the services read the Terms and Conditions of use carefully.
  • -- Free to $250 a month for 2.5 TB. 
Now what about the popular online storage companies that do not specialize in backing up files? Here is a breakdown of the amount of storage available and the cost. These systems could be used to back up your genealogy data files but the rest of your files would be at risk. It is even more difficult to determine what you are getting and the terms of use. Files can be synced with your local drive. 
  • -- Free up to 2 GB, Dropbox Pro is limited to 1 TB. The Business Account is unlimited storage for $5 per month per user. This is not a backup service, individual files must be copied and organized. 
  • Google Drive -- Google Drive for business clients gets 1 TB for fewer than 5 users. 10 TB is $99.99 a month.
  • Microsoft OneDrive -- 15 GB basic, 1 TB for Office 365 for $9.99 per month. 
  • iCloud -- Up to 5 GB free and $9.99 per month for 1 TB. 
Remember, with all these systems, you still have to purchase and maintain your local storage capacity. So I would have to purchase hard drives that would back up all my data (3+ TBs) and then pay the cost of online storage. Also remember to ask what happens to your data if you stop paying. You might also find a "better deal" than the ones I have reviewed. I suggest that you still explore the limitations. 

One practical alternative is to store crucial files online and keep the rest backed up locally. This does not avoid the issue of what happens to your data if you stop paying, but it does limit the cost.
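For anyone who wants to run the numbers on their own setup, here is a rough sketch of the cost comparison in Python. The drive price, the $9.99 per month 1 TB tier and the 4 TB of data come from the figures quoted above. The 2.5 year drive life (the midpoint of "two to three years") and the idea that online pricing scales evenly per terabyte are my own simplifying assumptions, and real plans may not work that way.

```python
DRIVE_PRICE = 237.49      # 8 TB drive price quoted above
DRIVE_LIFE_YEARS = 2.5    # assumption: midpoint of "two to three years"
MONTHLY_PER_TB = 9.99     # the 1 TB tier price quoted above
DATA_TB = 4               # "very close to 4 Terabytes"

def drive_cost_per_year(price, life_years):
    """Spread the purchase price of one drive over its expected life."""
    return price / life_years

def online_cost_per_year(monthly_per_tb, terabytes):
    """Assumes the price scales evenly per terabyte (a simplification)."""
    return monthly_per_tb * 12 * terabytes

local = drive_cost_per_year(DRIVE_PRICE, DRIVE_LIFE_YEARS)
online = online_cost_per_year(MONTHLY_PER_TB, DATA_TB)
print(f"local drive: ${local:.2f} per year")
print(f"online storage: ${online:.2f} per year")
```

Change the constants to match your own data and the prices you are actually quoted. Keep in mind that the online figure is in addition to the local drives, since you still need local storage either way.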