Some people eat, sleep and chew gum, I do genealogy and write...

Sunday, October 30, 2016

Can Indexing of Historical Records be Done by Computer Programs?

In a recent blog post from entitled, "What’s New on FamilySearch—October 2016," there is an announcement that in conjunction with, an obituary collection in the Historical Record Collections was partially computer indexed. Indexing as done by FamilySearch is a relatively labor intensive activity. Every indexed document is manually indexed by two separate indexers and then subject to "arbitration" by a third person who reviews the indexing done by the other two. Presently, there is a huge backlog of indexed records that need to be "arbitrated." Additionally, because of the pace of digitization, there are many more digitized documents than there are ones that have been indexed.
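The two-indexers-plus-arbitrator workflow described above can be sketched in a few lines of code. This is only an illustration under my own assumptions, not how FamilySearch actually implements it: the function name `compare_indexings`, the field names, and the sample values are all hypothetical. The idea is simply that fields on which both indexers agree are accepted automatically, and only the disagreements are sent to a third person.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FieldDecision:
    """Outcome for one field of a doubly indexed record (hypothetical structure)."""
    field: str
    value: Optional[str]       # agreed value, or None when the indexers disagree
    needs_arbitration: bool    # True when a third person must decide


def compare_indexings(first: Dict[str, str], second: Dict[str, str]) -> List[FieldDecision]:
    """Compare two independent transcriptions of the same record, field by field."""
    decisions = []
    for field in sorted(set(first) | set(second)):
        # Ignore differences in case and surrounding whitespace (an assumption).
        a = first.get(field, "").strip().lower()
        b = second.get(field, "").strip().lower()
        if a and a == b:
            decisions.append(FieldDecision(field, first[field].strip(), False))
        else:
            # Missing or conflicting values go to the arbitrator.
            decisions.append(FieldDecision(field, None, True))
    return decisions


# Made-up sample data: the two indexers read the year differently.
indexer_a = {"surname": "Doe", "given": "Jane", "year": "1852"}
indexer_b = {"surname": "doe ", "given": "Jane", "year": "1853"}
to_arbitrate = [d.field for d in compare_indexings(indexer_a, indexer_b) if d.needs_arbitration]
```

In this sketch only the `year` field would reach the arbitrator, which is the reason the arbitration backlog depends on how often the two indexers disagree rather than on the total number of fields indexed.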

If we assume that indexing is "necessary" rather than a convenience, then the task of indexing all of the billions of records waiting to be indexed might seem overwhelmingly difficult. There are two main possibilities as I see it. We can dramatically increase the number of individual indexers and arbitrators or we can employ some already existing data processing techniques. The big obstacle to using computers to do the indexing of many historical records is rather simple: handwriting. Only very recent historical records are in a printed or typewritten format that makes them subject to optical character recognition and therefore subject to computerized indexing.

Optical character recognition or OCR has come a long way from its initial introduction in the early 1900s to transcribe text for people who were blind. Today, we have millions upon millions of online ebooks that have been generated by optical character recognition programs. Meanwhile, reading handwriting with OCR remains one of the not yet fully developed and much sought-after goals of computerized indexing efforts. Handwriting recognition programs are commonly used, mostly with human backup, in U.S. Mail distribution and many other industries. The challenge posed by the historical records is mainly really bad handwriting and faded documents.

Any genealogist who has done research into old, handwritten documents can attest to the difficulty in deciphering old handwriting. Hence, the army of human indexers. But there is no reason why any printed or typewritten genealogically significant documents could not be indexed, perhaps with a human review, by computer programs. In fact, any machine-readable document should be entirely read and rendered searchable by any word or character string in the document. has done just that with its online Books collection that currently has over 312,000 digitized books online.
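To make "searchable by any word" a little more concrete, here is a minimal sketch of an inverted index built over text that has already been produced by OCR. Everything in it is an assumption for illustration: the page identifiers, the sample text, and the function names `build_index` and `search` are hypothetical and do not describe any particular website's search engine. Note that it indexes whole words only; searching for arbitrary character strings inside words would need a different technique.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

WORD_PATTERN = re.compile(r"[a-z0-9']+")


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map every word found in the OCR text to the set of pages containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for word in WORD_PATTERN.findall(text.lower()):
            index[word].add(page_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Return the pages that contain every word in the query."""
    words = WORD_PATTERN.findall(query.lower())
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)


# Made-up OCR output for three pages of a hypothetical digitized book.
pages = {
    "p1": "John Doe, farmer, died 1901",
    "p2": "Mary Doe married in 1901",
    "p3": "An unrelated page",
}
index = build_index(pages)
matches = search(index, "doe 1901")
```

Because the index is built once and then only looked up, the cost of a search does not grow with the length of each book, which is part of why typewritten and printed material is such a natural candidate for computerized indexing.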

I might note, that for some strange reason, the digitized books on are not included in the searches made in the Historical Record Collections so the two different collections must be searched separately.

I am aware of significant efforts being made in handwriting recognition, but this area is still not to the point where it will replace much of the human effort involved. Meanwhile, I would suggest that computer-aided OCR can make a significant advance in indexing printed and typewritten documents and that this should be done and the results made available to the genealogical community one way or another.

Saturday, October 29, 2016

Reflections on my first blog post: Check out the FamilySearch Wiki

My very first Genealogy's Star blog post was posted on 21 November 2008 and as you can see in the screenshot above, it was short, only two paragraphs long. At the time that post was written the Family History Research Wiki had been online for about a year, but had only been publicly available for less than six months.

Presently the Research Wiki has over 84,000 articles and has recently undergone a substantial revision. For the last couple of years, I have been dividing my blogging topics between Genealogy's Star and my other blog, Rejoice, and be exceeding glad... The Rejoice blog is directed primarily to members of The Church of Jesus Christ of Latter-day Saints and most of my comments on FamilySearch and the website are posted on the Rejoice blog.

The wiki format of the Research Wiki has proved to be a stable and endlessly expandable venue for information about genealogical research. Over those now almost eight years, I have spent considerable time adding content and editing the existing content of the Research Wiki. As you can see from the comment in red at the top of each page, the Research Wiki has come under a more restrictive editorial policy, due in part to the threat of inappropriate content being added.

My own life has grown substantially more complicated since those early days of the Research Wiki. But my core activities still involve a great deal of writing and teaching. The Research Wiki is an almost perfect place to funnel down all of the information I have accumulated over the years. Although my editing efforts have decreased of late, due to the pressure of other activities including online webinars, conferences, three extensive blogs and many other activities, I still feel a huge connection to this valuable genealogical resource.

During that same time period, blogs have also evolved. Social networking has become nearly globally available. According to, as of the second quarter of 2016, Facebook had 1.71 billion monthly active users and there were altogether approximately 2.7 billion active social networking users. Compared to the overall reach of and the other social networking websites, my blogs are an insignificant dot in a huge social networking universe. I sometimes wonder what might have happened if I had chosen to write about a topic a little less obscure than genealogy?

As I always say, being famous in genealogy circles is like being the Mayor of Nutrioso, Arizona. You may have some local recognition but overall you are pretty small potatoes in the huge online world. By the way, this comment is not intended as a reflection on the real Mayor of Nutrioso, should such a person exist. But it is merely a comment on the relatively meager fame and fortune you can accumulate when you confine your online comments to genealogically relevant subjects.

Friday, October 28, 2016

Seeing the World in a Grain of Sand: The Challenge of Genealogical Data

I have just become overwhelmed with the disparate aspects of genealogy: the minutiae and the universal. Just as the saying goes that you can see the world in a grain of sand, you can also see the whole human family in one individual. Here is the quote:
To see a World in a Grain of Sand
And a Heaven in a Wild Flower,
Hold Infinity in the palm of your hand
And Eternity in an hour.
Blake, William. 1905. Blake's Auguries of innocence. [London]: Pr. for E.V. Lucas.

Blake was probably not thinking about all of the challenges of regularizing and correcting 150 years of genealogical research done by hundreds of my relatives but the quote does provide a little perspective about the challenges of taking a longer view of the problems. I often say that I am trying to teach genealogy one person at a time and when all is said and done, that is the only way I have found to make any real progress.

Not too long ago, I was helping a patron at the Brigham Young University Family History Library when he asked me why I did genealogy. I responded and asked him whether his question was directed at a religious motivation or some other. Of course, I do have a fundamental religiously motivated reason for being involved in genealogy, but so do a few million other members of The Church of Jesus Christ of Latter-day Saints. But what is my motivation for spending so much of my time and effort going far beyond mere involvement or belief?

In response to the patron's question, I mostly shrugged off any discussion of ultimate motivations and focused on helping him find Arabic language resources in the Library. But the question apparently lodged itself in my mind. Why do any of us who call ourselves genealogists become more than casually involved in a subject as complex and challenging as historical research? Part of the reason is that our background and our interests coincide with the skill set necessary to be involved in this particular type of activity. I have a lot of interests and could have a lot of time-consuming hobbies, but over the last thirty plus years, genealogy is the only interest that has continued to survive for any long period of time.

When I began working on my family lines the main challenge was the huge amount of paper records my initial survey generated. But as the years pass, I now see way beyond any mere paper-based context and have moved on to seeing vast quantities of data accumulated online in the huge family tree programs. 

It would be easier, I suppose, to focus on one ancestor at a time, but in correlating my family research with all those who are working on different family lines, my attention moves from line to line and from individual to individual. It was a much simpler time when the research was mostly linear.

My best guess is that I relate to genealogy the same way William Blake related to the physical and spiritual nature of the world. If you read the entire poem called Auguries of Innocence, you will see that Blake was illustrating the relationship between the seemingly inconsequential acts of man and their eternal effect on mankind. Likewise, I see the relationship between creating a unified and consistently accurate record of each family's history in the context of the eternal nature of the work. To Blake, killing a fly or a caterpillar had eternal consequences. To me, correcting and cleaning up a huge family tree has the same eternal consequences. It is only by focusing on the minutiae that I can actually see the eternal.

Thursday, October 27, 2016

How to protect yourself online for genealogists

It has been said that the only way to protect your computer from being hacked is to not have one and if you do have one, never turn it on. But that, like all other such statements, is far from realistic. I remember back to the days when we first began using personal computers, as they were called at the time. I owned an Apple dealership at the time and we had several models of Apple computers set up on the sales floor for demos. A customer came in and after a few minutes informed me that all of our computers had a computer virus. I was incredulous. I thought it was a joke. But he kindly showed me the problem and that began my involvement with viruses, Trojan horses, worms, phishing scams and all the rest. For a more complete list of malware, see Wikipedia: Malware.

Yesterday, my wife got an email that said our Subaru had been recalled. She opened the email and read me the contents before I could stop her. Fortunately, she was on her iPhone rather than her computer. I immediately went online and found that the problem described in the email was bogus. There was a recall, but our car was not affected. The email was most likely a phishing scam to get us to click to another website. Had the email been on her computer rather than a phone, she might have infected her computer by simply opening the email. In any event, by opening the email, she likely confirmed to the sending party that the email address was active and available for further phishing. Real recall notices come through the paper mail.

If you live in a large city, you probably are aware that certain areas of the city are dangerous to enter alone after dark. You are probably careful to lock your doors even when you are at home. You may have a security system and even camcorders viewing your property. You are careful to lock your car and do not leave valuables in open view. All of this becomes second nature to those of us who live in large cities, but is not so common in smaller, outlying towns and villages.

The truth is the second you get on the internet, you are now in the largest city in the world and there are dangers at every corner and turn. But just like living in a large city, self protection becomes an ingrained habit. I can still travel and function in a large city even though I know it is dangerous and I can still operate on the internet without too much concern, because I know how to react by habit and long experience.

I still hear about people who "surf" the internet. Surfing the internet is like walking around in the dark in a large city. You are simply asking for trouble. My first rule of safe, online computing is to be careful and direct all your searches to specific topics that you control. Sometimes, you cannot control the links and websites that appear in a search, but random clicking is sure to produce unwanted and dangerous results. Obviously, if you are looking for trouble, you will find it and find it quickly. Filth and sleaze are common and can become highly addictive. The internet is a great and wonderful source of information, but it is also a trap for those who do not know enough to act for their own good.

Here are some websites with basic information about computer safety.

Yes, there is a risk. Life is full of risks, but we learn to manage our risks against our benefits. The key to avoiding nearly all of the bad parts of the world of computers is to become educated about the nature of the risks and patterns of behavior that increase those risk factors. 

The Cake Boss Buddy Valastro Will Keynote, Judge at RootsTech 2017

The RootsTech 2017 Keynote Speakers just keep getting more and more interesting. Today's announcement is as follows:
(SALT LAKE CITY, UT, 27 October 2016)--RootsTech, the largest family history conference in the world, is pleased to welcome the popular Italian-American celebrity chef, Buddy Valastro, also known as the hit TLC series, Cake Boss™ as a keynote speaker on Saturday, February 11, 2017, in Salt Lake City. Valastro will also judge a local cake decorating contest hosted by RootsTech.
I have a couple of daughters who would probably do very well in any cake decorating contest. You can see some of their cakes on our food blog, Family Heritage Recipes, a living heritage of good food! If you are not familiar with Buddy Valastro, I suggest reading the blog post entitled, "Five Ingredients That Make Up the Cake Boss." With my wife's family background in cooking and the tradition being carried on by my five daughters and two daughters-in-law, I should be well aware of the importance of food in family history.

The first ever cake decorating contest at RootsTech 2017 is described as follows:
In addition to keynoting at RootsTech 2017, Valastro will help judge the first-ever RootsTech cake decorating competition. 
There will be four different categories to compete in—wedding, birthday, holiday, and graduation—and there will be three finalists and one grand prize winner selected in each category. Cakes will be on display Saturday during RootsTech and Family Discovery Day where thousands of people will view and have a chance to vote for “People’s Choice” winners in each category. Official rules and entry information for the contest will be available soon at
I am sure that there will be some stiff competition in Utah for this event.

Wednesday, October 26, 2016

United States House of Representatives, History, Art and Archives

Just when I think you might have some idea about the vast digitized resources online, I always seem to find another huge archive. This time it is the United States House of Representatives, History, Art and Archives. As the website explains, it is a "collaborative project between the Office of the Historian and the Clerk of the House's Office of Art and Archives. Together, the offices serve as the House’s institutional memory, a resource for Members, staff, and the general public."

This archive is completely searchable. I tried to include some screenshots, but for some reason they did not work. Here is another try at getting them to work.

Here is the search section of the website.

Ignoring the Online Miracle with Misconceptions: Part Two - The Paper Trap

To some genealogists paper is reassuring. It is familiar and seems permanent and durable. On the other hand, digital technology seems evanescent and transitory. If I give my email address to someone who is paper-based, they will try to find something to write on and a pencil or pen. If I give my email address to someone who is tech-based, they will take out their phone and put me in as a contact. As long as I can remember, I have been bound to a calendar and a "day timer." Keeping track of my schedule has been one of the major challenges of my life. On the other hand, with a centralized, online calendar, I have managed to keep a complex schedule going for many years with a minimum of missed appointments even with almost constant changes. Probably one of the most obvious examples of transition is the computer with rows of paper Post-it Notes stuck across the bottom of the screen.

The issue of paper vs. digital is a much deeper and more complex problem for genealogical researchers. It is true that much of the world's genealogically valuable information is still on paper or the equivalent microfilm. It is also true that some people simply feel more comfortable looking at a piece of paper than they do working on an electronic device. But if you look around, you will start to see that this is not an either/or situation but an issue of obtaining a balance between the two forms of record keeping and communication.

For example, my wife is not likely to stop sending paper birthday cards to our grandchildren any time soon. She would not consider sending an "electronic" card. Equally, some genealogists will always refer to a paper pedigree chart or keep paper copies of documents and research notes. But in thinking about this subject from time to time, I realize that the main reason that almost everything I do is now on some type of electronic device (I am writing this on my iPad Pro) is that it is a huge and difficult effort for me to hand write anything for any period of time. I learned to type in high school and have been typing almost everything since then. I also love the way my computer corrects my spelling and most of my typos as I am typing along. What I see as a major obstacle to digital integration is an inability to type (or keyboard or whatever).

Why have I returned to one of my common themes in the context of misconceptions? Because there really are a lot of misconceptions about the transition from paper to digital. The first misconception is the idea of paper's permanence and durability. A paper copy of my calendar, for example, can only exist in one place at one time. If I misplace my paper calendar, which I have done by the way, I am essentially lost. On the other hand, as I add entries to my online, electronic calendar, I instantly have a copy of the entry on every one of my devices. If I go to the library and do some research and write all my notes on paper, like I used to years ago, I often fail to look at that research ever again and I often lose the notes in my huge pile of similar notes. But if I enter the information immediately into my centralized and ever-present family tree program, I always keep going. Sure, the paper is durable, but it is durably lost in my piles of paper.

When I was young, one common occurrence, especially in the summer was having our electricity go off and thereby plunging us into the dark. Today, barring a natural disaster of some kind, our electrical power is rarely off and if it is off, it is only for a few minutes or so. Some genealogists cite a loss of power to electronic devices as an incentive to keep writing things down on paper. The misconception here is that they would be doing genealogical research in the dark during a power failure. We can all imagine dystopian scenarios where there are no longer any of the more common, current devices available, but really, are you also imagining yourself doing genealogical research in an end of the world type situation?

One of the most common genealogical issues I encounter is the situation where someone is trying to find the origin of an immigrant ancestor. I was recently helping a patron at the BYU Family History Library with some research about her ancestor. She had spent years looking for his origin. We made some progress, but she had to leave. She returned at a time when I was not in the Library and asked for help from another missionary. She discovered that this missionary was a relative and had been searching for the same ancestor and making progress. This illustrates the misconception that we are all alone in our search for our ancestors, which is reinforced by having all the research on paper and isolated from all the other family members who might be working on the same problem. If we work online in a collaborative environment, like the Family Tree, then we have a much greater possibility of finding other relatives with more information than we have been able to find.

There are obvious advantages to working on electronic devices and most of these advantages are only apparent to those who have integrated electronic devices into their research. It is a misconception to think that a paper based system of research is somehow superior to a digital one.

See the first part of this series here:

Tuesday, October 25, 2016

The End of Privacy From a Genealogical Perspective

Recently, there have been several huge internet hacking events. Hacking of emails has even made it into U.S. national politics. Whether we like it or not, genealogists are included as the victims of this nefarious online activity. The compromised information includes everything from bank records to credit card and social security numbers. What is even more disconcerting are the revelations of government surveillance of Google and Yahoo accounts and even telephone records for ordinary U.S. citizens.

In a Pew Research Center report entitled "The state of privacy in post-Snowden America," those surveyed indicated that they had taken the following actions to avoid surveillance:
Some 86% of internet users have taken steps online to remove or mask their digital footprints, but many say they would like to do more or are unaware of tools they could use. The actions that users have taken range from clearing cookies to encrypting their email, from avoiding using their name to using virtual networks that mask their internet protocol (IP) address. And 55% of internet users have taken steps to avoid observation by specific people, organizations or the government. Many say the purpose of their attempted anonymity is to avoid “social surveillance” by friends and colleagues, rather than the government or law enforcement. 
At the same time, many express a desire to take additional steps to protect their data online. When asked if they feel as though their own efforts to protect the privacy of their personal information online are sufficient, 61% say they feel they “would like to do more,” while 37% say they “already do enough.” Even after news broke about the NSA surveillance programs, few Americans took sophisticated steps to protect their data, and many were unaware of robust actions they could take to hide their online activities. Some 34% of those who said they were aware of the NSA surveillance programs in a July 2013 survey (30% of all adults) had taken at least one step to hide or shield their information from the government. But most of those actions were simple steps, such as changing their privacy settings on social media or avoiding certain apps, rather than tools like email encryption programs, “don’t track” plug-ins for browsers or anonymity software.
These types of actions show an innate distrust of many online companies and government activities. As genealogists, some of this distrust spills over and is applied to online genealogical programs. The Pew Research Center report also shows that few American adults are confident that their records will remain private and secure. I encounter this distrust and insecurity regularly as I talk to people about putting their family records online. The irony of this situation is that genealogists are concerned about putting their ancestral information online when much of the information they have acquired came from public and very accessible sources such as United States Federal Census Records and vital records.

As a genealogist I would rather not see or have online information about any personal records of living people. The fact that most of the large online genealogy programs make an effort to keep that information private, does not console me.

But what is reality? What is privacy in today's saturated world of communication technology? The answer is very difficult to ascertain. For example, financial information is anything but private. If you have a bank account and use credit cards, almost every transaction you make is highly public in nature. There is precious little information about you and your family that could not be discovered if someone had enough time and money to do so.

Genealogy is not about you. It is about dead people and dead people have no privacy. As long as you don't put any of your own "private" information on a family tree online, you are no better or worse off for the effort from the standpoint of privacy. Genealogy is not a privacy issue. The only real issue out there is the ridiculous and silly continued use of grandparents' names or other such information for "security" purposes. Bottom line, genealogy is not a privacy issue.

Monday, October 24, 2016

Distributed Denial of Service (DDoS) and Genealogy

Despite our seemingly advanced technology, we can still do very little about the weather and it seems like we are also learning that we can do little about attacks on the internet that disrupt large portions of the country's online activity. As genealogists become more and more dependent on their own electronic devices we are among those thrown out into the rough internet weather along with the rest of the world.

This past week we have suffered through two outages: one extremely local and one widespread across the United States. First, it is important to understand that neither of these events had anything to do with a loss of our data.

The local event involved a reported upgrade of the firewall protecting the Brigham Young Library that ended up knocking the website off the network, but only in the Library and not anyplace else. This was not one of the regular lapses in service we have seen from FamilySearch; this one lasted a day and a half and was caused by the internal security concerns of the Library technical staff, or so it was reported. The effect was that a large class got cancelled and we were prevented from providing support using for the duration of the outage. The Library was very quiet for the day and a half it took to remedy the situation.

From my own standpoint, I have converted almost all my classes and presentations to previously prepared slide shows either in Apple's Keynote or Microsoft PowerPoint to avoid the issues that loss of connection to the internet often causes.

The second and more serious outage did not impact our access so much as it could have. The genealogy programs kept operating despite a huge Denial of Service attack on a major internet service provider. As USA Today points out in its article entitled, "Hacked home devices cause massive Internet outage:"
SAN FRANCISCO — Eleven hours after a massive online attack that blocked access to many popular websites, the company under assault has finally restored its service. 
Dyn, a New Hampshire-based company that monitors and routes Internet traffic, was the victim of a massive attack that began at 7:10 a.m. ET Friday morning. The issue kept some users on the East Coast from accessing Twitter, Spotify, Netflix, Amazon, Tumblr, Reddit, PayPal and other sites. 
At 6:17 p.m. ET Friday, Dyn updated its website to say it had resolved the large-scale distributed denial of service attack (DDoS) and service had been restored.
The outage was enabled by the attacker's remote control use of internet connected devices such as alarm systems and security cameras to flood the websites with millions and millions of signals that effectively disrupt the ability of the website to operate. As individual users of the internet, there is nothing you or I can do to prevent this type of takeover. Understanding how or why this happened is really a lot like understanding the weather. I may understand why I am stranded in the snow or being flooded out, but that knowledge adds nothing to my ability to overcome the problem. Likewise, the technicians' explanations about how such an internet outage occurs do nothing to keep the same thing from happening again and again.

How does this impact genealogists? Despite the fact that this particular outage did not affect file storage, it does point out the underlying fragile nature of the internet. For some time now, using online storage has been touted as a solution to our backup and file storage issues. As I have stated many times in the past, we always need to be careful to back up our genealogical data to a variety of devices and services. It is still ultimately important to back up your work to external devices such as multiple hard drives or flash memory drives as well as to make judicious use of online storage.
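For those who like to automate this kind of thing, here is a minimal sketch of copying one data file to several backup locations and checking each copy against the original. It is only an illustration under my own assumptions: the function names `sha256_of` and `back_up` and the folder names in the usage note are hypothetical, and the destinations are assumed to be folders on external drives that are already attached.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Iterable, List


def sha256_of(path: Path) -> str:
    """Return the SHA-256 checksum of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy `source` into every destination folder and verify each copy."""
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copies the contents and file timestamps
        if sha256_of(target) != expected:
            raise IOError(f"Backup copy at {target} does not match the original")
        copies.append(target)
    return copies
```

A call such as `back_up(Path("family.ged"), [Path("/Volumes/DriveA/backup"), Path("/Volumes/DriveB/backup")])` would then leave a verified copy on each drive; the file name and drive paths here are made up. The checksum comparison is the point of the exercise: a backup that has never been checked is only a hope.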

By the way, I was out camping in southern Utah during the general outage and was off the network and without mobile phone service at all and didn't learn about the whole problem until we returned to mobile phone coverage. Likewise, unless you happened to be using one of the websites that was knocked off the internet, you probably missed the whole thing. This reminds me of living in the low deserts of Arizona where it could be raining on one side of the street and not the other.

Sunday, October 23, 2016

Ignoring the Online Miracle with Misconceptions: Part One - The Large Genealogy Websites

For a genealogist who started over thirty years ago in a world dominated by paper, the obvious advantages of online genealogy seem positively miraculous. I can now do research in minutes that used to take a special trip to Salt Lake City, Utah to visit the Family History Library. Libraries still dominate my life, but even in the library I use a computer more than anything else to teach, research and learn.

Some of the greatest online resources are the huge online genealogy programs. Six of these large, online database programs have joined in a mutually beneficial partnership. has opened part of its vast collection of records to these partners in exchange for the members of The Church of Jesus Christ of Latter-day Saints receiving free memberships in each of these vast resources. But I was aware of or using them before the partnership was implemented.

Interestingly, I constantly receive feedback from genealogists who are either unaware of even the large database websites or simply not interested enough to become informed about the records available online at all. There are several very common and persistent false assumptions out there that I seem to hear regularly. The danger in pointing out any general misconception is that those who have the misconception don't recognize that they have it and those who don't have it can't believe that anyone would have that misconception.

For the purpose of this post, I am confining my comments to the following websites:
I am not including subsidiary websites such as which is owned by Ancestry or large included websites such as which is included in and

From my perspective, the most common misconception concerns thinking that the large online database websites all have the same resources. Granted, there is some overlap such as the United States Federal Census Records, but each of these large database programs has unique resources that are not duplicated by any other genealogy website. One way to begin to understand this misconception is to routinely do Google searches on the names of the database programs on each website. It is also entirely possible that one website has an index of a collection that is completely imaged on another website. A superficial search of any of the listed websites will show some overlap, but the real question is whether or not they have the documents you need to find your own ancestors and if you fail to search the larger websites for whatever reason, you are simply operating under a misconception.

Here is where we get into the issue of defining the contents of these websites in different ways. All of the larger websites use different terms to describe their holdings. These terms, such as records, collections, documents, individuals, etc. are arbitrary and have different definitions on each website.

Let me start with an example from FamilySearch. As is the case with each of these websites, the key to identifying the contents of their holdings is their catalog of all of their collections. The Catalog is as complete a listing of the records and other publications held by FamilySearch as is available. My example is a book about my great-grandfather, Henry Martin Tanner. Here is the book from the catalog:

Since this book is now digitized and online on, the book is no longer available in paper format in the Family History Library.

This book is not available from Google books and is not found any other place online and especially not found in any of the other large genealogy websites.

Here is another example of a unique database this time on The database is called the
"Great Britain, Atlas and Index of Parish Registers" and it is based, in part, on "The Phillimore Atlas and Index of Parish Registers." This extremely useful database contains maps of the English parishes and the date of the earliest registers in each parish. 

I could go on and on with examples of unique databases, or collections as they are usually called. However, two more examples will probably be enough to illustrate that the belief that these large online websites all have the same information is a gross misconception. My last examples on this topic are's vast Books and Publications database with 447,870 completely searchable items and's even more extensive collection of 725,872 old books from around the world, also completely searchable.

Stay tuned

Friday, October 21, 2016

LeVar Burton - RootsTech Keynote Speaker Announced

FamilySearch seems to be able to get some extraordinary people for the Keynotes at RootsTech, and RootsTech 2017 is no exception. The Friday, February 10 Keynote will be given by LeVar Burton from Reading Rainbow, Star Trek: The Next Generation, and the impactful 1970s television miniseries that launched his career, Roots. Here is the announcement:
RootsTech is thrilled to announce LeVar Burton as a featured keynote speaker on Friday, February 10, 2017. This celebrated individual has touched countless lives in his various roles as actor, director, producer, writer, and speaker. His passion for literacy, storytelling, and imagination has generated millions of fans throughout the world. Come see why his inspiring life experiences make him a perfect match for the RootsTech stage.
We are starting to ramp up for the upcoming RootsTech 2017 on February 8th through the 11th, 2017.  Here is some more information about LeVar Burton.
LeVar Burton is known by millions as the face of Reading Rainbow, the beloved, long running PBS children’s television series, and for his role as Geordi La Forge, Chief Engineer in the iconic Star Trek: The Next Generation series. Many also remember seeing his talent debut in 1977 when he was cast in the groundbreaking role of Kunta Kinte in the landmark television miniseries Roots. In addition to his familiar talent as an actor, he is an accomplished director, producer, writer, speaker, educator, and entrepreneur. He is the honored recipient of 12 Emmy Awards, a Grammy and five NAACP Awards. Burton has experienced continual success in his innovative efforts to promote his passion for literature, storytelling, and imagination. He is the Co-Founder and Curator-in-Chief of RRKIDZ, the online home of Reading Rainbow and Skybrary. With millions of fans throughout the world, he continues his mission to inspire, entertain and educate.
If you haven't registered, it is time to do so. Also, RootsTech 2017 starts with two break-out sessions on Wednesday, February 8, 2017 so plan accordingly. Most of the hotels are close to being sold out also.

Thursday, October 20, 2016

The Copyright Boat Anchor on Creativity and Research

Article I, Section 8, Clause 8 of the United States Constitution, known as the Copyright Clause, empowers the United States Congress:
To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.
This particular two-line sentence in the U.S. Constitution has been turned into a morass of complex regulations and interpretations of those regulations numbering into the thousands of pages. Originally styled to protect authors and inventors, the United States Copyright Law now primarily protects publishers and aggregators that use the extended terms of copyright protection to take advantage of the authors and make money through contractual limitations that go far beyond the original intent of the Constitution.

Presently, copyright terms are governed by the Sonny Bono Copyright Term Extension Act of 1998. This legislation lengthens copyrights for works created on or after January 1, 1978 to “life of the author plus 70 years,” and extends copyrights for corporate works to 95 years from the year of first publication, or 120 years from the year of creation, whichever expires first. The reason why these extensions were passed by the U.S. Congress was that Walt Disney's Mickey Mouse was in danger of passing into the public domain. With the new laws, Walt Disney now has an extension to 2023 and enough time to seek further extensions to the copyright act. These extensions were not created to protect authors, but to protect the United States Balance of Payments. See Copyright Extension. See also, "When is 1923 Going to Arrive and Other Complications of the U.S. Public Domain." If you think you understand copyright law, I suggest you read this article carefully.
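
The term arithmetic described above can be sketched in a few lines of Python. This is only a rough illustration of the two rules as stated in this post, for works created on or after January 1, 1978. The function names are hypothetical, terms are treated in whole years, and the sketch ignores the many other complications that make real copyright determinations so difficult.

```python
def individual_term_end(author_death_year):
    """Year the term ends under 'life of the author plus 70 years'."""
    return author_death_year + 70


def corporate_term_end(creation_year, publication_year=None):
    """Year the term ends for a corporate work: 95 years from first
    publication or 120 years from creation, whichever expires first."""
    from_creation = creation_year + 120
    if publication_year is None:
        # Never published: only the creation-based term applies.
        return from_creation
    return min(publication_year + 95, from_creation)


# Hypothetical example years, chosen only to exercise both rules.
print(individual_term_end(1990))       # 2060
print(corporate_term_end(1980, 1985))  # 2080
print(corporate_term_end(1980, 2010))  # 2100
```

In the last example the work was published so long after its creation that the 120-year creation rule expires before the 95-year publication rule does.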

Determining when and if the copyright on any particular work has expired has become a complicated and almost impenetrable morass of changing laws and regulations, not to mention thousands of court cases that have interpreted those same confusing rules. Some large aggregator companies have used this intense legal confusion to create large digitized collections of books and other material and, under a claim of copyright protection, have contractually limited the use of the items even when some are clearly in the public domain. Essentially, these companies are circumventing the law by claiming that they have the ability to enforce a contract with those who agree to their provisions. I seriously doubt that these companies have negotiated a contract with each of the actual copyright holders of all of the thousands and perhaps millions of books that they include under their contractual umbrella. It is now common to claim that "copyright law was supposed to prevent publishers from literally stealing each other's business." See comments to "Is it legal to scan a book you own to create an ebook for your own personal use?" I suggest that this commenter and all those others who believe that publishers were intended to be protected read Article I, Section 8 quoted above.

In an attempt to get some control over the interpretation of the laws and regulations, Cornell University has since 1999 published a summary of the current status of the copyright laws called "Copyright Term and the Public Domain in the United States 1 January 2016." If you examine this chart carefully, you can see that certain works, those unpublished works when the death date of the author is not known, are protected as far back as 1896. However, most works registered or first published before 1923 have now passed into the public domain.

Copyright law in the United States and most of the rest of the world is a good idea that has been twisted and changed until the original concept is hardly recognizable. Copyright reform on a large scale is long overdue. 

Wednesday, October 19, 2016

How many books have been digitized?

I am always interested in the disparity between our perception of reality and what is actually happening in the world around us. Genealogists are no exception to this perceptual myopia. Genealogists' main activity is discovering records about their ancestors and relatives. From this, you might expect that they would have an active interest in finding these valuable records. One reality is that these valuable records of our collective past history are in published books. As I have observed in previous writings, from my own observations, very few genealogists are aware of the trove of genealogical books even when they are sitting in a large library.

If some lucky genealogist happens to stumble across a book, such as a surname book written about an ancestor, they then likely become convinced that the book is absolutely true and start adding everything in the book into their family tree, but that is another issue.

One of the main limitations with books has always been finding them. Libraries are a wonderful place to explore the world of books, but you do have to go to the library and spend time looking. Many of us have extensive experience in local, public libraries. Unfortunately, very few of these local institutions have many books that are helpful to genealogical research. If your family happens to be from the area where the library is located, the library may have some extremely valuable items for research but generally, a smaller, local library will have a few of the more "popular" books on genealogy and little else.

Please do not misunderstand me; local libraries usually have specific research opportunities in the form of locally donated books. They may also have other donated or accumulated items of interest including newspaper collections and local memorabilia. But they have limited space and limited book collections.

Larger libraries with huge book collections and vast research opportunities are primarily either in large cities or associated with larger colleges and universities. The fact that they are destination research centers makes their use primarily limited to the serious researcher.

Now we come to the impact of digitization. For all practical purposes, anyone with a connection to the internet can now access millions upon millions of books that include an overwhelming number of genealogically relevant items. The main challenge with this monumental digitization effort is that the digital books, also called ebooks, are scattered all over the internet in thousands of different websites. Determining whether or not a particular book can be accessed on the internet in digital format can be a daunting research task.

Copyright law in the United States and elsewhere imposes a really strange and daunting limitation on research. I can go into a physical library and look at any book that is available regardless of the copyright status of the book. I do not have to know the copyright status of the book to check it out of the library. But digital books are viewed as a threat to the publishing industry and so copyrighted digital material is highly regulated as you can see anytime you rent a video and have to read the "FBI Warning." So, I can find a particular copyright protected book online and see that there is a digital copy available, but only under some very restricted circumstances can I actually read the digital copy of the book even though I could visit the library and read the physical copy of the book without that same limitation.

This restriction is slowly being eroded by digital libraries such as, but as yet, these online lending library arrangements contain very, very few books of research interest. Fortunately for genealogists, many of the books we find valuable for research purposes are in the public domain, so these books are more generally available online.

So how many books have now been digitized and where are they? That is the question. It is only through exceptionally diligent online research that anyone can find relevant digital books that are freely available. Some websites even restrict the use of public domain digital material as if they had ownership rights. For example, Brigham Young University has a huge online collection of digital books numbering in the millions of volumes, but access to the collection is limited to students, faculty and some staff members only. The books cannot even be researched on a limited basis by non-students. The irony of this situation is that if the BYU library were part of the HathiTrust organization, the public domain portion of their collection would be freely available online to anyone who was interested. What is even more interesting about this situation is that many much smaller and less important university libraries are active participants in the organization. See the HathiTrust Partnership Community.

My example of the Brigham Young University Library is just one of many examples of the spotty availability of digital books. One institution may make a given book freely available while the same book is classified in a restricted section of another website.

The question of numbers is really nearly impossible to answer. For example, Google Books has millions of digital books online but does not publish the total number anyplace that is discoverable. Some websites provide a number but the manner in which individual items are counted differs dramatically from website to website so an accurate count is impossible. All I can really say is that there are millions upon millions of books available online and that perhaps a subset of millions of those books have genealogical interest. I can also only say that as genealogists we need to remember to include detailed online book searches in all our general research efforts. The days of relying on local and larger libraries for this material are over. We still need to go to libraries for the yet-to-be-digitized items, but we can access so much online now that we should focus our initial efforts on online sources.

Tuesday, October 18, 2016

Update on Preserve the Pensions Project

The War of 1812 Pension Digitization Project has raised over $3 million in an online crowdsourcing effort sponsored by the National Archives,, and with extensive participation from the genealogical community and the Federation of Genealogical Societies (FGS). As prominently mentioned on the Preserve the Pensions website, as these valuable historical documents are digitized, they will be made available to all at no cost, and the original pension files can be retired to much less active use. So far, over 4 million images have been preserved.

FamilySearch volunteers are also matching the entries in the Pension Files to individuals in the Family Tree and in the future those individual entries will be indicated as being in the Pension Files.

Here is a screenshot of the free image link on

Monday, October 17, 2016

The Vanished Collection Reappears on Fold3

I recently wrote about a collection of Arizona World War II Draft Registration cards that appeared in the Historical Record Collections and then disappeared without any notice. Well, apparently, there was some negotiation going on because the collection showed up on Fold3 as a "newly" added collection. No problem for me, I have access to both programs.

But perhaps this is a word to the wise that collections may end up moving from a "free" program to a subscription program without notice. I suppose this is a good reason for saving digital copies of the documents to your own computer rather than relying entirely on online sources for storage. Don't get me wrong, I think moving as much as possible to online programs is a good idea, but now I will try to be more careful to also keep a copy of all that I find on my own computer.

Moving Beyond Census Records: Part Nine: Finding Both the Living and the Dead

For reference, the forms and question lists for each of the remaining United States Federal Census years are as follows.

1920 Census

1930 Census

1940 Census

In each census year from 1920 to the present, the questions, including the supplemental questions, have become more detailed and extensive. By the 1940 Census there were 50 questions. However, some of the genealogically important questions, such as those pertaining to marriage, were asked of only a statistically significant portion of the U.S. population and are therefore far less useful for genealogists. In addition, the number of readily accessible online public records has increased dramatically, making the identification and location of individuals much easier since the 1930s.

For example, in the past few years, the large major online genealogy websites have included hundreds of millions of public records. Here are some of the examples of such records that have become readily available and completely searchable. It should be noted that these websites include information on people who are still living as well as those who are dead.

United States Public Records, 1970-2009
This collection is an index of names, birthdates, addresses, phone numbers, and possible relatives of people who reside in the United States between 1970 and 2009, although there are a few records outside this range. These records were generated from telephone directories, property tax assessments, credit applications, and other records available to the public.

U.S. Public Records Index, 1950-1993, Volumes 1 and 2
The U.S. Public Records Index is a compilation of various public records spanning all 50 states in the United States from 1950 to 1993. Entries in this index may contain the following information: name, street or mailing address, telephone number, birth date or birth year.

U.S. Public Records Index 1970-2010
This collection is an index of names, addresses, phone numbers, and possible relatives of people who resided in the United States between 1970 and 2010. These records were generated from telephone directories, property tax assessments, credit applications, voter registration lists and other records available to the public.

In addition, there are hundreds of other websites, mainly commercial websites, that can assist a researcher in finding and identifying even living people in the United States. Many of these commercial websites are involved in what is commonly referred to as "skip-trace" information. That is information about people who have "disappeared," usually because of debt collection efforts by debt collection agencies. There is a major industry that has developed in the United States helping creditors, attorneys and financial institutions to locate and identify people.

Many genealogical researchers who are unfamiliar with the number and availability of public records about living people are both surprised and appalled at the detail of these records that has become immediately available for a small fee. Only the most sophisticated and diligent people can avoid being readily identifiable. The publicly available information these companies can draw on includes details of people's lives that go well beyond the limited questions in the U.S. Census records.

The largest of these online commercial databases are probably and Both of these very large companies have a web of subsidiaries for locating and providing information about living individuals. These services now come under the euphemism of "risk management" services. 

It may not be obvious to some, but one of the best methods to find people in the United States is through city directories and telephone books. All of the large online database companies have at least some of these directories incorporated into their collections. But there are other large online collections of such records. These can be found by searching for some combination of the words "digital historical city directories" or "digital online telephone directories." Some, perhaps most, of the online telephone directories are either pay-per-view or subscription services. 

Although the U.S. Census records are one of the most valuable collections of records in the country, they are just the barest of beginnings for doing adequate genealogical research. It is extremely important to use additional records to supplement the initial findings in census records both to correct errors in the census records and to expand the information found and fully document the lives of our ancestors.

Previous posts in this series.

Guidelines for Digitizing Manuscripts and Photos

Genealogists have benefited from the proliferation of digital images; billions of images of historically and genealogically significant documents and records have gone online. However, individual efforts to digitize our own collections sometimes result in less than acceptable images. Obviously, the genealogist who digitizes a family record or photograph is limited by the quality of the original, but there are definite and widely available guidelines for the best practices in digitizing both from an institutional and personal standpoint.

I have compiled a list of websites that address the issues of providing guidelines for digital collections. If you examine some or all of these websites, you will see that they come from a variety of disciplines. Some come from the academic community and others are more commercial in nature, but they all provide their unique perspective on the best practices and most agree in the basic methods. You might also observe that the recommended practices have changed somewhat over the past few years as technology has become more advanced.

Here is the list.

“Best Practices and Planning for Digitization Projects,” April 15, 2013.
“Best Practices for Digitization | Minnesota Digital Library.” Accessed October 17, 2016.
“cdl_gdi_v2.pdf.” Accessed October 17, 2016.
“Digital File Creation: Standards and Best Practices for Saving Images [Tutorial] | The Sustainable Heritage Network.” Accessed October 17, 2016.
“Good and Best Practices for Making Digital Images.” Accessed October 17, 2016.
“Guidelines for Best Practices in Image Processing.” Accessed October 17, 2016.
“Guidelines_for_images.pdf.” Accessed October 17, 2016.
“hal_mhc_rms_bp_for_digitizing_125527_7.pdf.” Accessed October 17, 2016.
“IDA_Best_Practices.pdf.” Accessed October 17, 2016.
“Images-Best_Practice.pdf.” Accessed October 17, 2016.
“Best Practices.” Accessed October 17, 2016.

National Efforts Directed at Digitizing Records and Documents

The U.S. National Historical Publications and Records Commission (NHPRC) of the U.S. National Archives and Records Administration "supports a wide range of activities to preserve, publish, and encourage the use of documentary sources, created in every medium ranging from quill pen to computer, relating to the history of the United States." The NHPRC supports major archive initiatives to digitize historically significant collections and make them freely available online. Quoting from the NHPRC's Facebook page:
We have a new grant program that may interest you. The program offers up to $350,000 for major archives initiatives with an emphasis on innovation and collaboration. The new Access to Historical Records – Major Initiatives program is designed to broaden public access to historical and cultural records. There’s a five-page preliminary proposal due by 19 January 2017. The Commission will then invite a select number of applicants to submit a full proposal. 
Does your institution need to conjoin the records of a major historical subject held by several repositories and make them freely available online? Does research demand for a high-value audio or moving image recordings collection necessitate digitally converting and posting them online? Are there new tools and methods that would greatly enhance the public’s ability to access and use records? Have you begun developing a method to make work with born digital records more efficient and want to prove that method is replicable?

These are just a few suggestions. We want to hear all your creative ideas and discuss how they might fit with this program. If you would like to schedule a time to talk about a proposal idea, please email or call the Director for Access, Alex Lorch, or the Director for Technology Initiatives, Nancy Melley.
In pursuing its goals, the NHPRC partners with members of the national historical and archivist community.

Genealogists need to become aware that their goals in documenting individuals and families fall squarely into the overall interests of those who wish to preserve all important historical documents. Because this is the case, genealogists would do well to support a broader range of historically significant digitization projects.

The historical development of genealogy in the United States has created a somewhat artificial division between "history" and "genealogy." However, the only bachelor-level university genealogical degree program in the United States, at Brigham Young University, is part of the College of Family, Home, and Social Sciences, Department of History. But despite this inclusion, many historians do not consider genealogy to be a "serious" academic discipline. This is due, in part, to the generalization of the pursuit and the participation of many who are untrained in basic historical research.

I believe it is important for all genealogists to be more fully aware of both local and national efforts to digitize historically significant records and support such efforts. The benefit to our genealogical community will be the increased availability of online genealogically significant collections.

Sunday, October 16, 2016

Moving Beyond Census Records: Part Eight

While U.S. Census records are extremely popular among genealogists because they are easy to find and use, they still have a number of serious limitations. Some of these limitations are shared with all historical records but because census records are so complete and useful, genealogists have a tendency to ignore their limitations. 

A more extensive discussion of some of these limitations can be found in the following book:

Greenwood, Val D. 1990. The researcher's guide to American genealogy. Baltimore, Md: Genealogical Pub. Co.

Here are some of the limitations:
  • The people supplying the information are relying on memory and/or may not know the answers to the questions
  • Some of the information may be misrepresented
  • Some of the entries are incomplete
  • The census records are recorded only every ten years and with the missing information from the 1890 Census, the record likely omits some pertinent information such as children who are born and die within the ten year gap
  • Before 1850, the record is difficult to interpret and possibly misleading
  • The entire time covered by the census records only extends back a few generations in a limited area
As genealogical researchers we need to apply the same level of analysis and evaluation to census records as we would to any other record. The information supplied by any one census record should never just be accepted as correct on its face. Experience in examining census records shows many instances where the information supplied is contradicted by subsequent censuses.

The 1910 U.S. Federal Census was on one page, even though the readable form shows two separate pages. The 1910 Census also had questions directed at American Indians (Native Americans). Here are the questions asked, including those directed at the Indians.
  1. Number of dwelling house in order of enumeration
  2. Number of family in order of enumeration
  3. Name
  4. Relationship to head of the family
  5. Sex
  6. Color or Race: Enumerators were to enter "W" for White, "B" for Black, "Mu" for mulatto, "Ch" for Chinese, "Jp" for Japanese, "In" for American Indian, or "Ot" for other races.
  7. Age
  8. Is the person single, married, widowed, or divorced? Enumerators were to enter "S" for single, "Wd" for widowed, "D" for divorced, "M1" for married persons in their first marriage, and "M2" for those married persons in their second or subsequent marriage.
  9. Number of years of present marriage
  10. How many children is the person the mother of?
  11. Of the children a person has mothered, how many are still alive?
  12. Place of birth of the person
  13. Place of birth of the person's father
  14. Place of birth of the person's mother
  15. Year of immigration to the United States
  16. Is the person naturalized or an alien?
  17. Can the person speak English? If not, what language does the person speak?
  18. The person's trade, profession, or occupation
  19. General nature of the industry, business, or establishment in which this person works
  20. Is the person an employer, employee, or working on his own account?
  21. If the person is an employee, was he out of work on April 15, 1910?
  22. If the person is an employee, what is the number of weeks he was out of work in 1909?
  23. Can the person read?
  24. Can the person write?
  25. Has the person attended school at any time since September, 1909?
  26. Is the person's home owned or rented?
  27. Is the person's home owned free or mortgaged?
  28. Does the person reside in a home or on a farm?
  29. If on a farm, what is the farm's identification number on the census farm schedule?
  30. Is the person a survivor of the Union or Confederate Army or Navy?
  31. Is the person blind in both eyes?
  32. Is the person deaf and dumb?
  33. Tribe of this person
  34. Tribe of this person's father
  35. Tribe of this person's mother
  36. Proportion of this person's lineage that is American Indian
  37. Proportion of this person's lineage that is white
  38. Proportion of this person's lineage that is black
  39. Number of times married
  40. Is this person living in polygamy?
  41. If this person is living in polygamy, are his wives sisters?
  42. If this person graduated from an educational institution, which one?
  43. Is this person taxed? An American Indian was considered "taxed" if he or she was detached from his or her tribe and was living in the white community and subject to general taxation, or had been allotted land by the federal government and thus acquired citizenship.
  44. If this person had received an allotment of land from the government, what was the year of that allotment?
  45. Is this person residing on his or her own land?
  46. Is this person living in a "civilized" or "aboriginal" dwelling? Enumerators were to mark "Civ." (for "civilized") if the person was living in a log, frame, brick, or stone house, etc. and "Abor." (for "aboriginal") if the person was living in a tent, tepee, cliff dwelling, etc.
The questions regarding marriage and birth of children should be of interest to genealogists. These questions have been used extensively to discover other marriages and children missing from the family. As in previous years, questions concerning where and how the people lived strongly suggest additional records that could be searched.

A positive answer to question number 30 would suggest that further research should be done to locate military records and possible pension records. The questions about immigration and naturalization are valuable in helping researchers to find additional records such as naturalization documents and passenger lists.

It is also important to note any inconsistencies between the answers given in prior census years. Changes in ages and birth dates are common, but other changes may give clues to important information that may have been intentionally omitted in prior censuses.
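
One simple way to spot the age inconsistencies mentioned above is to compare the birth year implied by each census entry. The following Python sketch is only an illustration: the function names, the two-year tolerance, and the sample ages are all hypothetical and do not come from any real census record.

```python
def implied_birth_years(census_ages):
    """Map each census year to the birth year implied by the reported age."""
    return {year: year - age for year, age in census_ages.items()}


def is_inconsistent(census_ages, tolerance=2):
    """True when the implied birth years differ by more than `tolerance` years."""
    years = implied_birth_years(census_ages).values()
    return max(years) - min(years) > tolerance


# Hypothetical ages reported for one person in three censuses.
reported = {1900: 28, 1910: 36, 1920: 49}
print(implied_birth_years(reported))  # {1900: 1872, 1910: 1874, 1920: 1871}
print(is_inconsistent(reported))      # True
```

A flagged entry is not necessarily an error in any one census; it is simply a prompt to look for other records that can settle the question.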


Previous posts in this series.