Wednesday, September 23, 2015

Plumbing the Depths of Online Record Collections -- Part One

The seminal development in genealogical research over the past few years is likely the online, digitized record collection. Some of these collections have grown into massive conglomerations of whatever happened to be available to the company accumulating the records. The whole idea of accumulating vast quantities of digitized records for genealogy dates back only to about the 1990s, with the founding of a company called Infobases, Inc., the predecessor of Ancestry.com. For a brief history of Ancestry.com, see the article reviewing its development in Wikipedia. By the way, these companies are not particularly interested in giving anyone a detailed history of their development. Corporate secrecy and the resultant carefully crafted press releases usually result in quite vague statements about how these companies have grown so large.

The combined technology that allowed the creation of these huge websites had to wait for the individual growth of a number of related developments. First, there needed to be a cheap and efficient way to convert paper documents into digital images. At the same time, the capabilities of computers and data storage had to increase so that the large digital images created could be viewed. Enough people had to acquire the new technology to create a market for the images, and a way had to be developed (the Internet) to disseminate the images to a wide audience. In addition, a commercial, subscription-based business model had to be developed and people induced to pay for the privilege of simply viewing online content. For example, without the inexpensive, large, high-resolution computer monitors (aka televisions) available today, how many of us would be trying to read old documents online? The number of developments that had to come together to produce these large online collections is truly staggering.

At the same time, without the explosive growth of online businesses, there would have been no incentive to produce huge databases collecting original source documents. This puts genealogists in the position of having easy access to billions upon billions of records. Think about the numbers. If you were to look at a billion documents, spending only one second on each document and looking 24 hours a day without any breaks, it would take you almost 32 years to look at that many documents. Of course, you would not live more than a few weeks at most. Some of the larger companies claim to have many billions of records. So how do we really know what they have?
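For anyone who wants to check that arithmetic, here is a minimal sketch in Python. The function name, the 365.25-day year, and the 6 billion figure in the loop are my own assumptions for illustration, not numbers published by any of the companies.

```python
# Rough check of the "one second per document" arithmetic.
# Assumption: a year is 365.25 days, viewed around the clock with no breaks.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_view(record_count, seconds_per_record=1.0):
    """Return the years needed to look at every record once, nonstop."""
    return record_count * seconds_per_record / SECONDS_PER_YEAR


if __name__ == "__main__":
    # 1 billion matches the example in the text; 6 billion is illustrative.
    for billions in (1, 6):
        years = years_to_view(billions * 1_000_000_000)
        print(f"{billions} billion records: about {years:.1f} years")
```

One billion records at one second each comes out to about 31.7 years, which agrees with the "almost 32 years" figure. The time grows in direct proportion to the size of the collection, so every additional billion records adds roughly another 32 years.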

In past posts I have commented on the fact that none of the larger companies use the same method of measuring their huge collections. In determining if the claims of billions of records are correct, we are quite literally at the mercy of the large companies, their public relations and marketing departments, and their search engines. We can see unimaginably large numbers, but these numbers are essentially meaningless. What am I going to do with 6 billion records? The implicit assumption made by those who promote the mega-databases is that bigger is always better. Centralization is always considered to be a positive. No one seems to stop to think about how the average person is going to process all that information, and no one stops to question the advantages (if any) of centralization.

But if we go behind the facade of big data, we find a different issue. What business are these large online database companies actually in? Each of the large companies now has "strategic partnerships" with other entities and, in the most recent developments, these entities include DNA processing companies. I am very far from being an alarmist, conspiracy-minded person. It is, arguably, very advantageous to genealogical researchers to have access to this sort of consolidated company. On the other hand, can these companies continue to expand into related areas and keep providing adequate service in every area? Is larger always better? There is an old saying, "The bigger they are, the harder they fall." I am reminded of the recent announcement of Ancestry.com's venture into the "health industry." See health.ancestry.com.

Can we really contemplate what effect it would have on genealogy if one of these larger online database operations were to "go out of business"?

What are the practical realities of a vast centralization of the world's genealogically significant records? We assume that putting all of our eggs in one basket is a positive development. Where does that assumption come from? Of course, I personally benefit from easy access to billions of records, but I have also acquired the individual skills and have the computer power and equipment to utilize all that information. How many people really have the time, the money, the education, the inclination and the perseverance to use the information already accumulated?

I do not think that these questions involve value judgements. These large accumulations of records are neither "good" nor "bad." I am not addressing these issues from an ethical or value standpoint. I am addressing the issue of size, per se. It appears that this discussion is going to continue in another post or may even end up being a series. See you next time.

If you are interested in doing your own thinking on this subject, I would suggest that you start by reading the following book:

Taleb, Nassim Nicholas. The Black Swan. London: Allen Lane, 2011.
