
Thursday, October 2, 2014

Redundancy in Genealogical Databases -- Good or Bad?

I received the following very thought-provoking question from a reader:
I was curious if there is a way to figure how many records are redundant among the databases and which records are exclusive to a database?
The issues raised by this seemingly simple question go to the heart of the concept of maintaining a genealogical database online. The basic idea of collecting genealogical source records is that a researcher can "search" the records for information about his or her ancestors. The implication is that the information will lead to the extension of the researcher's pedigree. The researcher must assume that there is something of value in the online collection and that the searches will produce the expected results, i.e., information on the researcher's family.

As the researcher uses the online database, he or she must experience some level of satisfaction with the initial searches, or the search will be abandoned. Depending on the researcher's level of sophistication and research skills, the initial searches will usually determine his or her overall impression of the online database. Put in a more direct way, the searches must produce information of value to the researcher, or the database will be abandoned. One indicator of the level of sophistication, experience and skill of any researcher is the ability to continue searching when there are no positive results.

Let me give a hypothetical situation as an example, rather than pick on any one real database program. My hypothetical online database is called "The Dreamland of Genealogy" (DOG). Its advertisements claim it is "where all your genealogical dreams come true." It claims billions upon billions of records from every corner of the world. OK, now what is the reality here? If a potential researcher simply looks at a list of the source documents, what will he or she find?

DOG knows from observation or experience with other online databases that the first thing a researcher wants is almost instant gratification. So DOG begins by supplying extremely common genealogical resources such as the U.S. Census records, the Social Security Death Index, etc. DOG further realizes that these common database components are redundant; that is, every other online database has the same records. But DOG also knows that if a researcher comes to its website and does not find the milk and potatoes of genealogy, the researcher will not stay to buy anything else. Why do you think every supermarket in the U.S. sells milk, vegetables, meat, etc.? A retailer could fashion a "boutique" store that sold only high-end products to a select clientele, but usually only the major retailers can afford to maintain these "boutique"-type stores. We had an experience with this in the Mesa/Phoenix area when a company came in and started a lot of "Fresh and Easy" stores. My wife and I made one visit to one store, and that was our last visit. After a couple of years, the stores all disappeared at once. Why? Because Walmart started their small "Neighborhood Stores" and positioned them all around the city. The same sort of market forces drive the smaller database companies out of business. Either they end up selling to the larger companies or they disappear.

Now, if you had gone into the smaller, boutique-type store, would you have found milk, vegetables, meat and other "redundant" products? Yes. But the prices of those products would have been higher than at the larger supermarket chains. In the genealogy version of this analogy, the smaller website would carry the same common databases, but they would have been harder to search.

Now to the question presented at the beginning of this post. Can you determine the level of redundancy between databases? Yes, you could, if you had a way to compare the two websites' records. But the companies know that if this were easy to do, any online competition would devolve to the level of the supermarkets with their specials on milk and other essential products. So the online databases have made that job nearly impossible. They do this by the way they count and advertise their product, i.e., genealogy records. They use a number of meaningless terms such as record count, people, profiles, collections, etc., which make it impossible to determine whether the records are the same or different. In some cases, they also make it nearly impossible to tell where the records originated. For example, go onto a large online genealogy company's website and try to find out where the records came from, that is, which library, repository, archive, etc. supplied them. You will see what I mean.
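The reader's question boils down to comparing two sets of records. If, hypothetically, two companies let us export their records with a consistent source citation, name, year and place, the arithmetic itself would be easy. The following Python sketch assumes exactly that. The field names, the `record_key` normalization and the sample records are all made up for illustration, and the "redundancy" figure is simply the Jaccard index (shared records divided by all distinct records). The hard part, as I have said, is that no such export exists.

```python
# A sketch only. It ASSUMES each database could be exported as a list of
# dicts with these hypothetical fields: source, name, year, place.
# No real genealogy website is known to offer such an export.

def record_key(record):
    """Normalize a record into a hashable key so the same underlying
    source record from two databases compares equal (hypothetical fields)."""
    return (
        record["source"].strip().lower(),
        record["name"].strip().lower(),
        record["year"],
        record["place"].strip().lower(),
    )


def compare_databases(db_a, db_b):
    """Count shared and exclusive records using set operations."""
    keys_a = {record_key(r) for r in db_a}
    keys_b = {record_key(r) for r in db_b}
    shared = keys_a & keys_b          # records found in both databases
    everything = keys_a | keys_b      # all distinct records in either one
    return {
        "shared": len(shared),
        "only_a": len(keys_a - keys_b),  # exclusive to the first database
        "only_b": len(keys_b - keys_a),  # exclusive to the second database
        # Jaccard index: fraction of all distinct records that are redundant
        "redundancy": len(shared) / len(everything) if everything else 0.0,
    }


# Made-up sample records for the hypothetical DOG and a competitor.
dog = [
    {"source": "1900 U.S. Census", "name": "John Smith", "year": 1900, "place": "Mesa, Arizona"},
    {"source": "Social Security Death Index", "name": "Mary Jones", "year": 1965, "place": "Phoenix, Arizona"},
    {"source": "Parish register", "name": "Ann Brown", "year": 1842, "place": "Yorkshire"},
]
other = [
    {"source": "1900 U.S. Census", "name": "john smith ", "year": 1900, "place": "Mesa, Arizona"},
    {"source": "Social Security Death Index", "name": "Mary Jones", "year": 1965, "place": "Phoenix, Arizona"},
]

result = compare_databases(dog, other)
print(result)
```

With these invented records, two are shared, one is exclusive to DOG and none is exclusive to the competitor. In practice, the normalization step is where everything would break down, because two companies rarely index the same record with the same spelling, dates or source citation.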

So you can see that the answer to the question above is yes, there is a way to determine the level of redundancy. But the more realistic answer is no, there is really no way to determine the level of redundancy, because we do not have access to the necessary information and the programs are not about to give us that information without a struggle.
