Some people eat, sleep and chew gum, I do genealogy and write...

Saturday, February 15, 2014

Scattering data across the Web - a problem of consolidation or methodology? Part One

It doesn't take long after anyone begins doing genealogical research on the Internet to find out that the information they are seeking is scattered across thousands of websites all around the world. For some, this fragmentation of genealogical sources seems an insurmountable obstacle, and they give up hope. This abandonment of hope is generally caused by a lack of understanding of how to find the sources of records, rather than by any difficulty in using the records themselves. Even if you are immersed in the operation of the Internet, the millions upon millions of websites available make it seem nearly impossible to find even a small percentage of the useful ones. By necessity, we often rely heavily on search engines to perform this task for us. Unfortunately, to do this you have to know what to search for before you search. So we have something of a chicken-and-egg problem. What most researchers fail to understand is that searching the Internet is a learned skill and, further, that there are a number of underlying principles that cannot be ignored in making progress toward finding your ancestors.

How do you find a source if you don't know it's there?

I would suggest that my first rule regarding searching for any genealogical data whether on the Internet or still located in a paper archive is as follows:
Always assume the record is there.
Now, of course, there are a number of qualifications to this rule. It helps if you have some historical perspective on the types of records that were created in a particular area or jurisdiction during a particular time period. For example, many beginning genealogists assume that finding a date of death involves locating a "death certificate." On the other hand, with a minimal amount of historical investigation or by consulting readily available sources, the researcher can very quickly learn that vital record certificates are relatively recent innovations. So the first qualification of the rule above is to place the types of records being sought into the historical time period in which they were created, in order to ascertain which types of records might reasonably be found. In other words, before you begin searching for either names or record sources, you search for the historical background of the time and place where your ancestors lived.

As you perform this historical survey of the records available at the time and place an event occurred in your ancestor's life, you will discover that there are huge categories of available records of which you were previously unaware. I have written about this particular issue several times in the past.

Now, a further qualification of the above rule is that the "information" you are seeking may be in a completely different type of record than the one you assume is available. We generally refer to this as locating records that are alternate sources for specific information. For example, you may think of a death certificate as a primary record for determining the date of death. On the other hand, it is also a secondary source for the date of birth. Of course, it may also contain considerable additional information, either primary or secondary. Another example is when the researcher is searching for a specific event and assumes that the record will always be associated directly with that event, such as searching for a marriage and assuming that there will be some kind of direct record of it, like a marriage license or marriage certificate. The rule is much broader than this. In assuming that a record is always there, you do not assume that any particular form of the record will be found. The rule merely states that there is always a record available with the information sought, even if it is not the record you are looking for.

Another example is when you are looking for birth records for children and ultimately locate the information you need in a probate record. In fact, you may never find a birth record. But that does not mean that the date of birth of each of the children can be exactly established. It may only be possible to establish a birth date to the year or even to a range of dates. Ultimately, finding an exact date for the birth, death, or marriage of an individual is not as crucial as it may seem.

The rule also recognizes the fact that as you go back in time, the number and variety of records continually decrease until you are relying on a very narrow selection. Eventually, as is true in every case, the records simply run out. This brings us to the second major rule:
At some point, the availability of any type of record disappears.
In one sense, this is not really a new rule. It is merely a limitation on the first rule and an application of the qualification that all records are maintained within the historical context in which they are created. Ignorance of the historical context of records in general is, by far, a major limiting factor in the ability of genealogical researchers to find their ancestors.

Now back to the issue of having the data scattered across the web. Because of the nature of the information on the Internet, we are essentially at the mercy of the search engines, chiefly Google. Far too often, I see researchers searching for names and dates when they should actually be searching for records and repositories. It is a basic fact of the Internet that much of the information that can be reached through online connections is not readily available to a general web search, because it is stored on a variety of servers that the search engines do not index. For example, the contents of a library's catalog are only available to the extent that the library permits web-based searches. This also applies to all commercial online genealogical database companies. Unless you pay for a subscription to such a service, or search the Historical Record Collections on the website that hosts them, you will not have access to search the records in their respective databases. Even though the records they contain are essentially accessible from the web, the records are locked behind the commercial gatekeeper. Remember also that many of the records you are seeking may be online but have either limited indexes or no indexes at all. A good example of this is the fact that most of the records in the Historical Record Collections are still un-indexed and available only as images that must be browsed individually. They will not appear in any Google search or even in a search on the hosting website itself. You can view an image, but the contents, unless indexed, are not searchable.

This brings us to the next rule:
Remember to search for databases and repositories.
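As a rough illustration of this rule, the following Python sketch builds search-engine queries aimed at record types and repositories for a place and time period, instead of at an ancestor's name. Everything in it is an assumption made for the example: the lists of record types and repository terms, the function name, and the sample place and period are all hypothetical, and a real list should come from your own historical survey of the area.

```python
from itertools import product
from typing import List, Sequence

# Hypothetical starting lists; adjust them to the time and place being researched.
RECORD_TYPES = ["probate records", "church records", "land records"]
REPOSITORY_TERMS = ["archive", "library catalog", "digital collection"]


def build_queries(
    place: str,
    period: str,
    record_types: Sequence[str] = RECORD_TYPES,
    repository_terms: Sequence[str] = REPOSITORY_TERMS,
) -> List[str]:
    """Combine a place and period with every record type and repository term."""
    return [
        f'"{place}" {period} {record_type} {repository}'
        for record_type, repository in product(record_types, repository_terms)
    ]


# Hypothetical example: one place and one period, no ancestor names at all.
for query in build_queries("Lancaster County, Pennsylvania", "1780-1820"):
    print(query)
```

Each query is meant to surface the databases, catalogs, and archives that hold the records, which can then be searched or browsed directly for the names and dates.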
Some might believe that consolidating all of the information relating to genealogy into one huge mass would be desirable. However, what would be the fun in that? Stay tuned for Part Two.


  1. Yes, we need a tool that will bring all this information together for us.

    ... I'm thinking about this.

  2. Also various database-providers have taken to copying others' indexes/extracts. Thus there is no actual way to determine the number of discrete records (so-called; usually meaning 'names' rather than documents) available on the web.

    The FS announcement of intention to make XX billion 'records' available on the web within a generation presumably means ~names~ rather than documents. This also is a number with some built-in faults because there are huge numbers of documents on the FS site alone that will never be completely indexed, such as myriad estate-related documents (lists of heirs in various forms, wills and sale bills, for example). And the IGI baptismal record extracts already had stripped out the names of the Godparents.

    I dread the oncoming aggregation by such providers of others' tree indexes, and fervently hope for built-in ways to eschew such listings.