Tuesday, November 12, 2013

Understanding Online Searches in Depth -- Part Two

There are many different challenges to finding what you are searching for online. To a greater extent than is usually realized, finding your ancestral history depends on your ability to use both standard library research techniques and online research methods that are poorly understood and still developing. I find that most genealogical researchers have only the most rudimentary knowledge of either. In addressing this issue, my objective is to make a few more members of the genealogical research community aware that the issues exist, not to solve them for the greater genealogical research population.

In a traditional library research setting, your research consists, in part, of identifying materials pertinent to your search by relying on catalogs and direct examination of the "shelves." Unless you are intimately familiar with library operations, you probably have only the vaguest idea of how this process works. Library catalogers work in a highly specialized and very complex environment. They must examine newly acquired books and decide which of thousands of categories to assign, depending on their particular library's policies. Since 1901, most published books have come with an "LC" or "OCLC" number. This means that the book has already been cataloged by either the Library of Congress or the Online Computer Library Center (OCLC). For a brief description of the history of cataloging, see Cataloging Rules for the 20th Century, D-Lib Magazine, January/February 2007, Volume 13, Number 1/2, ISSN 1082-9873.

Here are a few reference links to give you an idea of the complexity of the cataloging world:

Let's just say that understanding and using a library classification system is a lot more complicated than simply looking something up in an online library catalog. The main challenge is identifying the research material containing the information you need. Catalogers are not just recording what is contained in the book or other material; they are adding the human-analyzed content that makes finding that material possible.

As I watch the patrons at the Mesa FamilySearch Library, for example, I see nearly every one of them go directly to the computers to find information about their families when there are thousands of books on shelves only a few short steps away. The books remain almost unused except by the very few educated or experienced researchers who understand how and where to find the information they are seeking. Most of the researchers have no clue where the information might be found, and instead of searching for references, they begin an automatic search for names and dates, neither of which is ultimately productive. By using a cataloging system, such as the elaborate ones developed for the world's libraries, you gain generalized subject areas that can apply to your search. In short, you do not limit your options for finding your ancestors; you expand those options by understanding the vast resources that are available.

Now, it is time to begin to address computer searches. When you are searching on a computer, you are at the mercy of the search algorithms developed by the software engineers who built the search engine. In effect, you are relying on Google's (or whoever's) programmers to give you access to research materials. Computerized searches rely on "every word" indexes. In other words, the computer programs read every word in a book or other material and then let you search an index of every single one of those words. You might immediately think that this method of searching is both more efficient and more accurate than a library cataloging system, but you would be wrong. In fact, it makes searching more difficult and less productive. The reason is that the "every word" search obscures relationships between reference materials that are available through cataloging systems. The problem arises because you cannot predict how the information you are searching for will be expressed: the search finds only the exact words you type, not the other words an author may have used for the same thing. If you would like some insight into Google's processes, see Inside Search.
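For readers who like to see the mechanics, here is a minimal sketch in Python of the difference between an "every word" index and a cataloger's subject heading. The four book descriptions, the subject headings, and the function names are all invented for illustration; this is a toy model under those assumptions and does not describe how Google or any library system is actually built.

```python
import re
from collections import defaultdict

# Hypothetical miniature library: each entry has some text and the
# subject headings a human cataloger might have assigned to it.
BOOKS = {
    "A": {"text": "Inscriptions from the old graveyard on Mill Road",
          "subjects": {"Cemeteries"}},
    "B": {"text": "Burial ground records of the Quaker meeting",
          "subjects": {"Cemeteries", "Church records"}},
    "C": {"text": "Cemetery transcriptions for the county",
          "subjects": {"Cemeteries"}},
    "D": {"text": "Tax lists and land deeds",
          "subjects": {"Land records"}},
}


def tokenize(text):
    """Split text into lowercase words, ignoring punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def build_word_index(books):
    """Build an 'every word' index: each word maps to the books containing it."""
    index = defaultdict(set)
    for book_id, book in books.items():
        for word in tokenize(book["text"]):
            index[word].add(book_id)
    return index


def word_search(index, query):
    """Return the ids of books that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def subject_search(books, subject):
    """Return the ids of books a cataloger filed under the given subject."""
    return {book_id for book_id, book in books.items()
            if subject in book["subjects"]}


index = build_word_index(BOOKS)
print(sorted(word_search(index, "cemetery")))        # ['C']
print(sorted(subject_search(BOOKS, "Cemeteries")))   # ['A', 'B', 'C']
```

The word search for "cemetery" finds only the one book that happens to use that exact word. The subject search finds all three, because a person looked at the "graveyard" and "burial ground" books and recognized that they belong together. That relationship is exactly what the every-word index cannot see.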

Now we are at a major juncture in the understanding of online searches. We have two major divisions in the research process: cataloged materials in traditional library settings and uncataloged materials in huge online "every word" search engines. We also have a situation where the average genealogical researcher has little or no understanding of either system. How do we move forward? That will have to be the subject of yet another post in this series.

