Wednesday, July 23, 2014

Web Basics for Genealogists -- Part Two: Beginning our Understanding of Searches

Because of technological advances, searching for sources online has become one of genealogists' most important activities. Using a computer to search online involves a number of complex skills. Unfortunately, almost all genealogical researchers are literally on their own in learning these skills. Unless a researcher happens to have a technological background coupled with training in library science or some other information science, it is unlikely that their online searching is either very productive or pleasant.

The number and variety of "educational opportunities" is overwhelming, starting with computer classes at local colleges and universities, but how many genealogists take the time to obtain a degree in computer science or information science before starting to research their family?

Basic computer skills range from the physical mechanics of entering data with a keyboard and mouse to understanding file structures and the operation of complex programs. But even with a good background in computer usage, it is a fact of life that the technology changes constantly. So today's genealogist is confronted with learning about computers while trying to understand the equally complex field of genealogy. As a side note, many people involved in genealogy assume that younger people, who have grown up using computers and cell phones, are a "step ahead" in entering the field of genealogy because of their background in technology. This is an illusion. Genealogical research requires additional skills of analysis and evaluation that are gained only by experience. It may be discouraging to the beginner, but learning computer skills is only the first step in doing effective genealogical research with the vast online sources now available.

I am going to have to assume that the readers of this blog post have at least a basic idea of how to use a computer or other computer-based device, or they would not have gotten to this venue. This series is called Web Basics because I find that even researchers with good computer skills are often not aware of the different ways searches need to be conducted online.

There are three fundamentally different online search techniques that reflect three completely different ways of organizing information. Like it or not, as genealogists we are involved in the analysis, collection, classification, manipulation, storage, retrieval, movement, and dissemination of information. But to perform any of those activities, we first have to find the information. What follows is a short analysis of the three different methods of approaching the finding function of online research.

You can think of research in the abstract as searching through an infinitely large pile of paper. Each piece of paper has a small piece of information. If you were to sit by the side of the pile and randomly pull out pieces of paper, what would be the chance that you would find what you were looking for? My guess is that the probability of finding what you want is close to zero. What is more, how do you know that what you are looking for is even in the pile? Genealogists should be painfully aware that not all the information they need has yet been transferred to the vast online pile.
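To put a rough number on that guess, assume (purely for illustration) that the pile holds N slips of paper, exactly one of which has the fact you want, and that you pull out k slips at random without putting any back. The chance that your slip is among the ones you pulled is:

```latex
P(\text{finding the slip}) = \frac{k}{N},
\qquad \text{for example } k = 1000,\; N = 10^{9} \;\Rightarrow\; P = 10^{-6}.
```

For any fixed number of pulls, this fraction shrinks toward zero as N grows without limit, which is exactly what happens with an infinitely large pile.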

Setting aside the three search techniques for a while, we should also have a basic idea of the types of records we are searching for and whether or not the particular types we need have migrated to the Web, that is, whether they have been digitized and indexed. Hmm. That brings up another issue. Genealogical information may be on the Web as images of documents. Unfortunately, the technology for searching images of documents is still very rudimentary. So as genealogists we rely heavily on indexing and indexes. Even with all our vast electronic wonders, we still have to rely on someone, someplace, looking at each document image and manually transcribing the information. Of course, if the information we seek is text, it is much easier to find and search. But if the information is locked up in an image, we are back to visually searching the records, which is no different than going to a library or searching through microfilm copies.
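To make the role of the index concrete, here is a minimal Python sketch, assuming a made-up collection in which each digitized page is just an image file name plus an optional transcription typed in by a volunteer indexer. The class RecordImage, the functions search_index and unindexed, and all of the sample names and file names are hypothetical illustrations, not any real website's data or interface. What it shows is that a search can only see the transcriptions; an image nobody has indexed never turns up, however relevant it may be.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RecordImage:
    """A digitized document: an image plus an optional human-made index entry."""
    image_file: str               # hypothetical file name of the scanned page
    transcription: Optional[str]  # None until someone has indexed the image


# Hypothetical collection: only the first two images have been indexed.
COLLECTION = [
    RecordImage("register_page_001.jpg", "Marriage of John Example and Ann Sample"),
    RecordImage("register_page_002.jpg", "Burial of Mary Example"),
    RecordImage("register_page_003.jpg", None),
]


def search_index(collection: List[RecordImage], name: str) -> List[str]:
    """Return image files whose transcription mentions the name (case-insensitive).

    Untranscribed images can never match; they can only be found by
    browsing the images one by one, just like microfilm.
    """
    needle = name.lower()
    return [
        record.image_file
        for record in collection
        if record.transcription is not None and needle in record.transcription.lower()
    ]


def unindexed(collection: List[RecordImage]) -> List[str]:
    """List the images that still have to be searched visually."""
    return [record.image_file for record in collection if record.transcription is None]


if __name__ == "__main__":
    print(search_index(COLLECTION, "Example"))
    print(unindexed(COLLECTION))
```

Whatever might be written on register_page_003.jpg, no name search will ever return it until someone transcribes it.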

Now back to the infinite pile. We all seem to understand instinctively that the pile needs to be organized in some way so that we can find what we are looking for. But how do we organize the pile? Well, librarians have been organizing their piles for quite a long time. They use a variety of complex cataloging systems. As children going to a school library, we probably heard of the "Dewey Decimal System" of organization and the corresponding card catalog. Books were (and in some libraries still are) organized on shelves by subject and then numbered in a way that makes them easier to find. For genealogists this is an awkward system because almost everything ends up in Dewey Decimal Classification number 929. Here is a list of its subdivisions:
929.1 Genealogy
929.2 Family Histories
929.3 Genealogical sources
929.4 Personal names
929.5 Cemetery records
929.6 Heraldry
929.7 Royal houses, peerage, orders of knighthood
929.8 Orders, decorations, autographs
929.9 Forms of insignia and identification
You can see that this set of categories is not all that useful. In any event, the Dewey Decimal System has been supplanted in many libraries, especially academic and research libraries, by other, more complex cataloging systems such as those of the Library of Congress. Warning: getting into this area of searching can be very discouraging, as in, "I had no idea how complicated this could be." Just for fun, here are the Library of Congress Standards by category:

Resource Description Formats
Digital Library Standards
Information Resource Retrieval Protocols
ISO Standards
  • ISO 639-2: Codes for the representation of names of languages -- Part 2: Alpha-3 code.
  • ISO 639-5: Codes for the representation of names of languages -- Part 5: Alpha-3 code for language families and groups.
  • ISO/DIS 25577 - Information and documentation -- MarcXchange
  • ISO 20775 - Schema for Holdings Information
Metadata for Digital Content: Developing institutional policies and standards at the Library of Congress
Recommended Format Specifications: Best practices for ensuring the preservation of, and long-term access to, the creative output of the nation and the world in both analog and digital formats

OK, now you can begin to see the first type of search: a search based on a cataloging system developed and imposed on the data pile by someone who designs the system. Searching in a catalog is a whole complicated study in itself. I spent my first few years of work as a bibliographer in a major university library, and I became very familiar with the complexity of the cataloging systems.
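Here is a minimal Python sketch of what a catalog search looks like from the computer's point of view, assuming a tiny, made-up list of holdings. The category labels are taken from the 929 list above; the titles, the CATALOG list, and the browse_class function are hypothetical. Notice that the search never looks at what the books actually contain. It only finds what a cataloger has already labeled, so anything classed somewhere unexpected stays invisible.

```python
from typing import List, Tuple

# A few of the Dewey 929 subdivisions listed in the post.
DEWEY_929 = {
    "929.2": "Family Histories",
    "929.3": "Genealogical sources",
    "929.5": "Cemetery records",
}

# Hypothetical holdings: each title was assigned a class number by a cataloger.
CATALOG: List[Tuple[str, str]] = [
    ("929.2", "Example Family History, Volume 1"),
    ("929.5", "Example County Cemetery Inscriptions"),
    ("929.3", "Example Guide to Parish Registers"),
    ("929.2", "Example Family History, Volume 2"),
]


def browse_class(catalog: List[Tuple[str, str]], class_number: str) -> List[str]:
    """Return every title shelved under one class number, in catalog order.

    The match is on the assigned label only; the contents of the
    books themselves are never examined.
    """
    return [title for number, title in catalog if number == class_number]


if __name__ == "__main__":
    print(DEWEY_929["929.2"], browse_class(CATALOG, "929.2"))
```

The design choice here mirrors the card catalog: the hard work happens once, when someone assigns the number, and every later search depends on that decision.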

Is there any hope? Sorry. Not much. The second method is the brute-force, bulldozer method called a string search. You can think here of Google. You type in a series of characters and the search engine tries to match your string of characters with any matching characters out there in the pile. I wish it were that simple. What really happens is that Google and other such search engines create their own catalog, or structure, of the data before beginning the string search (not string as in tying knots, but string as in a series of text characters). You can probably guess that I am going to write more completely about each type of search, but for now, what you need to know is that you type in a name and the program checks whether it can find that name anywhere. Of course, you soon find that your searches return millions of results that simply illustrate the size of the pile, so there must be more to searching on Google than simply wishing that your results show up. Yes, there is, but you will have to wait for my subsequent posts.
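The difference between scanning the whole pile and searching a structure built ahead of time can be sketched in a few lines of Python. This is a greatly simplified illustration under stated assumptions, not how Google actually works: the three sample documents, the names in them, and the functions brute_force_search, build_index, and indexed_search are all hypothetical. The first function checks every document for the exact string of characters; the other two build a simple "inverted index" (a list of which documents contain each word) in advance and then answer queries from that index.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical "pile": document id -> text of the document.
PILE: Dict[int, str] = {
    1: "Baptism record of John Example, parish register",
    2: "Land deed naming Mary Example and John Sample",
    3: "Cemetery inscription for Ann Sample",
}


def brute_force_search(pile: Dict[int, str], text: str) -> List[int]:
    """Scan every document for the exact character string (case-insensitive)."""
    needle = text.lower()
    return sorted(doc_id for doc_id, body in pile.items() if needle in body.lower())


def build_index(pile: Dict[int, str]) -> Dict[str, Set[int]]:
    """Build an inverted index: each lowercase word -> ids of documents containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, body in pile.items():
        for word in re.findall(r"[a-z0-9]+", body.lower()):
            index[word].add(doc_id)
    return index


def indexed_search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of documents that contain every word of the query, in any order."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))  # copy so the index is not modified
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


if __name__ == "__main__":
    index = build_index(PILE)
    print(brute_force_search(PILE, "John Example"))
    print(indexed_search(index, "John Example"))
```

With these sample documents, the brute-force search for "John Example" finds only document 1, while the word-index search returns documents 1 and 2, because document 2 happens to contain both words in two unrelated names. That is a small-scale version of why a name search on a large search engine returns so many results that are not the person you want.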

Last, but certainly not least, computer programmers have come up with an entirely different way of organizing vast quantities of information that they call a wiki. Searching a wiki turns out to be completely different from searching a traditional (or even non-traditional) cataloging system, and it has its own unique advantages and some disadvantages.

Perhaps you can now begin to grasp the complexity of the pile of information and the fact that there are different, and somewhat complex, methods of organizing the piles. As genealogists, I suppose we could blissfully ignore all this and go on our merry way seeking our ancestors. We might even acquire some or many of the necessary skills over time. But now we are faced with the huge online world, and sitting in a library in Salt Lake City, or wherever, is no longer the whole answer to our investigations.

The next posts on this subject will explore each of the three major methods of pile organization and give some ideas of how searches differ or are the same in each method.
