Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, July 17, 2014

Web Basics for Genealogists -- Part One

For all practical purposes, the Web is infinite for any particular user. At the time of this post, the World Wide Web (WWW, aka the Web) is estimated to have 3.32 billion pages on an estimated 861 million websites. These numbers are estimates because no one really knows the exact figures. What does this mean to a genealogist? If you were to take one second to look at each webpage, 24 hours a day, seven days a week, it would take you over 105 years to look at the existing pages. In addition, new pages and websites are being added by the millions every year, so even if you started right now to look at all those pages, you would be losing ground every second of the day and, at the end of your search, there would likely be more pages than when you started.
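For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It uses only the page estimate quoted above; the one-second viewing time and the average year length of 365.25 days are my assumptions for the calculation.

```python
# Estimate quoted in the post (an estimate, not an exact count).
ESTIMATED_PAGES = 3_320_000_000

# Assumption: an average year of 365.25 days, to allow for leap years.
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60


def years_to_view(pages, seconds_per_page=1):
    """Return the years needed to view every page without ever stopping."""
    return pages * seconds_per_page / SECONDS_PER_YEAR


years = years_to_view(ESTIMATED_PAGES)
print(f"About {years:.1f} years of nonstop viewing")
```

The result lands just above 105 years, and that is before counting any page added while you were looking.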

Essentially, no one can really know the extent of the information available on the Web. But before I go on to discuss what these facts mean for genealogists, we need to understand a little bit about the Web.

First, the Web and the Internet are not the same thing. Although it is common to use the terms interchangeably, the Internet is the system of physical computers and connections. According to Netcraft.com, there were 958,919,789 servers on the Internet as of April 2014. It is difficult to craft a definition of the World Wide Web without using circular terms. The Web is a system of interlinked hypertext documents that are accessed via the Internet. See Wikipedia: World Wide Web. Users access the Web using a program on their device called a browser.

So, when you use your computer or other device to access the Web, you are using a computer program called a browser to look at pages created with hyperlinks. From a practical standpoint, you need to know that there are two major types of webpages: static and dynamic. A static webpage is like a text document. It does not change when you look at it. This blog post is a static document on the Web. A dynamic webpage is one that is created at the time it is viewed. When you look something up in a catalog or search on FamilySearch.org or Ancestry.com, that content is generated at the time you view it, hence it is dynamic.
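The difference between the two types can be sketched in a few lines of Python. This is only an illustration: the page text, the sample records, and the function names `serve_static` and `serve_dynamic` are all hypothetical and do not describe how any real website is built.

```python
# A static page: the same stored text is sent to every visitor.
STATIC_PAGE = "<h1>Web Basics for Genealogists</h1><p>This text never changes.</p>"

# Hypothetical records standing in for a website's database.
RECORDS = [
    {"name": "John Smith", "year": 1930, "place": "Ohio"},
    {"name": "Mary Smith", "year": 1930, "place": "Utah"},
    {"name": "John Jones", "year": 1920, "place": "Ohio"},
]


def serve_static():
    """Return the stored page exactly as it was written."""
    return STATIC_PAGE


def serve_dynamic(surname):
    """Build a results page at the moment of the request."""
    matches = [
        r for r in RECORDS
        if r["name"].split()[-1].lower() == surname.lower()
    ]
    rows = "".join(
        f"<li>{r['name']}, {r['year']}, {r['place']}</li>" for r in matches
    )
    return f"<h1>Results for {surname}</h1><ul>{rows}</ul>"


print(serve_static())
print(serve_dynamic("Smith"))
```

The static page exists before anyone asks for it. The results page for "Smith" does not exist anywhere until someone types in the surname.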

Static pages are searchable by a program called a search engine. Google Search is a search engine. Unfortunately for the user, search engines can only search static webpages. Most dynamic content is only searchable from the hosting website. So all of the content of the databases on Ancestry.com is essentially invisible to a Google search. You have to go to Ancestry.com and look at the pages.

Let me give you an example. Let's suppose you are looking for someone in the 1930 U.S. Census. If you look for that person's Census entry on Google, you will likely not find it unless someone has intentionally copied the information onto a static webpage. In addition, the original Census document is preserved as a series of images. Unless someone has indexed those images, you would have to search the Census page by page and line by line by looking at the individual images. Ancestry.com and FamilySearch.org have indexes of the U.S. Census records, as do many other online programs. You have to go to each of those hosting websites and use their search tools to find the information you are seeking.
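Here is a deliberately simplified model of why this happens, written in Python. It is a toy and not a description of how Google or any real website works; the page addresses, the sample Census record, and the function names are all hypothetical. The search engine builds its index only from pages it can fetch, while the Census record lives in a database behind the hosting site's own search form.

```python
from collections import defaultdict

# Hypothetical static pages that a search engine's crawler can fetch.
STATIC_PAGES = {
    "example.org/blog/web-basics": "the web and the internet are not the same thing",
    "example.org/about": "a blog about genealogy and writing",
}

# Hypothetical database record, reachable only through the host site's search form.
SITE_DATABASE = [
    {"name": "john smith", "census": 1930, "state": "ohio"},
]


def build_index(pages):
    """Map each word to the set of page addresses that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index


def web_search(index, query):
    """Return the addresses of pages containing every word in the query."""
    words = query.lower().split()
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


def site_search(database, name):
    """Search the hosting website's own database by exact name."""
    return [record for record in database if record["name"] == name.lower()]


index = build_index(STATIC_PAGES)
print(web_search(index, "john smith"))           # nothing: the record was never crawled
print(web_search(index, "genealogy"))            # finds the static "about" page
print(site_search(SITE_DATABASE, "John Smith"))  # found on the hosting site
```

The record is there the whole time. It simply never appears in the search engine's index, so you have to go to the hosting website to find it.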

There are a number of inferences that come from these facts. Here is my summary:

  • You can never be absolutely sure that you have made a complete search of all possible genealogical resources on the Web.
  • Even if you were to search constantly, day and night, you would never be able to find and use all of the resources on the Web.
  • Even though genealogically valuable records are only a tiny fraction of the total amount of information on the Web, there is still much more than any one person can comprehend.

Now, before you get terminally discouraged, you need to know that this has always been the case. Most of the records becoming available on the Web already existed before there was an Internet or a Web. The advantage of the Web is that almost all of those millions upon millions of existing records were far less accessible until they were digitized and put on the Web. Also, there are many records left to be added to the Web.

So, no one should ever say that they have looked everywhere. To do so is a physical impossibility. How can you ever be completely sure that in those billions of pages the information you are seeking is not hiding?

Next time, searching the Web.

1 comment:

  1. James Tanner, you mention: "How can you ever be completely sure that in those billions of pages the information you are seeking is not hiding?" Never, at present. However, by the combined efforts of many librarians, there is some hope for the future in indexing & LibGuides. I note:

    Internet Indexing Library Collections - Systems:
    Family & Local History - Genealogy
    https://www.facebook.com/notes/family-genealogy-and-history-internet-education-directory-wiki/internet-indexing-library-collections-systems-family-local-history-genealogy/10152217146556444
