When I was practicing law we generated a huge quantity of paper documents. For example, let's suppose I got a new client who was involved in litigation. I had to "open a file," that is, create a place to store documents and other items associated with that case. In this example, the "file" was a physical manila file folder. Each file was assigned a number, usually consisting of a year and an accession number (a number assigned to each new file in the order they were created), e.g., 2001-1, 2001-2, etc. We had a completely manual file system. At this point we had a choice: we could organize the files by file number or alphabetically by the surname of the client. We could also separate files into topics, such as probate, litigation, corporate and so forth. A single file could be one piece of paper or a room full of boxes. A single civil litigation case could generate tens of thousands of pages of documents.
As an aside, while I was working at the Arizona State University Law Library as a Reference Librarian, there was a whole room in the library devoted to one lawsuit, Arizona v. California, 373 U.S. 546 (1963).
Notwithstanding our legal file organization, we spent a huge amount of time looking for files and even more time looking for specific documents. Now multiply that organizational problem by millions and trillions of documents spread across the entire world. Presently we have over seven billion people generating billions of documents every day, including this blog post.
Genealogists focus on finding people in historical records. To put that into proper perspective, we need to remember that first, we find the records and THEN we can find the people. Most genealogical researchers (nearly all) come to me and say something like, "I am looking for my (great-great-great-grandfather or whoever)." What they should be saying (and doing) is that they are looking for documents, records, or other similar items that contain information about that ancestor. Remember this: I didn't file or organize the information they are looking for. In fact, the documents may never have been properly "filed" or organized. The documents may have been figuratively scattered to the wind.
By the way, looking for a needle in a haystack is really simple compared to finding documents spread across the world, as long as I know there is a needle and I have the right haystack.
Genealogists focus on names, dates and places as primary identifiers. They also rely on a system of indexing that extracts certain information from a document and thus expands the number of search terms available. In doing this they further rely on the following:
- The accuracy of the original record
- The accuracy of the person reading the original record
- The accuracy of the person recording the information extracted from the original record
- The ability of the program to identify and distinguish the record from other similar records
and on and on and on.
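To make the dependence on accurate extraction concrete, here is a minimal sketch of an index built from extracted fields. The records, names, and the `build_index` function are all hypothetical and only meant as an illustration; no real genealogy program is claimed to work this way.

```python
from collections import defaultdict

def build_index(records):
    """Map each extracted term (lowercased) to the ids of records containing it."""
    index = defaultdict(set)
    for record_id, fields in records.items():
        for value in fields.values():
            for term in value.lower().replace(",", " ").split():
                index[term].add(record_id)
    return index

# Hypothetical extracted entries; record 2 carries a transcription error
# ("Tamer" where the original document is assumed to say "Tanner").
records = {
    1: {"name": "John Tanner", "place": "Arizona"},
    2: {"name": "Mary Tamer", "place": "Arizona"},
}

index = build_index(records)
print(sorted(index["tanner"]))   # only record 1 is reachable by surname
print(sorted(index["arizona"]))  # both records are reachable by place
```

In this sketch the second record still exists, but a surname search will never reach it. It can only be found through some other extracted term, such as the place.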
In my law example, a record could be "lost" if it was simply filed in the wrong file folder. In our present time of computerized filing systems, a record can be lost in hundreds of different ways.
For all their sophistication, today's computerized search systems with their search engines still rely on someone categorizing a document in a specific way, by name, date, place, topic and so forth. We are all left to try to figure out how the documents we want are organized and what we have to do to find them. This process may be as simple as entering a name into Google or as complicated as searching page by page through an old, hardly readable microfilm when we do not know if the record we are looking for is there at all. Computers and computer programs are wonders to behold, but they are no better than their basic filing systems.
We now have a movement among those who program computers to implement a user-generated filing system involving place names. As I said at the beginning of this post, places are crucial to finding pertinent documents. Many people who have lived on the earth have names and dates so similar that telling them apart is extremely difficult. Adding a place name to the search very often solves the problem. But what happens if the place name is wrong, inaccurate, or out-of-date? Hmm. Now we have a real problem.
Here is an example of what I am talking about. Let's suppose we have an ancestor designated as follows:
John Jones, b. abt 1850, deceased, England.
Would you be able to find this person, just from that information? Like many genealogists, you probably would find a John Jones who was born about 1850 in England. In fact, if I were to do a search in Ancestry.com using this information I would get 15,997,760 possible results starting with entries in the U.S. Federal Census. Remember, my John Jones lived in England.
Most genealogists, rather than recognize that their search was hopeless, would assume that Ancestry.com's search engine was "broken." They would then try to be more specific. Here is the conundrum. You would need more information about the person before you could search more specifically, when more information about the person is exactly what you are looking for (i.e., the chicken and the egg).
Now, when entering a person into a program such as Ancestry.com or FamilySearch.org, we are given suggested places. Eventually, genealogical researchers hear about a basic rule that place names need to be recorded as they were at the time the event occurred. But what if the search program does not recognize the place, even assuming that we know the place? There you go, that is exactly the problem with programs that suggest "standardized" place names.
Granted the programmers would like you to conform to their system of organization. They want the following about John Jones:
John Jones, b. 18 April 1850, d. 29 January 1910, Tandridge, Surrey, England, United Kingdom.
But what about the problem when the place name has changed? What if the place name has changed several times? The first issue is where are the records? The second issue is how were the records categorized in the first instance? Here is a question that hinges on this issue.
Where are the records located that were generated during the existence of the Arizona Territory?
Arizona Territory was in existence between 1863 and 1912 when Arizona became a state. As I work into this example, think Europe during its history and the boundaries and names of all the countries.
Before 1863 the land in the Arizona Territory was either part of Spain, Mexico or the New Mexico Territory. When a genealogist can answer a question such as this out of his or her head, they are called an expert. The real issue here is not the location of the records, but how those records, now scattered across the world, are characterized. If one repository files the records under Arizona Territory and then another simply puts them in a pile called "Arizona" and then a third classifies them as Southwestern History, how do you find the records?
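The three-repository problem can be sketched in a few lines. Everything here is an assumption made for illustration: the repository names, the `CATALOG` and `ALIASES` tables, and both search functions are hypothetical, and the alias table stands in for what an experienced researcher carries in his or her head.

```python
# Hypothetical catalog: the same body of records labeled three different ways.
CATALOG = [
    {"repository": "Repository A", "label": "Arizona Territory"},
    {"repository": "Repository B", "label": "Arizona"},
    {"repository": "Repository C", "label": "Southwestern History"},
]

# Hand-built table of labels a researcher might try for one jurisdiction.
ALIASES = {
    "arizona territory": {"arizona territory", "arizona", "southwestern history"},
}

def exact_search(label):
    """Return repositories whose label matches the query exactly (ignoring case)."""
    wanted = label.lower()
    return [e["repository"] for e in CATALOG if e["label"].lower() == wanted]

def alias_search(label):
    """Return repositories whose label matches the query or any known alias."""
    wanted = ALIASES.get(label.lower(), {label.lower()})
    return [e["repository"] for e in CATALOG if e["label"].lower() in wanted]

print(exact_search("Arizona Territory"))  # finds only the first repository
print(alias_search("Arizona Territory"))  # finds all three
```

The exact search reaches one repository out of three. The alias search reaches all of them, but only because someone already knew which labels to try.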
I have written about this topic many times before. Computer programmers working on search engines for genealogical records should be complimented on their efforts to organize huge piles of records. But when they do so and limit their searches by imposing their own organization on the records, things start to fall apart. If I want to enter a place that is not in their list of "standard" places, then my entry is considered to be non-standard.
Let me go back to my search for John Jones in England. What if, by pure chance, I happen to know exactly where John Jones lived? What if I search for John Jones in Tandridge? (By the way, I made up the name and dates; I am searching for a person who does not exist, but the place does exist.)
When I enter "Tandridge, Surrey, England" into Ancestry.com's search engine (search fields) I automatically get a suggested entry for Tandridge, Surrey, England. What do I get? One result for a person named Wm Jones in Tandridge, Surrey, England. What about FamilySearch.org? Remember, this is a fictitious name. I get 11,637 results and most of these are John Joneses.
So I mark the place in FamilySearch.org as "exact" and redo the search, and I get no records.
I have gone from over 15 million records to no records just by being specific in the place where the event occurred. Doesn't this fact suggest something about place names? Here is another example. Let's suppose I search for my grandfather, Leroy Parkinson Tanner. I can search in either FamilySearch.org or Ancestry.com and come up with records about him specifically by just searching on his name. Yes, that is all I need. His name is distinctive enough to be found in either program.
But what if all I knew were his last name and a place? What if I search for Tanner in St. Joseph, Apache, Arizona Territory, the place where he was actually born as it was known at the time? I get 19,470 Tanners, but none of them are my grandfather. Why? Let me try again, this time using another geographic location, St. Joseph, Navajo, Arizona.
Hmm. I get a lot of his brothers and sisters (there were 17 children in the family), but not my grandfather. Let me do another search for Tanner in Joseph City, Navajo, Arizona. I get the same list of Tanner relatives but no grandfather. So now I go and look to see how his birthplace is entered in the Family Tree program. He is, in fact, listed as born in St. Joseph, Apache, Arizona, United States. So now I do a search using this place. He finally shows up in the search. But the place name is not technically correct or standardized. It should be St. Joseph, Apache, Arizona Territory, United States.
Guess what, now the search engine cannot find him. So, with a non-standard and wrong place, he can be found, but with the correct and standardized place he cannot be found.
This exercise illustrates the basic challenge of all computerized search engines. Even if I conform to the parameters set by the programmers and use the "standardized place," that still does not guarantee that my person can be found. Why is this? Because in this case, most entries referring to birth in original records did not differentiate between "Arizona" and "Arizona Territory." The significance of this distinction was simply unknown and unused at the time.
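One practical response is to try every plausible form of the place rather than only the standardized one. The sketch below builds the combinations from the town, county, and state names used in the searches above. The function names are hypothetical, the exact string comparison is an assumption made for illustration and not a description of how FamilySearch.org or Ancestry.com actually match places, and some of the generated combinations never existed historically.

```python
from itertools import product

# Alternate names taken from the searches described in this post.
TOWNS = ["St. Joseph", "Joseph City"]
COUNTIES = ["Apache", "Navajo"]
STATES = ["Arizona Territory", "Arizona"]

def place_variants(with_country=True):
    """Build every town/county/state combination to try as a search place."""
    variants = []
    for town, county, state in product(TOWNS, COUNTIES, STATES):
        parts = [town, county, state]
        variants.append(", ".join(parts))
        if with_country:
            variants.append(", ".join(parts + ["United States"]))
    return variants

def matching_variants(recorded_place, variants):
    """Return the variants that equal the recorded place exactly."""
    return [v for v in variants if v == recorded_place]

# The birthplace as it is entered in the Family Tree program.
recorded = "St. Joseph, Apache, Arizona, United States"
print(len(place_variants()))
print(matching_variants(recorded, place_variants()))
```

Under that exact-match assumption, only one of the generated variants lines up with the recorded birthplace, and it is not the one that includes "Arizona Territory."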
What does this mean to those searching for their ancestors? You really do need to know the exact place some event in your ancestor's life occurred. But you also need to know how to search for that place. If the name, date or place you enter is too different from the way it was recorded in the original documents, your chances of finding that ancestor are dramatically decreased. For this reason, we tell people to work from the known to the unknown. What does this mean for "standardized" dates and places? It means that you cannot assume that, because your place name is "standardized," the program's search engine will find the right person. Remember, I immediately found Leroy Parkinson Tanner by name, but could not find him with his last name, Tanner, even when I had the exact standardized location where he was born. It might help to know that St. Joseph, Apache, Arizona had fewer than 1,000 residents and a whole bunch of them were Tanners.
The final conclusion is that searching is an art, not a science. It takes practice and persistence. Keep looking and keep thinking.