Some people eat, sleep and chew gum, I do genealogy and write...

Saturday, November 2, 2013

A solution to unresponsive searches in Ancestry.com and other databases

I got the following comment on a recent blog post about searches in various online genealogical database programs, and I thought it raised a very interesting issue:
Hi James, I have been trying to use newspapers through Ancestry.com and their results are so far afield that it seems pretty useless. I put in a search tonight for someone in Steuben County, New York in 1826 and I get results from the 1930s and 40s in western US. I have had the same results in previous searches. I don't know how else to phrase a request than to specify name, date and location. Any pointers?
This seems to be a common complaint with most searches online, but, of course, since we are working on genealogy, it turns up frequently in searches in the big online database programs such as Ancestry.com, FamilySearch.org, MyHeritage.com and findmypast.com. You may not have noticed it in the same way, but this is likely a contributing factor to why you get millions of responses to a search on Google also.

Just in case you haven't noticed how this works and why it works, I will do some test searches in Ancestry.com. I am searching by filling in the following information in the general search screen, that is the one on the startup page. The only thing I am doing differently is taking advantage of the fields in an advanced search. Here is the information I started searching with:
  • First name - Eliza
  • Last name - Tanner
  • Any Event Location - Arizona
  • Spouse's Name - Henry
  • Gender - Female
This should be enough information to find my ancestor, but without a little more information, I may get a lot of results from this search. Let's see what happens.



Hmm. In this case I got 875,717 results but the first set of results are all in Arizona. But wait, if I scroll down a ways, I start getting stuff like this:


If you click on the image and look at the results, you will see returns in Virginia, West Virginia, New Jersey, Illinois and so forth. Now, if the program had not found my own Eliza in Arizona, it would have automatically skipped on to all these other locations. The reason for this is the way almost all search engines in database programs function.

Essentially, if a database search does not find a result in a certain search category, it defaults to some higher parameter. In this case, I specified "Arizona," but the program didn't find any more entries matching the name and other information in Arizona, so it extended the search to the next higher geographic category. In this case, that next higher (or more expansive) category is the "United States." So, the search produces "Eliza Tanner" in any state of the United States.

The problem is that, to most users, this result makes the program look broken, as if it doesn't follow directions very well. Actually, it is following directions too well and giving you more than you asked for.

This issue is most obvious when the program cannot find a matching name in the target jurisdiction. For example, if Ancestry.com had not found an "Eliza Tanner" in Arizona, it might have gone to different states or even begun returning "Eliza" with other last names. That is essentially why you get so many results even though only a very, very few of them are relevant.
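For readers who think in code, here is a toy Python sketch of the broadening behavior described above. It is not how Ancestry.com or any other real search engine is implemented; the `Record` class, the `broadening_search` function, the three tiers, and the sample records are all made up for illustration. It only shows why matches in the requested state come first and looser matches follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    first: str
    last: str
    state: str


def broadening_search(records, first, last, state):
    """Return (scope, record) pairs, narrowest scope first."""
    # Each tier is a looser test than the one before it.
    # All sample records are assumed to be in the United States,
    # so the "country" tier simply drops the state requirement.
    tiers = [
        ("state", lambda r: (r.first, r.last, r.state) == (first, last, state)),
        ("country", lambda r: (r.first, r.last) == (first, last)),
        ("first name only", lambda r: r.first == first),
    ]
    results = []
    used = set()  # positions already returned by a narrower tier
    for scope, matches in tiers:
        for position, record in enumerate(records):
            if position not in used and matches(record):
                used.add(position)
                results.append((scope, record))
    return results


# Made-up sample records, for illustration only.
SAMPLE = [
    Record("Eliza", "Tanner", "Virginia"),
    Record("Eliza", "Tanner", "Arizona"),
    Record("Eliza", "Smith", "Illinois"),
    Record("John", "Tanner", "Arizona"),
]

for scope, record in broadening_search(SAMPLE, "Eliza", "Tanner", "Arizona"):
    print(scope, "-", record.first, record.last, record.state)
```

In this sketch nothing is ever filtered out by location; the location only decides the order. That is one plausible reading of why the Arizona entries appear at the top and the Virginia and Illinois entries turn up further down the list.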

To try to find out whether or not the information you are seeking is there anyway, you can try different combinations of search terms, leaving out and adding terms until you are satisfied that you have exhausted the possibilities. Sometimes adding more terms merely makes the problem worse. Sometimes taking out terms solves the problem but the opposite is also possible. For example, if I wanted to search further, I might take out the surname, supposing that the name had been misspelled in the indexes or sources in the database. Generally, after spending a while moving around search terms and combinations of search terms, I either find what I am looking for or start over in another database or work on the problem from another aspect.
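The process of leaving out and adding terms can be sketched in a few lines of Python. This is only an illustration of how many combinations there are to try; the field names and the `term_combinations` helper are hypothetical, and nothing here talks to a real database.

```python
from itertools import combinations


def term_combinations(terms, min_terms=1):
    """Yield every subset of the search terms, most specific first."""
    fields = list(terms)
    for size in range(len(fields), min_terms - 1, -1):
        for subset in combinations(fields, size):
            yield {field: terms[field] for field in subset}


# The five fields used in the example search above.
SEARCH_TERMS = {
    "first_name": "Eliza",
    "last_name": "Tanner",
    "location": "Arizona",
    "spouse": "Henry",
    "gender": "Female",
}

attempts = list(term_combinations(SEARCH_TERMS))
print(len(attempts), "combinations to try")
```

Five fields already give 31 different non-empty combinations, before any alternate spellings or locations are considered, which is why this kind of searching takes patience.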

You would have to sit there and watch me do this to see the dozens upon dozens of different combinations of search terms I use to get through to the data. I do the searches over and over and over with the different combinations. Eventually, I begin to understand exactly what the program is doing with my search terms and I can either stop or I find what I am looking for. In the case where the name or whatever is wrongly entered in the database, such as "Tamer" for "Tanner," I may never find the entry depending on the sophistication of the database search engine. I may end up searching the database name by name to see if the information is really there and I just cannot find it.
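To illustrate why a mis-indexed name like "Tamer" can be found by a more forgiving search, here is a small sketch using `difflib.get_close_matches` from the Python standard library. The list of indexed surnames is made up, the `spelling_candidates` helper is hypothetical, and real database search engines may use entirely different matching rules (Soundex, for example).

```python
from difflib import get_close_matches


def spelling_candidates(surname, indexed_surnames, cutoff=0.6):
    """Return indexed surnames that look similar to the one being sought."""
    # Compare in lower case, but report the names as they were indexed.
    lowered = {name.lower(): name for name in indexed_surnames}
    close = get_close_matches(surname.lower(), list(lowered), n=10, cutoff=cutoff)
    return [lowered[name] for name in close]


# Made-up index entries, for illustration only.
INDEXED = ["Tamer", "Turner", "Smith", "Tanner"]

print(spelling_candidates("Tanner", INDEXED))
```

Raising `cutoff` toward 1.0 behaves more like an exact search, and lowering it lets in more distant spellings at the cost of more irrelevant results.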

I might point out that most people fall off their chairs with exhaustion before I have finished this process. If I am determined to find something and you are watching, you will likely go to sleep before I finish. Two hours is common, longer happens frequently. You have to keep assuming that what you are looking for is there and the problem is the search terms or the combination. Try misspelling the name. Try looking in other counties or states. Try wider searches. I found a name last night that was spelled differently by going through every entry for my surname search in the entire database.

Good luck searching. It is a nice thing to say, but luck really has nothing to do with it.

8 comments:

  1. Hi James, I was intrigued by this post and tried doing the same search you did, only specifying "restrict to exact" for Arizona, U.S.A. This reduces the number of records to 41,103. Do you find it helpful to play with the default settings in the search screen, before resorting to wildcards?

    Replies
    1. First, I very, very seldom use wildcards. I seldom use the "exact" filter either. This is because I seldom need to. I almost always find what I am looking for immediately. The long search sequence only happens infrequently. Both options, wildcards and exact searches, are way down on my list of options to try, but eventually I do get to them.

  2. While you clearly have your methodology for navigating your way round a search that means you don't use the "exact" option, I think that the vast majority of searchers would be better served by using the "exact" option for two reasons:

    1. It reduces the list to manageable numbers immediately;
    2. It's what people *believe* that Ancestry (and the others) are doing.

    The instructions for Ancestry are simple:
    - click "Show Advanced";
    - click "Match all terms exactly" so it gets an "x" in the box;

    To take your Eliza Tanner example, if I repeat your selections but with "Match all terms exactly", then I get just 12 results:
    - 4 from different family trees;
    - 1 each from the 1880, 1900, 1910 censuses;
    - 2 each from the 1920 & 1930 censuses;
    - 1 from a US City Directory.

    Now, if I didn't find what I wanted, then I'd start looking at loosening some of those criteria. For instance, what if Henry didn't have his full name? So I'd remove the "exact" checkbox from his name first. I now get 90 results, primarily because it's now giving me Elizas with no spouse, I guess.

    We might argue about whether starting tight and working looser is the best way of doing it (and I seem to remember advocating the opposite with geography!) but for me the essential point is that so many people panic when they see 875,000 results that it makes more sense to start with the manageable number first.

    Adrian

    PS - I can't help wondering, when looking at "Heart Throbs of the West: Volume 7" (picked up in "Stories, Memories & Histories"), just what the recitation was that is referred to in 'Henry M. Tanner's favorite recitation was "Soap Your Coat Tail." His wife, Eliza, was exceptionally good at reciting "Betty and the Bar." '! Google does not help, for once!

  3. Apparently, you and I have a different level of confidence in the indexing functions of the major programs. My experience is that starting out with the exact function is just one more level I don't need to use. I am certainly not advocating changing anyone's search procedures. But if you don't understand what is going on with a search, you need to know all of these methods and alternatives for finding what may be in the database. It goes far beyond making one cursory search and letting it go at that.

  4. "Apparently, you and I have a different level of confidence in the indexing functions of the major programs."
    I doubt that - between the idiosyncratic view our ancestors had of their own history, dreadful handwriting (or microfilms), harassed indexers and peculiarly specified software, it's a wonder we find anything.

    You are absolutely right to say "It goes far beyond making one cursory search and letting it go at that." It's just that my feeling is that too many people see 875,000 answers and see that Ancestry has "ignored" Arizona, so deduce Ancestry must be rubbish. While I appreciate you are trying to educate them (for which you have my admiration) I still feel it better that they start with a manageable number, by specifying the exact search option.

    However - I will admit that when using the exact search option, you must know what you are looking for so that you can recognise it when you see it and gradually loosen the criteria when you can't see it.

    Adrian

    Replies
    1. The reason I don't apply the "exact" filter at first is because doing so assumes I know how Ancestry.com or whatever has indexed the content of the document I am looking for correctly. Perhaps I am more of a pessimist in thinking that I have guessed wrongly. I will certainly use it in some searches where it is indicated. Thanks for the interesting comments.

  5. "doing so assumes I know how Ancestry.com or whatever has indexed the content of the document I am looking for correctly"
    A good point that. Working entirely from (fallible) memory I think one of the Arizona state censuses for Eliza Tanner indexed her name but not her husband's name (by which I mean Henry was indexed but at no point was it indexed what their spouse's name was). Not what I would have expected - but I've not found the opportunity to use US State censuses.

    So if you know the indexing, it can pay to use the exact search option - if you don't know what indexes are (consistently) used, then a more generic search is better.

    Replies
    1. The trick, of course, is trying to find out how the database was indexed. I think this is something that comes with practice. Thanks for all your very good comments.
