Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, February 12, 2014

User Searches vs. Automated Searches -- who wins?

I was told about a post by Anne Gillespie Mitchell on the Ancestry.com blog entitled "RootsTech Presentations: Fine Tuning Ancestry.com Search and Extracting Stories from Military Records" and took a look at the article. In going through the PowerPoint presentation in the post, I saw a statement that interested me. It said, "Unless you 'tune' your search, just one field needs to match the record to be in your results." Here, in print, was the explanation for why searches on Ancestry.com produce so many "false positives" along with, hopefully, the correct records.
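To see why "just one field needs to match" produces so many false positives, here is a toy model in Python of the two matching rules. It is only an illustration, not how Ancestry.com actually implements its search (which is certainly more elaborate). It assumes exact, field-by-field comparison, and the sample records are made up.

```python
# Made-up sample records, for illustration only (not real search results).
records = [
    {"name": "Henry Martin Tanner", "place": "Arizona", "sex": "Male"},
    {"name": "Henry Tanner", "place": "Ohio", "sex": "Male"},
    {"name": "Mary Tanner", "place": "Arizona", "sex": "Female"},
    {"name": "John Smith", "place": "Texas", "sex": "Male"},
]

# The three search terms used in this post.
query = {"name": "Henry Martin Tanner", "place": "Arizona", "sex": "Male"}


def matches_any(record, query):
    """Untuned search (toy model): one exactly matching field is enough."""
    return any(record.get(field) == value for field, value in query.items())


def matches_all(record, query):
    """Tuned search (toy model): every field must match exactly."""
    return all(record.get(field) == value for field, value in query.items())


loose = [r for r in records if matches_any(r, query)]
strict = [r for r in records if matches_all(r, query)]
print(len(loose), len(strict))  # 4 1
```

Under the "any field" rule, even John Smith of Texas shows up, just because he is male. Requiring every field to match leaves only the one record that fits.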

At the same time, I began to wonder how accurate my own searches would be, even using the techniques set out by Ancestry.com, compared to the automatic search capabilities of either Ancestry.com or MyHeritage.com. I also realized I had a way to make the comparison. I have compiled a list of sources for my Great-grandfather Henry Martin Tanner from all sorts of searches and have attached them to his entry in the FamilySearch.org Family Tree. I won't reproduce the list, but all of the sources can be located online using one or another of the online database programs or through Google searches.

I presently have 40 individual, unique sources for Henry Martin Tanner (b. 1852, d. 1935). I will call that number the Tanner Base Number of Sources, or TBNS, for any comparison. It seems fair to measure what each search engine finds against a standard number that likely exceeds what any one database would find. I might mention here that there are likely many more records available for Henry Martin Tanner, and I can only guess what the total TBNS would eventually look like. OK, so let's get this straight. We first search using the manual search capabilities of the different programs to see how they rank in terms of finding sources. For the purpose of this exercise, I am excluding a basic Google search, because the numbers come back so huge as to be meaningless. Google also includes every single mention of the name, such as in this blog post, even if it is not related to genealogically significant data.

It occurs to me that this comparison will also vary dramatically simply by the number of Arizona records held by each database provider. In other words, a particular provider may have really good New England records and have few or no sources covering the time and place where my ancestor lived. In the past, I have used a similar technique to compare online search engines. The correlation is rough and may vary considerably by geographic area and by the sophistication of the user.

Now to the meat and potatoes of the comparison. Remember that there are at least 40 distinct records available for Henry Martin Tanner. Also remember that I am using a minimalist approach to searching, filling in only three fields. For this first test, I am not going to use the helpful automated or semi-automated search functions of each program. Two more conditions: I will not vary the search terms between the online programs, and I will stop looking for more accurate hits once the list starts including people from all over. In each case the terms are:

  • Henry Martin Tanner
  • Arizona
  • Male

Let's start with FamilySearch.org and go on from there. Here are the results of the manual searches:

  • FamilySearch.org --  9 sources
  • Ancestry.com  --  11 sources including certificates and photos
  • Mocavo.com -- 700+ sources; after filtering to remove websites, the total was 44
  • MyHeritage.com -- used the search from the profile for Henry Martin Tanner and got 21 more relatives

Now, I could have gotten more out of each of the online databases if I had varied the search terms and worked harder at getting information, but doing it this way is more reasonable for the purpose of this post.

OK, here's what happened. When I got to MyHeritage.com, the whole idea of searching for one individual at a time and hoping to find sources went out the window. The number of sources, and of leads to other individuals' sources, was almost overwhelming. From another standpoint, now that Mocavo.com has been adding so many records, failing to use it means ignoring a really valuable source.

I would have to say that the results of this test were mainly a demonstration of how different these databases really are. They are all valuable tools, and I would have a difficult time ranking them or giving up any one of them. You might notice that Mocavo.com is a new addition to my usual list, but it is certainly worth the effort. I have been talking about MyHeritage.com for some time, and I am even more impressed with the latest refinements and additions to its search process. I now have 6,501 Record Matches sitting there for me to review.

The conclusion, however, is easy to draw. MyHeritage.com wins by a rather large margin, as long as you let the program do the searching, because of the huge number of extractions it made almost automatically for 21 other related people. The same is true of Ancestry.com: let the program find the records. By letting Ancestry.com do its own searching for green leaf hints, I have 12 source entries for Henry Martin Tanner excluding photos and stories; with photos and stories included, I have 17 sources.

The surprise entry is Mocavo.com. It has moved well into the range of the larger programs and can no longer be ignored. When people ask whether they should be using all four of these online programs, I have to say yes: if you are really serious about doing your genealogy, you need your family tree on all four of them.

Now, there is a qualification. Presently, these programs are heavily weighted toward United States records. I expect that to change over time, but right now, if you have English (or other UK) or Irish ancestors, I suggest looking at findmypast.com. I am still surprised at the number of people doing English research who are not familiar with that huge database.

Last, a reminder: I did not work the programs hard for this post. I can find things in all of the programs that the automatic searches do not find. This is particularly true of FamilySearch.org, because so many of its records are still waiting to be indexed.
