
Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, March 13, 2014

Why Don't Genealogical Search Engines Work?

One complaint I hear almost constantly is, "Why don't genealogical search engines work?" From time to time I find this criticism aimed at all of the online database programs, some more frequently than others. Interestingly, heavy users of FamilySearch.org complain about that program, while those who use other programs, such as Ancestry.com, complain about whichever program they happen to be using at the time. When FamilySearch.org revamped its website and launched the "new" version, the first complaint I heard was that users "couldn't find anything." In just the last week or so, I have seen numerous complaints that Ancestry.com's "new" search engine is definitely inferior to the "old," now-abandoned version. Why is there an impression that the search engines used by these online genealogical database programs are somehow broken or don't work?

Unfortunately, there is no simple answer to the question. But I will offer a few of the reasons (by no means all of them) for the difficulties people encounter in using these programs.

First, what is a search engine?
The definition of a search engine is simple. A search engine is a program for the retrieval of data from a database or network, especially the Internet. (See the results of a general Google search for "define search engine.") A slightly more complete definition comes from webopedia.com:
Search engines are programs that search documents for specified keywords and returns a list of the documents where the keywords were found. A search engine is really a general class of programs, however, the term is often used to specifically describe systems like Google, Bing and Yahoo! Search that enable users to search for documents on the World Wide Web.
The definition is poorly written and the grammar isn't all that great, but it describes pretty much what we are talking about when we complain that a certain program does or does not find what we are looking for. FamilySearch.org, MyHeritage.com, Ancestry.com, findmypast.com, and others are all online database programs. They store information so that it can be searched.
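For readers curious about what "returns a list of the documents where the keywords were found" looks like in practice, here is a minimal sketch in Python. It assumes a simple inverted index (a map from each word to the records that contain it), which is a common textbook technique and not necessarily how any of the sites named above actually work. The function names and the sample records are hypothetical, invented only for illustration.

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def keyword_search(index, query):
    """Return the sorted IDs of documents containing every keyword in the query."""
    words = query.lower().split()
    if not words:
        return []
    # .get() avoids adding empty entries to the defaultdict for unknown words.
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical sample records, invented for illustration.
records = {
    1: "Mary Smith born 1850 Ohio",
    2: "Mary Smith married John Brown",
    3: "John Brown born 1848 Ohio",
}
index = build_index(records)
print(keyword_search(index, "mary smith"))  # [1, 2]
print(keyword_search(index, "brown born"))  # [3]
```

Even this toy version makes design decisions (ignore capitalization, require every keyword) that change what the user sees, which is the point of the sections that follow.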

Searching online is a skill, and some people do it better than others
Using a complex online search engine is a learned skill, just like riding a bike or playing the piano. Usually, those who practice regularly (i.e., use search engines to look for things) search better than those who spend little or no time searching. In addition, some people simply have a natural talent for the skill. The existence of professional concert pianists does not mean that I can't play the piano adequately if I am willing to put in the time and practice (by the way, I do not play the piano, well or otherwise), but it does mean that there will always be someone out there who can do the job better than I can.

As with acquiring any skill, it helps to have an extensive understanding of the subject. Following up on the piano example, those who learn to play the piano extraordinarily well also learn a lot of music theory. Likewise, when you are using a computer to do searches, online or otherwise, it helps to know quite a bit about programming and how search engines do or do not work. Unfortunately, most people jump right in and start searching on a computer without knowing diddle about how computers or search engines work. That is one reason why those of us with extensive technical backgrounds sometimes do a better job of searching than those without that kind of background. I say "sometimes" for a reason. Searching in general is one thing; searching for genealogical information is another thing altogether. Why? Because to search for genealogical information, you also need specific genealogical skills. Hmm. That seems really important. Let's state it like this:

Using a search engine is a skill. Learning about genealogy also requires some skills. Searching for stuff about genealogy requires both of these skills and likely a few more.

So, one of the problems I have when people start complaining about this or that search engine is knowing that the person complaining lacks both of these basic skill sets, and many others as well. How do you tell someone that diplomatically?

Search engines are mostly written by engineers, not genealogists
I already mentioned that successful searches for genealogical data require several technical skills. Guess what? Designing a successful genealogical search engine also requires both computer programming skills and an extensive knowledge of genealogy. Unfortunately (there seem to be quite a few unfortunatelies in this explanation), very few people, and even fewer programmers, understand both programming and genealogy. One stellar exception is the CEO of MyHeritage.com, Gilad Japhet. He is an excellent genealogist and an extraordinary programmer. That is one of the basic reasons why MyHeritage.com has such a successful search engine. The other companies have some very good genealogists and some very good programmers, but they don't really talk to each other all that much.

It is easy to see that understanding what you are searching for will help you find it. Conversely, if you do not understand what you are searching for, you are very unlikely to find anything useful at all. The same goes for programmers. If they have no real understanding of genealogy, they don't know what needs to be done to make their programs work. The usual solution to this problem is to call in some genealogists and have them talk to the programmers. By and large, this does not work. Why? Because genealogists and programmers do not speak the same language. So why not call in someone who speaks both languages? That seems like a workable solution until you try to find someone who actually does speak both languages. People like the CEO of MyHeritage.com are few and far between.

Shouldn't a good search engine work for genealogy as well as anything else?
As I write this post, I can hear the engineers out there grumbling that a search engine is a search engine, that a well-written search engine works for anything, and that Tanner has no idea what he is talking about. Well, at this point, I could go into a lengthy discussion about algorithms and such, but then I would likely fall asleep along with the rest of you non-programmers out there. This issue gets into the difference between catalogs and string searches (letter-by-letter or word-by-word searches). You might have noticed that the definition at the beginning of this post included the phrase, "returns a list of the documents where the keywords were found." As we commonly say, the devil is in the details. To some extent, the success of some search engines over others is determined by the access they have to the data. But even with complete access, how the search proceeds determines the results.
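To make the catalog versus string search distinction a little more concrete, the following Python sketch runs similar queries two ways over a few invented records. It assumes that a "catalog" search compares one named field for an exact match and that a "string" search scans every field letter by letter for the query text. The field names and functions are hypothetical, and real genealogical sites almost certainly mix these approaches in ways we can't see.

```python
# Hypothetical records with structured (cataloged) fields; all data is invented.
records = [
    {"id": 1, "name": "Mary Smith", "birthplace": "Ohio", "residence": "Utah"},
    {"id": 2, "name": "Ann Jones", "birthplace": "Utah", "residence": "Ohio"},
    {"id": 3, "name": "Mary Smithson", "birthplace": "Ohio", "residence": "Ohio"},
]


def catalog_search(records, field, value):
    """Catalog-style lookup: compare one named field for an exact, case-insensitive match."""
    return [r["id"] for r in records if r[field].lower() == value.lower()]


def string_search(records, text):
    """String-style scan: match the text as a substring of any field in the record."""
    text = text.lower()
    return [r["id"] for r in records
            if any(text in str(v).lower() for v in r.values())]


print(catalog_search(records, "name", "Mary Smith"))  # [1]
print(string_search(records, "Mary Smith"))           # [1, 3]
print(catalog_search(records, "birthplace", "Ohio"))  # [1, 3]
print(string_search(records, "Ohio"))                 # [1, 2, 3]
```

The string search picks up Mary Smithson along with Mary Smith, and it pulls in Ann Jones, who only shows up because she resided in Ohio. Whether those extra results are helpful or just noise depends on what the researcher is after, which is exactly where genealogical knowledge has to inform the design.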

It is probably time to define an "algorithm." An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. (See Google again.) The set of rules you adopt determines how well your program works. At this point, I could analyze several online search engines, reverse engineer them, and show why some work and some do not. For those of you who care, this is a rather elementary "black box" process in which I look at what the program can or cannot do and deduce how it works. In science and engineering, a black box is a device, system or object which can be viewed in terms of its input, output and transfer characteristics without any knowledge of its internal workings. (See Wikipedia: Black Box.)
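Here is a toy version of that black box process in Python. The function `mystery_search` is hypothetical and stands in for a website's search engine; the `probe` function never looks inside it and infers the matching rule (exact, prefix, or substring) purely from what comes back for a few carefully chosen surnames. Real reverse engineering takes many more probes than this, so treat it as an illustration only.

```python
def mystery_search(query, names):
    """Hypothetical black box. Pretend we cannot read this: it does prefix matching."""
    return [n for n in names if n.lower().startswith(query.lower())]


def probe(search):
    """Infer a search function's matching rule from its inputs and outputs alone."""
    # Surnames chosen so each rule produces a different result list.
    names = ["Smith", "Smithson", "Goldsmith"]
    result = search("smith", names)
    if result == ["Smith"]:
        return "exact"
    if result == ["Smith", "Smithson"]:
        return "prefix"
    if result == ["Smith", "Smithson", "Goldsmith"]:
        return "substring"
    return "unknown"


print(probe(mystery_search))  # prefix
```

The trick is choosing test inputs whose outputs differ under each candidate rule; the same reasoning applies when a genealogist tries odd spellings on a real site to see what it does.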

Let's just say that trying to write a search engine for genealogical searches without knowing a lot about genealogy is like trying to build an electrical transmission system without understanding electricity. You can do it, but it probably won't work well, if at all.

Summary
Genealogical search engines don't work because we don't know how to use them and because, by and large, they are written by engineers who don't understand genealogy. That pretty well sums it up. Can a bad search engine be improved? Well, yes, of course. It takes time, money, more time, more money, etc. Some genealogical search engines in online programs fall into the hopeless category. Others, most of them by the way, fall into the OK-for-most-uses category, and there are really, really good ones, such as MyHeritage.com and Ancestry.com, that stand well above the rest. Maybe there is a correlation between having a good search engine and having a successful online genealogical database program? Could be.

6 comments:

  1. Ok, I'll bite... I like and understand (mostly) your explanation about why search engines don't always seem to work. I acknowledge that I have complaints about Ancestry's new search, although I haven't voiced them in a public forum. However, I have experienced a problem with Ancestry's old search, and I wonder if you can tell me whether it's my fault or perhaps that company's. Here's the problem: I use the "Drouin Collection" (French-Canadian records) to search for an ancestor in the province of Quebec. I know that he was born in a particular place, so I put that place in the "Location" field. I don't put anything in any of the other fields, like name or year, because I want to see (quickly) my ancestor in the results list. However, the results come back with all sorts of other locations but the one I asked for. So I then type the place in the "Keyword" field, and interestingly, the results come back for just that location. Why wouldn't the location show up in the field that Ancestry themselves created? You might not know the answer, James, but I thought I'd ask in case you could tell me whether it's indeed me who doesn't know how to use that particular search engine, or an Ancestry glitch. (By the way, I haven't yet tried to replicate this situation in the new search at Ancestry.)

    1. This is a very common complaint and one of the reasons why people feel that the search engine does not work. In fact, it is doing exactly what it was instructed to do: default to the next available jurisdiction.

  2. Ancestry has a good search engine? When did that happen? Old search or new, I consistently find that it is the least user-friendly search engine, and it comes up with the worst results compared to MyHeritage, Mocavo, or FindMyPast.

    Ancestry didn't get as far as they did as a database program because of their great search engine; they got as far as they did because they have sheer volume over their competitors. That's it. FindMyPast, on the other hand, seems to be growing based on their much more intelligent search engine, which has many more options.

    Heritage Quest offers something in its search engine that I wish all of the sites had: advanced sorting of the results. If I want to see all results for a man named John sorted by age, I can do that. I can sort by his name, birth place, residence, etc. Why this isn't an option anywhere else, I don't know. Even Google, as a genealogical search engine, gives us some sorting and filtering ability that Ancestry does not.

    If we had more ability to filter and sort our results at Ancestry, it might put the search engine Ancestry uses into the good category. Until then, it mostly distributes garbage unless you go into one database at a time, and that makes people pigeonhole their view into a specific set of records (something that I think is a bad thing).

    1. Interesting comments. There are a lot of highly opinionated thoughts on all sides of the issue.

  3. Thanks for the well-written and thought-provoking article. Once we build skill at using a particular search engine, we are frustrated when a company changes it and forces us to abandon that knowledge and relearn (a perceived slowdown in accomplishing our task). I think the company isn't focused on improvements for the user as much as on improved access to the collection it hosts. What are your thoughts about user-controlled search filters? They give us perceived control of results, but they feel more like the automated phone service where we continually press 1 for English, press 3 for another option... then another and another, losing our focus, which was an answer to our question. To what extent are user-accessed filters an aid or a distraction? Kayt

    1. It sounds like another blog post to me. Thanks.
