
Sunday, January 5, 2014

Stepping off into the Merging and Searching Morass on FamilySearch

I received the following from a concerned reader about a recent experience with FamilySearch.org:
The program doesn't really merge does it? It only allows some things to be attached and then deletes the rest. It isn't a very good program. In my opinion the real problem is a very, very weak "duplication" checking system. Over and over I have looked for ancestors only to be told they are not in the system and must be added. Once I add them and then complete the process by adding death dates or 'deceased', the program mysteriously finds "the exact person with the exact spelling and spouse and everything I put in my 'find' search. Now I have a duplicate of the very same person, and considerable time involved, and must decide which one must be deleted. (not really merged) Very, very poor ability to find matching people within FamilySearch's own database. Likewise, the source search function often can't find records that I know are in the collections. It is hard to get excited about a program as weak as this currently is. Additionally, your friend is not the only one to receive answers back from their submitted 'feedback' that shows an inability or lack of concern to solve the problems. Why would anyone want to send feedback when the responses are sooo poor. As a volunteer missionary for FamilySearch I am finding it a bit distressing when patron after patron declines to send in feedback or call FS because they have had such poor help and responses. One after another informs me that it doesn't do any good. I would like to be able to tell them it does matter but I am finding that hard to prove. I have been of the opinion that they can't fix the problem if they don't know about it, but I'm beginning to think they can't fix the problem even if they do know about it.  
My mind's eye sees 5 or 6 teens wearing skinny jeans, red Converse Chuck Taylor's, and skateboard t-shirts, in a run down strip mall store with "FamilySearch Tech Team" stenciled on the glass door. I hope this isn't the case, but it feels like it.
OK, I realize that this is a huge amount of detail to tackle in one blog post, but I will do my best. Right off, I can assure my reader and anyone else who is concerned that the FamilySearch teams working on these issues are highly capable, very conservative-looking folks who do not fit the stereotype depicted in the comment at all. They have a very nice and well-equipped office in downtown Salt Lake City, Utah.

Now to the meat of the comment. I would like to parse the comment into subjects. The comment is really talking about some very different and somewhat confused issues, not all of which relate directly to FamilySearch.org. I will discuss the issues in the order they are raised in the comment.

The first issue raised concerns merging. I would observe that the vast majority of the merge issues I have seen so far arise from the limitations placed on the program by sharing a common database with New.FamilySearch.org. These problems seem to be intractable until the two databases are entirely separated through the demise of New.FamilySearch.org. The survival of certain information ostensibly attached to the merging individuals is a serious question that has only just begun to be discussed. As mentioned in a previous post, apparently photographs, stories and even documents attached to an individual who is merged into another individual may be lost at the time of the merger. This is a serious problem, especially if solving it requires reattaching all of the documents and photos. This is particularly disconcerting to me because I have some ancestors who fall into the category of Individuals Of Unusual Size, to whom many documents have already been attached and who may be subject to a merger in the future. It would be very frustrating to have all of the images disappear.

What the reader refers to as a "weak duplication checking system" is really two different things. This issue is much more complicated because it involves searching for a person in a database of more than a billion people. What is more, it is an absolute given that a huge number of the entries for the individuals in the database are duplicates. However, this is not per se a FamilySearch.org issue; it is rather a problem with all database search engines. The problem involves an apparent contradiction. Simply stated, the contradiction is that in order to identify an existing entry in a database, you need sufficient information about the item being searched for to find it. In other words, you need to know exactly how the existing individual is identified in the database. So, you would think that in order to find an entry in a database you would supply as much information as possible to initiate the search. Counterintuitively, the opposite is true. You need to search with as little information as possible in order to avoid having the search engine ignore a match simply because some of the fields you specified in your search do not match those fields in the target entry. What happens to create the situation outlined by the reader is that the original search does not find the match, due either to a lack of specific information or to too much information. Once the entry is completed, the program has more information upon which to base an evaluation, including the relationship of the newly entered individual to all of the other individuals in the database. So it is not surprising that once the entry is complete, the search engine finds a duplicate.
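The "less is more" effect described above can be illustrated with a small sketch. It assumes a deliberately simple matcher that requires every supplied field to match exactly; the records, field names, and the `strict_search` function are all hypothetical and are not a description of how FamilySearch.org actually searches.

```python
# Hypothetical records; the field names and values are illustrative only
# and do not reflect how FamilySearch stores or matches people.
RECORDS = [
    {"id": 1, "given": "John", "surname": "Smith", "birth_year": 1850, "birth_place": "Ohio"},
    {"id": 2, "given": "John", "surname": "Smith", "birth_year": 1852, "birth_place": "Utah"},
    {"id": 3, "given": "Mary", "surname": "Jones", "birth_year": 1850, "birth_place": "Ohio"},
]


def strict_search(records, **criteria):
    """Return the records in which every supplied field matches exactly."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


# A detailed search in which one field (birth_year) is off by a year
# compared with the stored entry: nothing is found.
detailed = strict_search(
    RECORDS, given="John", surname="Smith", birth_year=1851, birth_place="Ohio"
)

# The same search with only the name supplied finds both candidates.
minimal = strict_search(RECORDS, given="John", surname="Smith")

print(len(detailed))               # 0
print([r["id"] for r in minimal])  # [1, 2]
```

Under this assumed matcher, each extra field is one more chance for a small discrepancy to hide an entry that is really there.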

Of course, to all appearances, the program has not functioned properly. In effect, there are two different things in the initial statements made by the reader. The first is the inclusion or exclusion of information in the merge process. The second is the limitation of the search engine to find duplicates.

The reader observes the failure to be a "very, very poor ability to find matching people within FamilySearch's own database." This is not the case. In fact, in most instances, the database is actually doing too good a job. It is trying to match too many of the fields. This is apparent when you are searching FamilySearch.org's Historical Record Collections for U.S. Census records. In that case, if you specify a birth date and place you may not be able to find the record. But if you omit both the birthplace and the birth date and instead specify a residence, you will find the record immediately. Similar situations occur in every database search engine I have ever used. Some are better. Some are worse. The same things are happening when the merge function goes to look for a duplicate, except in those instances when the duplicate is subject to the limitations imposed by having New.FamilySearch.org still attached to Family Tree and using the same database.
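The census example suggests a search habit that can be sketched in code: start with everything you know, then drop the least reliable fields one at a time until something turns up. The sketch below is only an illustration under assumed conditions. The records, the field names, and the `relaxed_search` helper are hypothetical, and real search engines use far more forgiving matching than the exact comparison used here.

```python
# Hypothetical census entries; names, places, and fields are made up.
CENSUS = [
    {"id": 10, "given": "Ann", "surname": "Lee", "birth_year": 1852,
     "birth_place": "Calif.", "residence": "Apache, Arizona"},
    {"id": 11, "given": "Ann", "surname": "Lee", "birth_year": 1852,
     "birth_place": "Calif.", "residence": "Salt Lake, Utah"},
]


def relaxed_search(records, criteria, drop_order):
    """Try all fields first, then drop fields one at a time (in drop_order)
    until at least one record matches.

    Returns the matching records and the sorted names of the fields
    that were still in use when the search stopped.
    """
    remaining = dict(criteria)
    for field in [None] + list(drop_order):
        if field is not None:
            remaining.pop(field, None)
        hits = [r for r in records
                if all(r.get(k) == v for k, v in remaining.items())]
        if hits:
            return hits, sorted(remaining)
    return [], sorted(remaining)


# The researcher's birth details differ slightly from the indexed entry,
# but the residence is right.
query = {"given": "Ann", "surname": "Lee", "birth_year": 1850,
         "birth_place": "California", "residence": "Apache, Arizona"}

hits, fields_used = relaxed_search(
    CENSUS, query, drop_order=["birth_place", "birth_year"]
)

print([r["id"] for r in hits])  # [10]
print(fields_used)              # ['given', 'residence', 'surname']
```

In this made-up case the record is found only after both birth fields are dropped, which mirrors the experience of finding a census entry by name and residence alone.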

The next problem is the apparent inability of the search engine to find records that are known to be in the Historical Record Collections. I have written about this problem previously. This is also not a very straightforward issue. As I just explained, the failure to find a known existing record often stems from the inclusion of too much information. In this regard, the reader says, "It is hard to get excited about a program as weak as this currently is." I would have to strongly disagree. If I had the time and opportunity to do so, I could sit down with the reader or anyone else and likely find a huge percentage of the records their searches failed to find. What I am saying here is that experience in searching is at least as important as the ability of the search engine to function. It is absolutely true that the ability of the FamilySearch.org search engines collectively could be improved. But that can be said about any search engine on any website on the Internet.

In many cases, the game is not "I tell you what I want and you find it." It is instead, "I try to guess what you call it so you can find it." There is a substantial difference between the two approaches.

What is more serious is the reaction of the reader and others that the FamilySearch.org search engine is somehow "broken." It is not broken. But many of the search techniques that are needed to make it work in a predictable fashion are not intuitive.

The next round of comments made by the reader addresses the "feedback" issue. It is unrealistic to expect that volunteer support personnel from FamilySearch.org would even be aware of the problems outlined, much less have any constructive suggestions for remedying them. However, it is true that FamilySearch.org often fails to admit or publicize known issues. Once again, FamilySearch.org is not unique or even particularly noteworthy in this failure. One example is my own failure to find a solution to an ongoing problem with Google.

Failure to provide feedback is a serious problem. Most programming issues are resolved only after complaints reach the point where the engineers believe that the problem is a programming issue rather than a user-originated issue. So the amount of feedback received is important.

The last point made by the reader is entirely valid. Sometimes, as was the case with New.FamilySearch.org, the problem cannot be fixed.

The reader raises some very complicated issues. From my experience, I believe that FamilySearch.org is reasonably aware of all of the issues raised and is working towards a solution where solutions are possible. It is unlikely, although possible, that FamilySearch.org will come up with a solution to the basic limitations of database search engines as such.

Are there problems with the FamilySearch.org search engines? Yes. Do they sometimes work in an unpredictable fashion? Yes. But so does every other search engine in use on the Internet. Is feedback important? Extraordinarily important.




13 comments:

  1. I find that the limitations of ALL online databases far outweigh the positives. FamilySearch is no different in this regard than Ancestry, Rootsweb and even MyHeritage (which is marginally better). The only way to get the data YOU want, displayed the way YOU want it displayed, including documents, photos and so on, is to make your own website. Otherwise, learn to live with the limitations inherent in all online databases.

  2. I find it more effective to look for matches and duplicates on Family Tree using my RootsMagic database program. That way I can see what's there from another view, before adding someone new to the Family Tree database.

    I just started wondering on Friday about the possibility of losing photos with merging, so this post was timely! Thanks.

  3. Related to the merge and new.familysearch, the issues must not all be resolved, as there are still people who can't be merged because they have so many records. I had thought that would be fixed when new.familysearch became read-only, but that is not the case.

    Another thing I wondered about is what I call orphan records: records that, after a merge, are left standing alone with nobody attached and no information except the name. Do these somehow get deleted?

    Great blog, one of my very favorites. Thanks.

    Replies
    1. I have never heard what happens to orphan records, but my guess is that they stay around forever. Thanks for the comment.

    2. I think they do too. I just deleted some spouses with a name of UNKNOWN with only the marriage date (no children, no other information). The correct spouse and marriage date along with the children were attached to this husband. It would not seem too difficult a task for FamilySearch developers to search for these orphan people and delete them, or at least produce a list and have someone manually go through, double check and delete.

      I would usually do a merge, but with no information and no name that didn't seem the way to go. I have tried to be diligent when doing merges to check all records that will remain after the merge and take care of them as well.

  4. I would like to offer my opinion on this post and your previous post concerning "What genealogy records are and are not online". In both cases, I feel strongly, from a genealogy research specialist standpoint, that the issue is simply created by not following current professional genealogy research standards from the beginning. All construction of record sets must derive from evidential source documentation, period. If this had been the case from the beginning, the search engine data results would display out of the creating source and present factual connective, certain relationships, by adding proper correlation with all related records. Neither beginners nor experts would become confused. Then there would arise a serious universal effort by all genealogists and historians everywhere to obtain and promote the preservation, restoration and dissemination of all records in the world that are related to the collective and personal history of mankind. Theoretical epistemology cannot replace solid evidence found in basic sources. Two interesting examples in history are: (1) Brigham Young and his fellow apostles were rebaptized in the Salt Lake Valley on 06 August 1847; (2) the obvious rededication, though not explicitly stated, and rebaptism of many individuals who were faithful saints at the coming of Jesus Christ in America [The Nephites who “received the prophets” were spared from the great destructions]. Isaiah 13:12 (KJV) "I will make a man more precious than fine gold; even a man than the golden wedge of Ophir." Men, women and children are valued by the records we keep.

    Replies
    1. I can't tell if I agree with you because I am not sure there are any current professional genealogy research standards that are known or accepted by the greater genealogical community. Thanks for the comment.

    2. In an ideal world, Thomas, where everyone had access to the same sources, and everyone used the same research standard (as James says), and everyone's interpretation was the same, then what you say might be true. However, real-life is never so black-and-white. Somewhere back in http://parallax-viewpoint.blogspot.com/2013/09/collaboration-with-tears.html, I highlighted a case I was working on where access to private, and potentially sensitive, sources led to a conclusion very different from one that would be reached via "reliable sources" in the public domain. That situation cannot be handled using the collaboration model currently used by FS.

  5. I've been reading with interest your columns, James, especially about the merge process. 99% of the merges that I have made have been incredibly obvious - same person, with exact same information or with a tiny typo (i.e. Georgeanna instead of Georgianna). So I haven't really thought as much of it until I was updating my tree recently and discovered a discrepancy with one of my ancestors that I'm blown away by. https://familysearch.org/tree/#view=ancestor&person=LCZW-XX4 is the person I'm talking about, if you care to look at it.

    She's got two husbands, Robert Edward Perry and Robert E. Perry. The difference? Robert E. Perry has parents attached to him. I've been working on this family for nearly two decades and not one person has been able to find the parents of Robert Edward Perry so far (I'm sure they are out there, but none of us have exhausted our resources yet). So I went to look at Robert E. Perry and the parents' information. There is NO source. Just a name, with no contact information and no way of figuring out what to do.

    So now I'm stuck. Do I merge them even with this paternal relationship I don't recognize? Do I leave the duplicates as is, knowing there are people attached to one person and not the other? FamilySearch doesn't exactly give out a lot of information in this situation to help one solve it, and I feel as if I am leaving a hole in the database by allowing this situation to continue to exist in their system.

    The search issues I can get around. The situation I described above, however, has no good solution to it.

    Replies
    1. Welcome to the world of made-up ancestors. At one time, I had 17+ generations going back from an end-of-line ancestor. Now there are only two or three. I suggest editing out the wrong relationships before merging, then doing the merge.

    2. Ugh, what a way to deal with this. This is my first experiment into using a tree outside my family's website and our Ancestry collective account. I'm rather disgusted by the quality of the data in the site - many, many spelling errors and duplicates that should be easily weeded out, and oddball mistakes in merging people who don't even resemble one another (i.e. a person has a child at age -6 in a different country? how does that work?).

      Sigh.
