Tuesday, September 29, 2015

Gathering Fruit From Family Trees

One of the more controversial issues in the very non-controversial area of genealogical research is the validity of the information in online family trees. There are millions of these individually accumulated trees online, and the entries vary from well documented to pure invention. I am reminded of this issue every time I open MyHeritage and see that I have over 100,000 Smart Matches. Here is a screenshot showing that number:

You might also note that I have 14,425 Record Matches, but that is another issue. What am I supposed to do with over 100,000 suggested relatives? That is almost the entire population of Provo, Utah, where I now live. This large number comes from the fact that I uploaded my entire file to the website many years ago and have been actively working with it since then. The large number is really an indication of how well the program works.

The obvious answer is to focus only on those particular family members you are currently interested in researching and ignore the rest. But this sidesteps the issue. The real issue underlying all of these potential connections is that lurking out there might be the solution to some of my end-of-line problems. What is lacking is the time to work on everything at once. It is all too easy to dismiss lowly "family trees" as beneath the consideration of a true research genealogist. Don't we have much better things to do with our time than mingling with the unwashed masses?

Whenever I have written about the proliferation of online family trees, I have always received comments about how important it is to use the unsupported information in those family trees as the basis for research. That concept works (although I think it is mostly a waste of time) as long as there are only a few such research suggestions. But now, with over 100,000 such possibilities from just one program, there needs to be a workable strategy that does not involve just sticking my head in the sand and ignoring the issue altogether.

In theory, collaboration is a positive idea. It is a way to avoid duplication of effort and to crowdsource the work of cleaning up the data, but in the face of this reality, there is a point when the crowd gets too large to manage. In the above MyHeritage situation, I could literally spend all my time just confirming Smart Matches and communicating with the huge mass of connecting family trees. How do I sort out those who have just copied their entries from those who have something legitimate to offer?

At the other end of this spectrum of family trees is the Family Tree and other such constructs such as, and where there is an attempt to avoid unnecessary duplication. In, for example, if I sort by the number of matches for individuals in my family tree, I have a lot of ancestors with close to 200 matches each in other family trees in the system. Most of these matched individuals are 10 or 11 generations back in my own family tree and lived in the 1600s. Even if I wanted to investigate one of these ancestors, it is very likely that serious research needs to be done in that particular family line before getting back to this particular ancestor. It would be nice if there were some way to determine those connections that had information I did not have and those that were merely copies.

My present strategy is to concentrate on one ancestral generation at a time. I am now working on my 6th generation, adding all the sources and correcting the entries based on those sources. Obviously, this is a never-ending task, but since I am making the changes and adding the sources to the Family Tree, I am making some progress. I am also adding sources from, and, as well as many other programs, microfilm records and my own pile of paper records.

