Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, April 16, 2026

The Main Challenges of FamilySearch Full-text Search, Part Three

 

https://www.familysearch.org/en/search/full-text/

These instructions about using simple Boolean algebraic symbols for full-text search are a reminder that when you are using AI, you are talking to a computer, not a person. If I started a set of Rules for AI in Genealogy, I would put this statement first. The point of that first rule would be to remember that AI is a tool, not a solution.

There is a lot of discussion out there on the internet about "outsourcing your intelligence to AI." But it is apparent to me, if not to many other people, that these arguments and concerns, especially those from the educational community, were made in exactly the same words about electronic calculators and the Wikipedia website. I know I have written before about teachers banning electronic calculators and the copying of articles from Wikipedia, but I feel like I am having déjà vu all over again.

Let's get serious about the status of the document collections on FamilySearch.org. Presently, there are six different document collections or ways to search the documents and they overlap only slightly. The different avenues of access are the following:
  • The main catalog
  • The historical record collections
  • The Images collection
  • Full-text Search
  • Simple Search
  • The Books collection
There is no real way to determine the degree of redundancy between these separate collections. 

What does Full-text search add? Ultimately, if all six are somehow consolidated, we may be able to find a way to search ALL the documents on the website from one search interface. Presently, the only way to have any confidence at all in the extent of your searches into the FamilySearch.org website is to do the search six different ways. Particularly, with the Full-text Search and Simple Search functions, there is no way to know what part of the website's collections you have searched. From my own experience, I am positive that most users of the website do an unsuccessful name search and conclude that FamilySearch does not have the documents they are looking for. Here are some thoughts about how Full-text Search fits into the equation.

The bridge between our "old school" research standards and these powerful new AI-driven tools is precision. We have to stop treating the search box like an easy solution and start treating it like a surgical tool. FamilySearch’s Full-text tool is essentially a massive search engine for historical documents—deeds, wills, and probate records that were previously "locked" inside digital images. To maintain our research integrity, we must master the technical nuances of how to talk to this specific system. We must also be painfully aware that any search we make only reaches an unknown number of documents that have been processed and made available to the search. 

The first step is understanding that the computer is literal. If you search for Sarah Miller in the Full-text box without constraints, the computer will find every "Sarah" and every "Miller" in a land deed or a court record. By using an Exact Phrase search—"Sarah Miller" in quotes—you are imposing a research standard on the machine. You are telling it that the relationship between those two words is non-negotiable. This is the only way to effectively search for a specific name in a sea of handwritten text that has been converted by AI.
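To make the difference concrete, here is a minimal sketch in Python run against three made-up record snippets. It is not FamilySearch's code, and the function names and sample records are my own assumptions. It only illustrates what "any of these words" versus "this exact phrase" means to a literal-minded computer.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def matches_any_term(text, query):
    """Loose match: true if any query word appears anywhere in the text."""
    words = set(tokenize(text))
    return any(term in words for term in tokenize(query))

def matches_exact_phrase(text, phrase):
    """Exact phrase: the query words must appear adjacent and in order."""
    words = tokenize(text)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

# Hypothetical snippets standing in for transcribed records.
records = [
    "Sarah Miller, widow, conveys forty acres to her son.",
    "Sarah Jones sold the lot adjoining the Miller farm.",
    "John Miller appeared in court.",
]

loose = [r for r in records if matches_any_term(r, "Sarah Miller")]
exact = [r for r in records if matches_exact_phrase(r, "Sarah Miller")]

print(len(loose), "loose matches;", len(exact), "exact match")
```

The loose search returns all three snippets because each contains a "Sarah" or a "Miller." The exact phrase search keeps only the first.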

But historical records are rarely that clean. In a probate record, your ancestor might be listed as "Sarah, the daughter of John Miller." This is where I find the Proximity operator to be a helpful tool for the modern genealogist. By searching for "Sarah Miller"~10, you are telling the FamilySearch engine that these two words must appear within ten words of each other. But my results from using all these Boolean tools are mixed. Because the results are inconsistent, I am not sure that FamilySearch's full-text search understands them.

These tips for finding names are especially useful in legal documents, where titles or middle names often separate a first and last name. If they were applied consistently, they could bridge the gap between the rigid search and the messy reality of 19th-century legal phrasing.
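Here is a rough Python sketch of what a proximity test means. It is a simplification under my own assumptions: it just counts the word positions between the two names, and it makes no claim about how FamilySearch (or any particular search engine) actually counts distance. The function name and sample texts are hypothetical.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def within_proximity(text, first, second, max_distance):
    """True if some occurrence of `first` lies within `max_distance`
    word positions of some occurrence of `second`, in either order."""
    words = tokenize(text)
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        abs(a - b) <= max_distance
        for a in first_positions
        for b in second_positions
    )

# Hypothetical probate wording: the names are separated by a few words.
probate = "I give to Sarah, the daughter of John Miller, one feather bed."

# Hypothetical page where the two names are unrelated and far apart.
unrelated = (
    "Sarah Jones witnessed the will. "
    + "The estate was appraised in due course. " * 3
    + "Signed before Justice Miller."
)

print(within_proximity(probate, "Sarah", "Miller", 10))
print(within_proximity(unrelated, "Sarah", "Miller", 10))
```

The probate wording passes because "Sarah" and "Miller" are five words apart, while the second text fails because the names sit nearly thirty words apart. An exact phrase search would have missed the probate entry altogether.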

We also have to contend with the "wildcard" nature of history. Spelling was often more of an art than a science in the past, and even the best AI transcription can misinterpret a letter. Using a Single Character Wildcard, such as Sm?th, allows you to find both Smith and Smyth. In the Full-text search environment, where an "o" might look like an "a" to a computer, these wildcards are essential for ensuring that no record is left behind simply because of a digital misread. Using this tool may also prove frustrating when the returns start to number in the millions.
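A single-character wildcard can be pictured as a pattern in which the question mark stands for exactly one letter. The Python sketch below translates such a pattern into a regular expression and tests it against a made-up list of spellings. The helper name and the list are my own assumptions, and this is only an illustration of the idea, not the FamilySearch implementation.

```python
import re

def wildcard_to_regex(pattern):
    """Turn a pattern like 'Sm?th' into a regex where each '?' matches
    exactly one word character and everything else is literal."""
    escaped = re.escape(pattern)           # 'Sm?th' becomes 'Sm\?th'
    return re.compile(escaped.replace(r"\?", r"\w"), re.IGNORECASE)

# Hypothetical spellings, including a possible transcription misread.
names = ["Smith", "Smyth", "Smath", "Smooth", "Smithe", "Smth"]

pattern = wildcard_to_regex("Sm?th")
matched = [name for name in names if pattern.fullmatch(name)]

print(matched)
```

Smith, Smyth and the misread "Smath" all match. Smooth, Smithe and Smth do not, because the question mark stands for one character, no more and no less.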

Finally, we must learn to curate our results by using Exclusion. If you are searching for a surname that is also a common place name or term, your search can quickly become cluttered. For example, if you are looking for a family named "Rice" in a county known for its agriculture, searching for Rice -paddy or Rice -planting allows you to strip away the irrelevant data and focus on the human beings. 
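Exclusion is the simplest of these operators to picture: keep a record if it contains the word you want and none of the words you have ruled out. A minimal Python sketch, using made-up snippets and a hypothetical function name of my own, might look like this. Again, it illustrates the logic only and is not a description of the FamilySearch engine.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def matches_with_exclusion(text, required, excluded):
    """True if `required` appears in the text and no `excluded` word does."""
    words = set(tokenize(text))
    if required.lower() not in words:
        return False
    return not any(term.lower() in words for term in excluded)

# Hypothetical snippets from a county where rice was a crop.
records = [
    "Thomas Rice and his wife Mary sold the tract.",
    "The rice paddy along the river flooded.",
    "Wages owed for rice planting were paid.",
]

kept = [
    r for r in records
    if matches_with_exclusion(r, "Rice", ["paddy", "planting"])
]

print(kept)
```

Only the record about Thomas Rice survives. The catch is that exclusion is blunt: a deed in which the Rice family sells a paddy would be thrown out along with the farm reports.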

The examples I am giving here illustrate that any AI implementation is only as useful as the ability of the user to maintain control. Meanwhile, given the state of affairs with the collections on the FamilySearch website, relying on Full-text Search for all of your research is lamentably impossible.
