Some people eat, sleep and chew gum, I do genealogy and write...

Monday, April 13, 2026

The Main Challenges of FamilySearch Full-text Search, Part Two

 
https://www.familysearch.org/en/search/full-text/

As I continue with this series of posts, I am going to focus on FamilySearch's implementation of full-text searching. FamilySearch is free, and all you need to do to start searching is register on the website. The search fields are surprisingly sparse, but they turn out to be adequate for most searches. The first challenge is to understand what information you need to put in the different search fields. With blank search fields, the invitation seems to be to fill in the information you're looking for in each of the fields. But the rule here is that less is more. 

In part one of this series, I gave an illustration of the variety of forms of the name John in English and other languages. You need to begin thinking about how the information you're seeking may have been recorded in a variety of different documents. If you enter a very common search term such as the name "John", you may get millions of responses. FamilySearch's full-text search fields are hierarchical. The keywords can be just about anything, but experience indicates that they work best when they identify the types of documents you're searching for. For example, you might enter the keyword "deed". This tells the full-text search to look for documents in that category. As you add information to the other search fields, you always want to keep in mind how the entries might be represented in the documents you're searching. 

Here is an example of a basic search. 

The quotation marks tell the search to look for the entire name rather than individually searching for all the Henrys, all the M's, and all the Tanners. You also need to remember that you are searching only the documents that have been processed by FamilySearch and made available to the full-text search. Over time, of course, the number of documents processed is growing by many millions per day, and so full-text searches will become more and more valuable. In this case, the name "Henry M. Tanner" is not common, and therefore the number of responses turned out to be very small. 
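To make the difference concrete, here is a minimal Python sketch of a quoted search versus an unquoted one. It is only a toy model and not FamilySearch's actual implementation: the function names and the two sample documents are invented for illustration, and I am assuming that an unquoted search looks for each word separately while a quoted search requires the words together and in order.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def matches_all_words(query, document):
    """Unquoted search (assumed behavior): each word appears somewhere, in any order."""
    doc_words = set(tokenize(document))
    return all(word in doc_words for word in tokenize(query))

def matches_phrase(query, document):
    """Quoted search (assumed behavior): the words appear together and in order."""
    q = tokenize(query)
    d = tokenize(document)
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))

# Invented sample documents, for illustration only.
documents = [
    "Deed from Henry M. Tanner for a lot in Arizona",
    "Henry Smith and M. Jones sold land to Tanner",
]

for doc in documents:
    print(matches_all_words("Henry M. Tanner", doc),
          matches_phrase("Henry M. Tanner", doc))
```

Both sample documents contain all three words, but only the first one contains them as a single name, which is why the quoted form returns far fewer responses.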


Despite this forced focus, the responses do include people who are not my particular Henry M. Tanner. As a next step, you could add a physical location for the records by editing the search fields. You can edit the search directly from the responses. There is a very difficult-to-see link called "Edit Search". When you edit the search, you get a surprise. The results you found previously may or may not be found in a subsequent search. By adding a place name, you have eliminated any documents that contain the person's name but not the place name. 
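The narrowing effect of an added place name can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not a description of how FamilySearch works internally: the `search` function and the three sample documents are hypothetical, and matching is reduced to a plain case-insensitive substring test.

```python
def search(documents, required_terms):
    """Return the documents that contain every required term (case-insensitive)."""
    return [
        doc for doc in documents
        if all(term.lower() in doc.lower() for term in required_terms)
    ]

# Invented sample documents, for illustration only.
documents = [
    "Henry M. Tanner, deed recorded in Arizona",
    "Henry M. Tanner appears as a witness; no location given",
    "Tax roll for Arizona Territory",
]

name_only = search(documents, ["Henry M. Tanner"])
name_and_place = search(documents, ["Henry M. Tanner", "Arizona"])

print(len(name_only), len(name_and_place))  # prints: 2 1
```

The second document mentions the right name, but it drops out as soon as the place is required. That is the trade-off with every added field: fewer irrelevant responses, but also the risk of losing a relevant record that does not happen to mention the place.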


 Here are the results.


Even if you go back to the original search to try to use the list that showed up initially, you may be surprised to find that another search turns up a different set of documents. You should go through each set of documents that is produced, unless, of course, you get millions of documents. Make sure to save the documents that you intend to use. Click on the image for the documents that you want to preserve, and you will see the transcription of the document in the language of the document. If the document is in Spanish, the transcription will be in Spanish. You can then use the icon tools to attach the document to a person in the FamilySearch family tree, edit the document, or download the document. There are also some tools for adjusting the image. 

Bear in mind that unless you use the quotation marks, you will be looking for every one of the words in the name or other information you enter. Place names are standardized, and so you may have to make some alterations if the place you are looking for does not happen to be already standardized by FamilySearch. 

Just because you happen to find some documents does not mean that the full-text search has found all the documents in all the collections on the FamilySearch website. For example, if I change the place name to simply Arizona, then I get an entirely different set of documents. 



In another example, if I use the name Tanner rather than the full name of my great-grandfather, then I get a different set of documents.


In some cases, the number of variations you would need to try can be overwhelming. 

 Stay tuned. There's probably going to be a part three to this series.

Thursday, April 9, 2026

Falling for the Old Technology News Issue, a commentary

 

This image represents the challenge of relying on outdated technology news in a rapidly evolving world. While the subject focuses on obsolete information like 56k modems and floppy disks, the environment outside is already moving forward with 5G networks, AI, and advanced electric vehicles. It highlights the potential for confusion and misinformation when using historical "innovations" to interpret the current tech landscape.

The current obsession with AI hallucinations exemplifies how obsolete perspectives can overshadow innovation. Many treat the risk of fabrications as a dispositive reason to reject AI, failing to recognize that while hallucinations remain a factor, they should not preclude a serious investigation into the technology's broader utility.

One simple response to this concern is asking if the person making the comment is familiar with the workflow model developed by Google using Gemini, Gems, and NotebookLM. In addition, there are other workflow models that use redundancy to diminish the possibility of fabrication or hallucination. The chatbot is simply asked to criticize the accuracy of its statements. Another alternative is to instruct the chatbot to stop when it cannot substantiate the accuracy of its statements. There has also been a significant effort on the part of the AI developers to curtail hallucinations. 

Because of my age, I also encounter people who fear technology in any form. Most of their fear stems from the constant reports in the media about how AI is going to destroy the world in some form or another. 

It took more than 30 years for cars to replace horses. Generative AI (GenAI) is being adopted faster than any previous technology, including the internet and personal computers, with nearly 40% of U.S. adults aged 18–64 using it by August 2024. ChatGPT reached 100 million users in just 0.2 years (around two months), whereas the internet took years to reach similar milestones. See "The Rapid Adoption of Generative AI."

Using Google's NotebookLM, I can search 600 pages of genealogically valuable information in a matter of minutes, and most of that time is spent reading and analyzing the responses. FamilySearch.org's Full Text Search can search millions of pages of documents in a few seconds and find an individual name or other similar information. This frees me from sitting in front of a microfilm reader for 8 hours to find one name in one document. 

I guess, as a genealogist, that I am caught between dealing with people still mired in a paper-based technology while working hard to keep up with the advancements in AI that change our entire way of doing research. 

The Main Challenges of Full-text Search, Part One

 

Three of the major online family tree/database websites have implemented AI-based full-text search and, to some degree, handwriting recognition in the last three or so years. FamilySearch.org's offering is called "Full Text Search" and includes handwriting recognition. The Full Text Search is available for free to all users. MyHeritage.com introduced a similar program called Scribe AI. Ancestry.com's contribution is confined to OCR and lacks handwriting recognition. All the efforts of the genealogy programs lag behind the capabilities of Google Gemini with NotebookLM and some of the other AI websites. Of course, none of the genealogy programs has the resources of Google, OpenAI, and the others. 

As far as the genealogical community is concerned, handwriting recognition, document translation, and full-text search are revolutionary in changing the way we do research. I can put hundreds of documents into NotebookLM and then have a discussion with Gemini about the contents of all the documents at once, and the conversation quickly evolves into an examination of what records need to be found to resolve serious historical issues. But working with full-text search opens a whole new series of challenges. 

The first and major challenge is the issue of search terms. In the old way of searching using Google, for example, you formulated search terms based on categories. If I wanted to find some probate records, I searched for probate records in the state or country where my ancestors lived. I didn't expect Google to find the actual records of my ancestors. It was left to me to search through the records individually for the people I was interested in finding. Of course, I could go back to the time when I was forced to use microfilm and read through the entire collection of records to find one name. The time and effort spent in searching record collections severely limited the amount of information I was able to gather in any reasonable time period. In addition, of course, I was faced with the issue of deciphering all of the handwritten records. The result of these conditions was that very little research was produced in any given amount of time. 

 Full-text search opens up the ability to review hundreds, perhaps thousands of records in a matter of minutes, but the catch is the fact that you need to guess the search terms to use to open up the information. Let's say your ancestor's name was John. Here is a list, generated by Google Gemini, of the variations of the name John. 

The name John is one of the most enduring and prolific names in history. It originates from the Hebrew name Yohanan (יוֹחָנָן), meaning "Yahweh is gracious."

Because of its biblical significance—associated primarily with John the Baptist and John the Apostle—the name spread through the Byzantine Empire, into the Latin-speaking West, and eventually across the globe through various linguistic shifts.


Linguistic Evolution

The path of the name from its ancient roots to modern English follows a clear phonological chain:

  1. Hebrew: Yohanan

  2. Greek: Ioannes (Ἰωάννης)

  3. Latin: Iohannes / Johannes

  4. Old French: Jehan

  5. Middle English: Jan / John


Variations by Region and Language

Western & Northern Europe

These variations reflect the Germanic and Romance linguistic families where the name has remained a top choice for centuries.

  • English: John, Jack, Johnny, Jon

  • French: Jean, Yannick (Breton)

  • Spanish: Juan, Juanito

  • Portuguese: João

  • Italian: Giovanni, Gianni, Nino, Vanni

  • German: Johannes, Hans, Jan, Johann

  • Dutch: Jan, Johan, Hanne, Hans

  • Scandinavian (Danish/Swedish/Norwegian): Johan, Jan, Jens, Hans

The British Isles (Celtic Variations)

The Celtic languages developed distinct versions of the name that have since become globally popular in their own right.

  • Irish: Seán, Shane, Shaun, Shawn

  • Scottish Gaelic: Iain, Ian

  • Welsh: Evan, Ioan, Ieuan, Siôn

  • Manx: Juan

Eastern Europe & Eurasia

In Slavic and Orthodox traditions, the name often maintains the "I" or "Y" sound from the original Greek Ioannes.

  • Russian: Ivan, Vanya

  • Polish: Jan, Janusz

  • Czech/Slovak: Jan, Ján, Janko

  • Hungarian: János, Jancsi

  • Romanian: Ion, Ioan, Ionuț, Nelu

  • Bulgarian/Serbian: Ivan, Jovan

  • Greek: Ioannis, Giannis, Yannis

Middle East & Africa

These versions often stem directly from the Hebrew original or the Islamic tradition.

  • Arabic: Yahya (يحيى), Yuhanna (يوحنا)

  • Hebrew: Yohanan (modern: Yochanan)

  • Amharic (Ethiopia): Yohannes

  • Turkish: Yahya

Asia & Pacific

In these regions, the name is often adopted through religious conversion or phonological adaptation of Western names.

  • Chinese: Yuēhàn (約翰)

  • Japanese: Yohane (ヨハネ - Biblical), Jon (ジョン)

  • Korean: Yohan (요한)

  • Hawaiian: Keoni


Diminutives and Medieval Short Forms

Historically, many surnames were created from pet names or shortened versions of John.

  • Hank: Derived from the Dutch Hanne.

  • Jan: Common in Northern Europe; used as a root for many surnames.

  • Jenkin: A medieval English diminutive ("Little John").

  • Hick/Hitch: Obsolete medieval English rhyming nicknames for John.


Summary Table of Major Forms

Language    Primary Form    Common Diminutive
English     John            Jack
Spanish     Juan            Juanito
Russian     Ivan            Vanya
German      Johannes        Hans
Italian     Giovanni        Gianni
Irish       Seán            Shane
Scottish    Ian             Iain
Finnish     Jukka           Jani

Which one of all of these terms was the one used by your ancestor named John? Did he use the name John at all, or did he use some other name, such as Bubba or Kid or J.T.? So when you are faced with a search field such as this one from FamilySearch, what are you going to use for the search terms?


If you assume that the person's name was John, what are your chances of finding him if he went by one of the other names? For example, my great-grandfather's official name was Henry Martin Tanner, but when he signed legal documents, such as deeds, he always used Henry M. Tanner. Full-text searches are rather literal, and if I search for Henry Tanner, I may not find Henry M. Tanner. I can use all sorts of Boolean operators, but I will still face the same problem of determining the search terms I need to find any specific piece of information. Another example: one of my relatives is named Joseph Christiansen. His grave marker says Joe Christiansen. He apparently did not like to be called Joseph. How am I supposed to know this?
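One way to keep track of the variations is to write them out systematically before you start searching. The following is a minimal Python sketch, not anything provided by FamilySearch. The function names are hypothetical, the list of patterns is only a starting point, and the nickname table has to be filled in by hand from what you actually find in the records, because no program can know that Joseph went by Joe.

```python
def name_variants(given, middle, surname):
    """Build plausible written forms of a name from its three parts."""
    gi, mi = given[0], middle[0]
    candidates = [
        f"{given} {middle} {surname}",  # Henry Martin Tanner
        f"{given} {mi}. {surname}",     # Henry M. Tanner
        f"{given} {mi} {surname}",      # Henry M Tanner
        f"{given} {surname}",           # Henry Tanner
        f"{gi}. {mi}. {surname}",       # H. M. Tanner
        f"{gi}. {surname}",             # H. Tanner
    ]
    # dict.fromkeys removes duplicates while keeping the original order.
    return list(dict.fromkeys(candidates))

def nickname_variants(given, surname, nicknames):
    """Add forms based on nicknames the researcher has already discovered."""
    return [f"{form} {surname}" for form in [given] + nicknames.get(given, [])]

def as_quoted_searches(names):
    """Wrap each name in quotation marks so it is searched as a whole."""
    return [f'"{name}"' for name in names]

# Hypothetical nickname table, built by hand from documents already found.
known_nicknames = {"Joseph": ["Joe"]}

for term in as_quoted_searches(name_variants("Henry", "Martin", "Tanner")):
    print(term)
for term in as_quoted_searches(
        nickname_variants("Joseph", "Christiansen", known_nicknames)):
    print(term)
```

Each printed line is one search to try. The list will never be complete, but working from it makes it less likely that an obvious form gets skipped.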

The people programming full-text search could add the variations for all the names and all the places and practically everything else into their program. They might even implement AI to recognize all of the variations. What happens in that circumstance is that the number of documents discovered can run into the millions.

Do I have a solution for this? No, but I have a methodology I use to attempt to narrow down the number of possible variations. This primarily includes carefully reviewing the documents that I do find to discover the possible limited variations of the name used by the person I am searching for. This whole process also applies to place names and, to some extent, to dates, particularly when you think about calendar changes.
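As one illustration of the date problem, a date written under the Julian calendar has a different Gregorian equivalent, so the same event can appear under two dates. The sketch below converts a Julian calendar date to its Gregorian equivalent using the standard Julian Day Number arithmetic. The function names are my own, and the sketch assumes the year is counted from January 1. It does not tell you which calendar a particular record used, since different countries changed calendars at different times.

```python
def julian_to_jdn(year, month, day):
    """Convert a Julian calendar date to a Julian Day Number."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083

def jdn_to_gregorian(jdn):
    """Convert a Julian Day Number to a Gregorian (year, month, day)."""
    a = jdn + 32044
    b = (4 * a + 3) // 146097
    c = a - (146097 * b) // 4
    d = (4 * c + 3) // 1461
    e = c - (1461 * d) // 4
    m = (5 * e + 2) // 153
    day = e - (153 * m + 2) // 5 + 1
    month = m + 3 - 12 * (m // 10)
    year = 100 * b + d - 4800 + m // 10
    return year, month, day

def julian_to_gregorian(year, month, day):
    """Give the Gregorian equivalent of a Julian calendar date."""
    return jdn_to_gregorian(julian_to_jdn(year, month, day))

# The last Julian date in Britain and its colonies was 2 September 1752;
# the next day was 14 September 1752 on the Gregorian calendar.
print(julian_to_gregorian(1752, 9, 2))  # (1752, 9, 13)
```

When a date search comes up empty for records from around a calendar change, it may be worth trying both forms of the date.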

This is part one of this particular series of articles, and hopefully you will stay with me and read the rest of the series as it comes out over the next few weeks.