Some people eat, sleep and chew gum, I do genealogy and write...

Thursday, April 30, 2026

How to Evaluate an AI Website for Doing Genealogical Research


The proliferation of AI platforms since the launch of ChatGPT in late 2022 has triggered a surge in promotional content and comparative reviews across YouTube and various blogs. Most of these evaluations conclude with a subjective recommendation based on the author’s personal preference. While I hold my own biases, this article provides a standardized framework of criteria to help you determine if a reviewer’s claims are substantiated or merely anecdotal.

A methodology for judging whether a review is accurate, and whether it applies to genealogical research, can be gleaned from articles and educational opportunities such as the following:

“Advanced AI Techniques for Genealogists: Expanding Your Research Skills.” GRIP Genealogy Institute, January 8, 2024. https://grip.ngsgenealogy.org/courses/advanced-ai-techniques-for-genealogists-expanding-your-research-skills/.

Apple Podcasts. “RootsTech Class 2026 Class Takeaways.” Accessed April 30, 2026. https://podcasts.apple.com/us/podcast/rootstech-class-2026-class-takeaways/id1419782085?i=1000759494832.

BYU Library Family History Center. Developing an Ethical and Safe Use of AI for Genealogy -James Tanner (22 Feb 2026). 2026. 55:14. https://www.youtube.com/watch?v=R0bfWAYx-OE.

Coalition for Responsible AI in Genealogy. n.d. Accessed April 30, 2026. https://craigen.org/.

“Ethics and Best Practice of AI Use in Genealogy Research - NZ Society of Genealogists.” Accessed April 30, 2026. https://genealogy.org.nz/Ethics--Best-Practice-of-AI/11482/.

Ferris, Maureen Martin. “AI in Genealogy | Maureen Martin Ferris.” Accessed April 30, 2026. https://www.maureenmartinferris.com.au/ai.html.

“How to Use AI Tools for Family History Research | The Gazette.” Accessed April 30, 2026. https://www.thegazette.co.uk/all-notices/content/104452.

Navigating the AI Frontier: Why Your Genealogy Society Needs a Policy (and How to Write One!) - GenSocSoup. Genealogy Society Management. January 28, 2026. https://gensocsoup.com/navigating-the-ai-frontier/.

I used Google Gemini to evaluate and review these and other sources. 

Evaluating the utility of general-purpose AI (like Gemini or ChatGPT) versus specialized genealogy tools (such as the online family tree websites) requires a shift from "search-based" thinking to "analysis-based" thinking. In 2026, as these models move toward sophisticated reasoning rather than simple text prediction, the criteria for judging them can be divided into technical capability (value) and workflow efficiency (usability).

For any genealogist, the Genealogical Proof Standard (GPS) is the leading standard. When evaluating an AI, you must determine how well it supports this framework.

  • Source Grounding: Does the AI provide specific citations to the uploaded documents? A valuable AI doesn't just say "John Smith was born in 1840"; it points to the specific line in the PDF or census record.

  • Hallucination Rate: Does the model "invent" ancestors to fill gaps in a pedigree? Testing a model with a known, well-documented family line is essential to benchmark its tendency to hallucinate.

  • Logical Reasoning: Can the AI resolve conflicting evidence? For example, if one record says a birth was in 1842 and another says 1845, a high-value model should be able to weigh the reliability of the sources (e.g., a birth certificate vs. a 1910 census) rather than just picking one.
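The hallucination test described in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not a standard benchmark: the names are made up, the function names are hypothetical, and it assumes you have already copied the people named in the AI's answer into a simple list by hand.

```python
def normalize(name):
    """Lowercase and collapse whitespace so trivial differences are not counted as new people."""
    return " ".join(name.lower().split())


def hallucination_rate(documented_names, ai_names):
    """Return (rate, invented): the share of AI-asserted names absent from the documented line."""
    documented = {normalize(n) for n in documented_names}
    invented = [n for n in ai_names if normalize(n) not in documented]
    rate = len(invented) / len(ai_names) if ai_names else 0.0
    return rate, invented


# Hypothetical example data: a well-documented line and what an AI reported back.
documented_line = ["John Smith", "Mary Jones", "Thomas Smith"]
ai_reported = ["John Smith", "mary  jones", "Thomas Smith", "Ezekiel Smith"]

rate, invented = hallucination_rate(documented_line, ai_reported)
print(f"Unsupported names: {invented} ({rate:.0%} of the AI's answer)")
```

Exact name matching is crude. A flagged name may be a spelling variant or a real person you have not yet documented, so treat the result as a list of claims to check against sources, not as proof of invention.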

The efficacy of AI interaction is dictated by the quality of the prompt. Consequently, assessing the validity of any AI evaluation requires a rigorous review of the underlying prompts used for comparison. Because an AI’s output is a direct reflection of the parameters formalized in the prompt, any conclusion regarding a model's utility is inherently tied to whether the prompt inadvertently biased the results or predetermined the comparison's outcome.

Since genealogical research is heavily document-dependent, the AI’s ability to "see" and "interpret" is a primary value criterion. These interpretations depend heavily on the AI's ability in OCR or handwriting recognition, data structuring, and its context window (the actual usable tokens). It is always important to check the latest comparison charts. See https://exploreaitogether.com/llm-usage-limits-comparison/
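To get a rough feel for whether a transcription will fit in a model's context window, a back-of-the-envelope estimate like the following may help. It is a sketch under stated assumptions: the figure of about four characters per token is only a common rule of thumb for English text, the window sizes are hypothetical and not those of any particular model, and the function names are mine. Real token counts depend on each model's own tokenizer.

```python
import math


def estimate_tokens(text, chars_per_token=4.0):
    """Rough token estimate; chars_per_token is an assumed rule of thumb, not a measured value."""
    return math.ceil(len(text) / chars_per_token)


def fits_in_context(text, context_window_tokens, reserved_for_reply=1000, chars_per_token=4.0):
    """Check whether the text plus room reserved for the AI's reply fits in the window."""
    needed = estimate_tokens(text, chars_per_token) + reserved_for_reply
    return needed <= context_window_tokens


# Hypothetical example: a 50,000-character transcription of a probate file.
transcript = "word " * 10_000

print(estimate_tokens(transcript))          # estimated tokens for the transcript
print(fits_in_context(transcript, 8_000))   # against a hypothetical 8,000-token window
print(fits_in_context(transcript, 32_000))  # against a hypothetical 32,000-token window
```

If the estimate is close to the limit, splitting the document into sections and asking about each one separately is usually safer than trusting that the whole file was read.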

Here's a summary chart made by Google Gemini.

Value vs. Usability Comparison Table

| Criterion | Value (Does it do the job?) | Usability (Is it easy to use?) |
| --- | --- | --- |
| Transcription | Can it read difficult 18th-century script? | Is the interface for correcting the text intuitive? |
| Translation | Is the translation idiomatic and historically accurate? | Does it preserve the original document's formatting? |
| Synthesis | Can it spot a migration pattern across 10 documents? | Can it output that pattern as a map or a timeline? |
| Analysis | Can it identify "same-name" individuals as different people? | Does it explain why it made that distinction clearly? |

This will probably be an ongoing issue.

