Some people eat, sleep and chew gum, I do genealogy and write...

Wednesday, December 27, 2023

Who owns government records? Work Product, Ancestry, Reclaim the Records, Freedom of Information Acts, Copyright, and lots of other issues.


I recently ran across three news articles focusing on the efforts of ReclaimTheRecords.org to obtain copies of public records in Maryland and Pennsylvania. The issues discussed in the articles can be summarized with two questions:

Can a state acting through any state agent or agency contract to sell exclusive rights of access to what would otherwise be public documents and records?

When a third-party contractor pays for or is otherwise given permission to digitize public records, does the third-party contractor accrue any proprietary right due to any theory of its claim to ownership through work product?

Here are links to the three articles. 

Reclaim The Records. “The Maryland Motherlode: Births, Marriages, Deaths, and Naturalizations.” Accessed December 25, 2023. https://www.reclaimtherecords.org/records-request/31/.

Moyer, Justin Wm. “How Genealogists Got Millions of Md. Records Online for All to See.” Washington Post, December 24, 2023. https://www.washingtonpost.com/dc-md-va/2023/12/25/maryland-genealogical-records/.

PennLive.com, Spotlight PA | For. “Inside the Pa. Court Case Pitting a Genealogist against Ancestry.Com.” pennlive, December 25, 2023. https://www.pennlive.com/news/2023/12/inside-the-pa-court-case-pitting-a-genealogist-against-ancestrycom.html

These and many other issues are raised in the context of the efforts of Ancestry.com and other companies to assert ownership over public domain works and public documents and records. A particularly egregious example is when a publisher or an online graphics website "republishes" works that are clearly in the public domain and then pretends to hold a copyright in order to charge for copies. Another example, closer to genealogists' interests, is when a large genealogy company "buys" the right to digitized records and then charges a fee to view those same records by asserting a copyright interest, a "work product" interest, or both. 

Genealogists benefit from more widely available digital copies, but the conflict comes when the same records supplied to the large genealogy company are otherwise public records that should be freely available to the public under almost all state Freedom of Information Acts. All 50 states and the District of Columbia have freedom of information laws. These laws are also known as Sunshine Laws, Public Records Laws, and Open Records Laws, but they are meaningless if the state sells the right to control access to the covered records to a third-party company that then claims ownership of the documents. 

Billions of records are presently in this category. One illustrative example of the manufactured complexity of this issue can be viewed in Ancestry.com's 7,166-word Terms and Conditions. See https://www.ancestry.com/c/legal/termsandconditions#:~:text=You%20agree%20that%20you%20will,your%20use%20of%20the%20Services for the full text. You might be surprised and perhaps concerned about your own personal liability for using the Ancestry.com website. I should also add that every other large online genealogy website has similar terms and conditions. You might also want to look at section 3.2 of the Terms and Conditions about Ancestry.com's use of the information you supply to the website.

Paragraph 2.1 of the Ancestry.com Terms and Conditions, entitled Intellectual Property Rights to Ancestry Content, simply restates basic copyright law. It is also interesting that despite the fact that the vast majority of the actual records hosted by Ancestry.com are not subject to copyright claims, a copyright notice is placed on each collection of records including U.S. Census Records. 

There is no mention of any claim to "work product" in any part of the Ancestry.com website. This is not surprising since the only legal meaning of the term "work product" is as follows:

The legal term “work product” refers to materials such as writings, notes, memoranda, reports on conversations with the client or witness, research, and confidential materials that an attorney has developed in anticipation of litigation or for trial.

Work product is generally privileged, meaning it is exempt from discovery. However, there are exceptions. Work product is divided into two categories: ordinary and opinion.

Ordinary work product is the result of gathering basic facts or conducting interviews with witnesses, and is discoverable if there is a showing of substantial need, like a witness that becomes unavailable.

Opinion work product is the record of an attorney’s mental impressions, ideas or strategies, and is almost never subject to discovery.

For reference see https://www.law.cornell.edu/wex/work_product

Despite the legal issues involved, genealogists benefit from online access to records from around the world. But it is sad that both governments and some large online genealogy databases claim ownership of otherwise public domain or public records merely because they have paid to digitize the records under a contract. 

More about this later.


Tuesday, December 26, 2023

MyHeritage Releases AI Record Finder™ and AI Biographer™ — Two Groundbreaking Features That Transform Genealogy Using Artificial Intelligence

 

If you have been following the online news about artificial intelligence or AI, you will already know that AI applications are expanding at an astronomical rate. Some of the AI features that have been implemented over the past few years by MyHeritage.com include Record Matches, Smart Matching™, DNA tools and a bundle of photo enhancement programs. But now, there is a giant leap to even more sophisticated chatbot features for MyHeritage.com. I will be presenting three live classes at RootsTech 2024 on "Using Artificial Intelligence Tools to Expand Your Genealogical Research Universe." As you can see from this announcement, by February 29th, 2024, I will probably have a lot more to talk about than I had planned when I began. See RootsTech.org. The schedule of the classes will be posted in the next few weeks. 

I fortunately had a sneak preview of the features and now they have been announced. Here are some of the features as set out in an email to me for release to the public.

TEL AVIV, Israel & LEHI, Utah, December 27, 2023 — MyHeritage, the leading global family history service, announced today the release of two groundbreaking features that mark the next frontier in family history research: AI Record Finder™ and AI Biographer™. AI Record Finder™ revolutionizes genealogy like ChatGPT revolutionized searching the internet: it is an interactive, intelligent, free-text chat to help the user locate relevant historical records about a person of interest in MyHeritage’s vast database of 20 billion records. AI Biographer™ automatically compiles a rich narrative about an individual’s life using information from historical records that match the person, creating a Wikipedia-like biography about anyone. Narratives are enriched with relevant historical context using AI and are easy to share. MyHeritage is the only service to offer such groundbreaking features for family history, and the first to leverage conversational AI for searching historical records. The two features are integrated, allowing users to generate an AI Biography™ for individuals they find using AI Record Finder™. AI Biographies™ may also be generated directly for individuals in family trees on MyHeritage.

It has not been hard to predict that chatbot technology would be used for a data-intensive pursuit such as genealogy. Once the landslide of chatbots became available in the last year, the only question was when it would happen. It is also predictable that MyHeritage.com, the technological leader among the large online genealogy websites, would be the first to implement chatbots. 

Here are some more detailed descriptions of the two new products from my email notification. 

AI Record Finder™

Until now, searching for historical records on online genealogy platforms like MyHeritage has been very similar to using a regular internet search engine. One entered names and other terms into dedicated fields in a search form, and the search engine returned a large number of search results. Then, it was necessary to comb through the results to discover relevant information. AI Record Finder™ transforms this experience by enabling users to converse with an AI assistant in a chat to quickly find records about their ancestors, relatives, or other deceased individuals. Users can still use the traditional search engine on MyHeritage, but AI Record Finder™ adds an additional chat mode that increases the chances that users may be able to find elusive records they have never found before, thanks to the power of AI.

The chat is like an interview with a friendly concierge that the user can converse with in one of two modes: casual or formal. AI Record Finder™ processes the information the user enters, and understands what additional details are necessary to help narrow down the search results. It guides the user by asking the relevant questions according to the context and information provided by the user, to find the most relevant records about the person the user is searching for. Once located, the records can be reviewed and the details saved to the user’s family tree. AI Record Finder™ includes a seamless user interface, where historical records that are found appear directly within the chat.

AI Biographer™

AI Biographer™ creates a rich Wikipedia-like biography summarizing a person’s life. This is especially useful for creating biographies about the billions of individuals who were not famous, and therefore do not appear in Wikipedia. An AI Biography™ can be created from historical records found via AI Record Finder™ and for deceased individuals within a user’s family tree on MyHeritage. AI Biographer™ utilizes MyHeritage’s acclaimed matching technologies to curate historical records and family tree profiles that pertain to the selected individual. All information from the pertinent records is then compiled into a biography that is enriched with photos and scanned documents, and in some cases, additional information from the web. The resulting biography includes the person’s immediate family, describes the main events of their life, and includes rich historical context and the origins of their surname. Each biography is a unique narrative that can be shared with family and friends, and saved for posterity. Facts listed in AI Biographies™ include footnotes and source citations, and link to the records from which they were obtained. Any inconsistencies within the information listed are noted. AI Biographies™ are saved as PDF files that are emailed to the user.

When created from the user’s family tree, an AI Biography™ is added to the family tree as a media item and tagged with the individual’s name, so that it is accessible through the MyHeritage mobile app and Family Tree Builder desktop software. The biography is included whenever the family tree is exported in GEDCOM format, ensuring that the enriched biographical information remains an integral part of the family tree. Biographies can easily be regenerated whenever new information becomes available. Additional entry points for generating an AI Biography™ such as from MyHeritage’s traditional form-based search engine, and from family tree profile pages, will be added soon.

“We’re constantly pushing the boundaries of genealogy to reinvent the way people can discover their family history as we implement a bold vision for genealogy in the 21st century” said Gilad Japhet, Founder and CEO of MyHeritage. “AI Record Finder™ is a disruptive feature that simplifies the way people can find information about their ancestors by making the search easier and more intuitive. AI Biographer™ curates the details about a person’s life into a compelling story. Not all our ancestors were famous, but they all deserve to be remembered! Together, these cutting-edge features strengthen MyHeritage’s position as the industry leader for innovative genealogy and continue our mission to make family history easier, more accessible, and more fun for everyone.”

AI Record Finder™ and AI Biographer™ both use automated third-party technology powered by OpenAI.

Availability, Cost, and Language Support

AI Record Finder™ and AI Biographer™ are currently accessible from desktop and mobile web browsers. Support for both features on the MyHeritage mobile app will be added soon.

AI Record Finder™ is free for limited use. To submit an unlimited number of chat messages, and to view and save historical records to the family tree, a Data or Complete subscription is required. Users can create a few AI Biographies™ for free. Beyond that, additional use of AI Biographer™ requires a Complete subscription.

AI Record Finder™ and AI Biographer™ are initially available in English and will support additional languages in the near future. It is possible to converse with AI Record Finder™ in multiple languages, but at launch, it responds in English only. 

Saturday, December 16, 2023

RootsTech is Coming! You will want to come to the live conference

 

https://www.familysearch.org/en/rootstech/

We are quickly coming to the end of 2023, and RootsTech, February 29th through March 2nd, is right around the corner. If you have never been to a genealogy conference, RootsTech 2024 will be your chance to come to the largest and most memorable genealogy conference of all time. 

RootsTech 2024 is the premier event to celebrate your heritage and other meaningful connections through a deeper understanding of family history and genealogy. Here are some reasons why you should attend:

  • Exclusive Sessions: Over 250 exclusive sessions are only available in Salt Lake City.
  • Expo Hall: More than 120 exhibitors/sponsors will be present in the Expo Hall.
  • Industry Innovations: Be the first to learn about industry innovations.
  • Networking: Develop new friendships and reunite with old friends.
  • Personalized Help: Get personalized help at the FamilySearch library.
  • Keynote Speakers: Hear from various talents from industries around the world who share their own family experiences and inspiring messages of hope and resiliency.

RootsTech 2024 is gearing up for a special year with the theme “Remember”, highlighting the essence of RootsTech, which is honoring and cherishing our families and ourselves while creating new relationships that transcend time. So, come join us and discover your story at RootsTech 2024. Register today to save your spot.

Here is the link to register: https://www.familysearch.org/en/rootstech/


Why is the FamilySearch tree an unmoderated wiki and what happens because it is not moderated?

 

Imagine a major city with no traffic rules, no traffic control devices, and no policemen. This would probably seem ideal to an anarchist. So why would you think that a complex wiki program or app would not eventually end up chaotic as well? The FamilySearch.org Family Tree is a wiki-based program or app. 

A wiki is a form of online hypertext publication, collaboratively edited and managed by its own audience, using a web browser. It typically contains multiple pages for the subjects or scope of the project, and could be either open to the public or limited to use within an organization for maintaining its internal knowledge base. (quote from Bing Chat)

A moderated wiki is a type of wiki where changes and contributions are reviewed by designated moderators or administrators before they are published. This process helps to ensure that the content aligns with the wiki’s guidelines and standards. The purpose of content moderation is to remove or apply a warning label to problematic content or allow users to block and filter content themselves. Major platforms use a combination of algorithmic tools, user reporting, and human review. Is the FamilySearch Family Tree a "moderated wiki?" 

Here are the general guidelines for using the FamilySearch Family Tree:

  • Appropriate Content: Content should support appropriate standards of modesty and virtue.
  • Relevance: Content should support a family history purpose.
  • Heart-turning: Content should support individuals coming to know and love their ancestors.
  • Noncommercial: Content should not advertise or promote products.
  • Intellectual Property Rights: They should not infringe on intellectual property rights.
  • Accuracy: Photos, Documents, and Audio Recordings may not be edited in such a way as to make them inaccurate, false, or misleading.

The glaring failure of the list and therefore the Family Tree is the lack of any sort of external moderation. This lack allows millions of entries to be added with no review or moderation at all. The idea of using a wiki format for the Family Tree was sound and valuable. But allowing the Family Tree to be changed on the whim of a user has led to wholesale duplication, inaccuracies, and lack of reliability. There are significant numbers of potential users who refuse to use the Family Tree to store their own genealogical information or stop using the Family Tree because there are really no restrictions on the accuracy of the content. 

Two very damaging ways that wholesale duplicates and inaccurate information are being added to the Family Tree are projects that add millions of names without even minimal supervision for duplication or accuracy, and the ability of any and all users to upload unsupervised GEDCOM files. 

I am not going to take the time in this post to review all of the moderation suggestions that have been made over the years because to do so would essentially be a waste of time. 

From the 1960s onward, FamilySearch, or its predecessor the Genealogical Society of Utah, sponsored vast extraction programs in which records were added to the existing databases, such as the International Genealogical Index and the Ancestral File, with no limits on duplicate entries for an individual person. In my own ancestral lines, this allowed the same information about some of my ancestors to be added to the Family Tree hundreds of times. The present situation is no different, with some areas of the Family Tree, such as ancestors in New England, being changed and duplicated sometimes dozens of times a week or even many times every day. This rampant lack of moderation or control results in what I call "revolving door ancestors" and has further resulted in my abandoning any research or additions to any of my New England ancestral lines. Many of the bad entries and some of the corrections are being made by unresponsive and in many cases anonymous users. Those who do the research and try to get these people to add sources or even collaborate are frequently ignored. Some of these people are notorious for their disregard for propriety. 

The common user solution to the problem is to abandon adding information to the Family Tree and move to an individually owned family tree, either online or in a desktop program. 

The basic motivation seems to be increasing the number of entries while disregarding any attempts at limiting duplication or inaccuracy. Leaving the process of moderation entirely to the users results in some users spending more time correcting existing entries than actually doing the research needed to add new entries. 

If you need a prime example of this lack of control, here are a few individuals to look at with hundreds of changes. 
  • David Kenyon I KNQL-7VM
  • John Kenyon II KNH4-2LX with 14 changes in the last two weeks
  • John Kenyon 273D-VZ6 with only four sources and 22 changes in the last two months
  • Philip Taber Jr. 945B-5CS with 28 sources but probably more than 200 changes
  • Lydia Masters 9XPZ-KMZ with 13 sources but 9 changes in the last week and possibly hundreds of cumulative changes. 
The list could go on and on. The amount of time wasted on these revolving door entries is probably into the millions of hours. 

I could also spend a great deal of time explaining exactly why and how this situation exists. The problem is that many really good genealogists have quit using the Family Tree or are close to quitting. I have chosen to ignore any entry that shows a tendency to change frequently. I no longer care if those entries are accurate or not. Meanwhile, FamilySearch almost entirely ignores these entries and continues to allow the wholesale addition of millions of duplicates. I spend a significant amount of time merging duplicates that officially do not exist. 

If you have read this far, you probably know exactly what I am writing about. Can the Family Tree continue to exist despite this condition? Yes, if it is used merely as a dumping ground, but users will also continue to lose confidence in its reliability as a place to do real genealogy. 

Do I need to list all the times I have written about this subject? By the way, I have been and continue to be an ardent supporter of FamilySearch and the Family Tree. I just wish there were some movement towards controlling the uncontrolled. 

Monday, December 11, 2023

Reclaim the Records Liberates Millions of Records from the State of Maryland

 

https://archive.org/details/maryland-state-archives?sort=title

Because this is so important, I am going to copy some of the text of the above email. Here it is. 

GOOD MORNING, BALTIMORE!

RECLAIM THE RECORDS PUTS MILLIONS OF MARYLAND BIRTH, MARRIAGE, DEATH, AND NATURALIZATION RECORDS ONLINE (MANY OF THEM NEVER AVAILABLE BEFORE ANYWHERE) and yes this time we got FULL VITAL RECORDS CERTIFICATES too

Hi. Please excuse the all-caps, but we're currently hyped up on a sugar high from the pumpkin pie, and a records-high from OVER A HUNDRED YEARS OF NEW AND TOTALLY FREE GENEALOGY RECORDS THAT WE JUST PUT ONLINE and we're all pretty darn excited.

Ahem. We at Reclaim The Records are so proud to finally announce one of our largest record acquisitions to date: millions of vital records spanning over one hundred years of history for the state of Maryland.

These records have never previously been publicly available online anywhere else — not on FamilySearch and not on Ancestry and not on MyHeritage and not on [insert some other genealogy website here] — except for some records that had only been available at the Maryland State Archives' internal website, if you happened to be sitting in their building in Annapolis and using their in-house computers, or on their external website, but only if those records were more than a hundred years old.

This announcement is groundbreaking for us at RTR. Not only is this an unusually large cache of materials for one of our records projects, but this time, our acquisition was not limited to a basic name and date index — although we did get those, too! — but in addition to the decades of vital records indices, we also got the digital images of the actual birth, marriage, and death certificates for the state of Maryland. Yep, the real certificates. And now we've put them online, free!

Because my wife and I served as missionaries with FamilySearch.org in digitizing records from the Maryland State Archives, we are extremely happy to see more of the records being made available outside of the Maryland State Archives. 

See more liberated records on the Reclaim the Records website.

The records are now freely available on Archive.org, the Internet Archive. See https://archive.org/details/maryland-state-archives?sort=title 

All of the records on the Internet Archive or archive.org are searchable by Google. 

Welcome to the Brand New MyHeritage Wiki

 

https://www.myheritage.com/wiki/Home


https://youtu.be/hPxAfN3qFOA?si=L4_6bnZT-9NXZG1g

During this past year, it was my honor and privilege to assist in developing the new MyHeritage Wiki, along with other talented writers and software developers. As with all wikis, you have to start with a concept and a design. The new MyHeritage Wiki has both an outstanding concept and a clean, uncluttered design. Take some time to explore the content and rest assured that there will be more content. There is also a simple way to apply to be a contributor. I am sure I will be writing a lot more about this very useful addition to the greater worldwide genealogical community. 

Saturday, December 2, 2023

10 Million Names Project from AmericanAncestors.org

 

https://10millionnames.org/

10 Million Names is a collaborative project that includes many prominent genealogical and academic organizations. See https://10millionnames.org/collaborators. The objective of the project is described in the project's Mission Statement.

10 Million Names is a collaborative project dedicated to recovering the names of the estimated 10 million men, women, and children of African descent who were enslaved in pre- and post-colonial America (specifically, the territory that would become the United States) between the 1500s and 1865.

The project seeks to amplify the voices of people who have been telling their family stories for centuries, connect researchers and data partners with people seeking answers to family history questions, and expand access to data, resources, and information about enslaved African Americans.

The project originated through the efforts of AmericanAncestors.org, a genealogical research website and resource provided by the New England Historic Genealogical Society (NEHGS). NEHGS is one of the oldest and largest genealogical societies in the United States. Here is a statement about the involvement of collaborators from the 10 Million Names website. See https://10millionnames.org/frequently-asked-questions-0

American Ancestors, a nonprofit center for the study of family history, heritage, and culture, founded in 1845—the country’s oldest genealogical institution—has undertaken this project in collaboration with organizations, individuals, and scholars dedicated to African American history and genealogy. Collaborative partners include the Afro-American Historical and Genealogical Society, FamilySearch, the New Bedford Historical Society, and Daughters of the American Revolution.

During the time my wife and I were serving as Church Service Missionaries for The Church of Jesus Christ of Latter-day Saints and helping FamilySearch.org to digitize records at the Maryland State Archives in Annapolis, Maryland, we came face to face with the magnitude of the endeavor to identify formerly enslaved people through our efforts to digitize Maryland Probate Records. Here is a sample page from a probate file that shows the inventory of an enslaver. 



You can clearly see the enslaved people listed along with the oxen, cows, and sheep. Day after day, we were confronted with the reality of slavery. At times, we were overcome with grief for the enslaved people. I think it is more than important; I think it is imperative that we document every one of these people. 

Think about it. 

Monday, November 27, 2023

Cyber Monday Sale for MyHeritage DNA

 

https://www.myheritage.com/dna/562098391

MyHeritage DNA kits are still at the unprecedented price of $33 — but not for long! Make sure none of your followers miss this rare opportunity to purchase a DNA kit at the lowest price EVER.

MyHeritage.com is a major genealogy/DNA company in Europe and has a huge DNA testing base around the world. 

Sunday, November 26, 2023

5 significant issues on the FamilySearch.org website

 


It seems like I am involved one way or another with the FamilySearch.org website nearly every day. Consequently, I have plenty of time to think about all the things that, given my own background and experience, I would change or improve. Here are five things that are likely not news to FamilySearch.org, and are probably things that you may have run across even if you did not view them as issues. 

1. The invisible Images section. 

The historical records on the FamilySearch.org website are searchable in three completely different sections. The first set of Historical Records is found on the dropdown Search Menu. These Historical Records, https://www.familysearch.org/search/, let you search by ancestor, but are limited to mostly indexed records with some unindexed records scattered in. Of course, the number of indexed records increases every day due to the contributions of volunteer indexers, but it seems that the only available report of the total number of indexed records is recorded as "more than one billion." The number of indexed records added to the website each week is reported in a blog post entitled A FamilySearch Monthly Record Update in the FamilySearch Blog, which is linked from the list of web pages at the bottom of each page on the website. One thing you can see on this list is the number of indexed records from computer-aided indexes or CAI. 

The second set of records is found in the FamilySearch Catalog. The Catalog contains both indexed and unindexed records. The unindexed records, which are best searched by country, must be searched page-by-page unless there is some sort of index that was included with the historical record itself. 

The third and last set of records is the invisible Images section: records that are not yet indexed or cataloged, which possibly make up the bulk of all the records on the website. The number is likely somewhat more than five billion records. FamilySearch is presently using Computer Aided Indexing to add millions of newly indexed records to the website. See, for example, https://www.familysearch.org/en/blog/new-records-29-october-2023. When the records are indexed, they appear in the Historical Records collection. 

Where are these unindexed and uncataloged records? They are piled in the Explore Historical Images section of the website, which is called Images under the dropdown Search Menu at the top of each page. Almost uniformly, the people I talk to about the FamilySearch website do not know this set of records exists. Most people assume that when they do a name search they are searching all the records. As I pointed out above, the number of these "invisible" records is huge, possibly well over 5.2 billion. See https://www.familysearch.org/records/images/beta 

My question is why isn't there more visibility and utility for the Image records? My other question is why doesn't FamilySearch explain what is and what is not available in each of the three searches: the Historical Records, the Catalog, and the Explore Historical Images sections of the website? For example, the "Historical Records" could contain a notice that said that a name search only searches indexed records, which constitute only "X" percent of the records on the website. 

2. Two FamilySearch Catalogs?

Yes, you can search the "old" catalog under the dropdown Search menu, but, by the way, this catalog has not been updated for the past year. Now, there is a second FamilySearch Catalog called The FamilySearch Library Catalog. Where is this catalog? Is it somewhere on the website? Yes, but so far, I have not found a link. Here is the URL, https://www.familysearch.org/en/library/our-catalogs. Interestingly, the webpage is entitled FamilySearch Library Catalog with a link to "Our Catalogs." 

Yes, there are two FamilySearch catalogs on the same website. But there are no apparent links to the second catalog except if you know the link already.

3. The missing parts of the FamilySearch website.

Sometime, just for fun, you can try searching for a topic or an area of genealogical interest on Google using FamilySearch as part of the search. Here is a link for example. https://www.familysearch.org/campaign/pioneers
Try another example, such as WWII. If you look down through all the entries you will find a lot of references to the FamilySearch blog. This is not quite invisible. It is linked from the bottom of each web page. You will find links to the blog from the Google search and to the Research Wiki, and also to the Catalog entries. I realize that there is only a limited amount of space on any web page, but maybe some of the items in the site map deserve to be linked a little more prominently. My favorite one is the England & Wales Jurisdictions Map that is also linked from the Research Wiki and is in the Site Map, if you know where to look for the Site Map. See https://www.familysearch.org/mapp/ (Yes, with two "p"s)

Here are two more links to other "missing" parts of the website:

4. The Duplicate Fire Swamp.

The duplicate issue has been out of control since the day the FamilySearch.org Family Tree went online. It is true that FamilySearch got rid of a mountain of duplicates, but now, with artificial intelligence, they should be able to get rid of the huge mountain that is left. The obvious duplicates for those of us who are doing research in England and the rest of the British Isles are the left-over duplicates from the early extraction program, where each individual was separately extracted for a baptism record, a marriage record, and a burial record. If a family has ten children (not unusual), there are three automatic duplicates for each person, not including those recently entered by inexperienced users and bulk entries from census and other projects. So ten children plus the two adults can result in finding 36 duplicates, more or less, or 35 assuming you have found one of the duplicate entries and have not just stepped off into the swamp.
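The arithmetic behind that estimate can be sketched in a few lines of Python. This is only an illustration under the assumptions in the paragraph above (two parents, and one extracted entry per person for each of the baptism, marriage, and burial records); the function name and its parameters are hypothetical and are not part of any FamilySearch tool.

```python
def extraction_entries(children: int, adults: int = 2, records_per_person: int = 3) -> int:
    """Count the entries created for one family, assuming every person was
    extracted once per record type (baptism, marriage, burial)."""
    return (children + adults) * records_per_person


total = extraction_entries(children=10)  # (10 + 2) * 3
remaining = total - 1                    # one entry has already been found
print(total, remaining)                  # prints: 36 35
```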

5. The excessive revolving door ancestors. 

Just looking on the day of writing this post, Francis Cooke, a Mayflower passenger who is completely and exhaustively documented, had an endless list of changes, with a few in the last week. He also has 28 sources. I can no longer do any research into my New England lines because they are all revolving doors, and I do not have the time or the energy to keep correcting entries that did not need to be changed in the first place.

These issues may be resolved some day, but since I am old, I doubt I will see the day.

Monday, November 20, 2023

Your Story is Worth Remembering, a RootsTech Film

 

Here is the link to the actual video. https://youtu.be/cSfNA86DIUM?si=Y-dBS1r5h54DywSE

Here is a quote from the YouTube notes for the video.

In a world that often celebrates the extraordinary, watch as five individuals who've led seemingly ordinary lives through their own eyes, are reminded of the indelible mark they’ve left on the hearts of their loved ones.  

  We filmed their family members answering the question, “What makes your parent/grandparent extraordinary. The responses we received were nothing short of breathtaking— personal memories, touching anecdotes, and deep reflections shared together. Through the lens of personal stories, we recognize the transformative power of personal memories and experiences, and remember the importance of cherishing and celebrating the unsung heroes in our lives. Everyone's story is worth remembering.  

 Learn more here https://www.familysearch.org/en/rootstech/

Friday, November 10, 2023

MyHeritage's PhotoDater™ is now available on the MyHeritage and Reimagine mobile apps

 


PhotoDater™ was first released on the MyHeritage.com website back on August 13, 2023. Quoting from the announcement blog post:
PhotoDater™ is one-of-a-kind: MyHeritage is the only genealogy service that offers date estimation for historical photos. Using powerful technology developed by our AI team, PhotoDater™ gives its best guess when a photo was taken. This can help you unlock further clues about who appears in the photo and the event at which it was taken, to solve mysteries in your genealogy research. PhotoDater™ is completely free!

You can read a detailed explanation of the MyHeritage.com PhotoDater™ in this blog post, "Introducing PhotoDater™, an Exclusive, Free New Feature to Estimate When Old Photos Were Taken."

Now, this amazing app is available on both the MyHeritage and Reimagine apps. The apps are available from both the Apple App Store and from Google Play. 

 

Monday, November 6, 2023

MyHeritage DNA Testing Holiday Sale

 

Click on this link to order.

There are only a very few companies offering DNA testing that have both large numbers of users and a huge database of records. MyHeritage.com has both an exceptionally large number of users and a database of 19,611,003,679 records. I currently have 16,183 DNA matches on MyHeritage.com. We have also solved family tradition mysteries using MyHeritage's database of people and matches. By the way, it helps to get DNA tests from more than one company.


Friday, November 3, 2023

About Creating a No-source zone on the FamilySearch.org Family Tree

 

We are all acquainted with driving through areas where the speed limits change for safety reasons. I would like to see this concept applied to the FamilySearch.org Family Tree website. 

There is a natural conflict in genealogy between people with a casual interest who are just beginning to explore their family connections and those who are experienced. Traffic laws recognize that young students on their way to school need extra protection in crossing streets around schools. Children are also cautioned about crossing any street and are hopefully trained about the dangers of traffic. Equally, those who are new to genealogy are not automatically aware of the customs and procedures of the FamilySearch.org Family Tree website.

Initially, it is important to know that the FamilySearch.org Family Tree only works if it is treated as a source-centric family tree. This statement has been made numerous times over the years and is codified in FamilySearch publications. See the following list.

FamilySearch. “View Sources in Family Tree • FamilySearch,” May 31, 2022. https://www.familysearch.org/en/help/helpcenter/article/how-do-i-view-sources-attached-to-my-ancestor-in-family-tree.
“A Short History of FamilySearch Family Tree.” Accessed November 3, 2023. http://www.ancestryinsider.org/2013/03/a-short-history-of-familysearch-family.html.
“Authentication - FamilySearch Developers — FamilySearch.Org.” Accessed November 3, 2023. https://www.familysearch.org/developers/docs/guides/implementation-cert.
FamilySearch GEDCOM. “FamilySearch GEDCOM Community.” Accessed November 3, 2023. https://gedcom.io/community/.
FamilySearch Wiki. “Tools for Using Family Tree/Search,” April 10, 2020. https://www.familysearch.org/en/wiki/Tools_for_using_family_tree/Search.
Jr, Bennett Cookson, Ken Boyer, James Mark Hamilton, Kendall J. Jefferson, Daren Thayne, and Michael J. Wolfgramm. Genealogy investigation and documentation systems and methods. European Union EP1550958A2, filed December 28, 2004, and issued July 6, 2005. https://patents.google.com/patent/EP1550958A2/en.
Seaver, Randy. “Dear Randy: Should I Use FamilySearch Family Tree as My Main Genealogy Database?” Accessed November 3, 2023. https://www.geneamusings.com/2021/11/dear-randy-should-i-use-familysearch.html.
Tanner, James. “Genealogy’s Star: Sources in FamilySearch Family Tree.” Genealogy’s Star (blog), August 13, 2012. https://genealogysstar.blogspot.com/2012/08/sources-in-familysearch-family-tree.html.
———. “Rejoice, and Be Exceeding Glad...: A Survival Guide for the FamilySearch Family Tree: Part Two -- The Scope of the Challenge.” Rejoice, and Be Exceeding Glad... (blog), May 19, 2018. https://rejoiceandbeexceedingglad.blogspot.com/2018/05/a-survival-guide-for-familysearch.html.

(By the way, I am fully aware that I am citing myself in two of the examples given above).

Unfortunately, as the FamilySearch.org Family Tree (hereinafter Family Tree) evolved, it has failed to maintain a structure that would support a source-centric content. What this means is that users can make entries in the Family Tree without providing a source for the information entered. There are likely millions of entries that look like the following:

There is a good argument for allowing "new" users (like young students attending an elementary school) to enter into the complex world of genealogy without providing a source showing where the information was obtained. It is likely that at least some users will have personal information and contact with their parents. From my extensive experience helping users from Spanish speaking countries, there are people who can list sometimes three generations from memory. My suggestion is that the first four generations of a new user's genealogy be part of a "No required source zone." However, I would also suggest that the website continue to emphasize the need for sources. 

A "source" in the context of genealogical research is a citation to the location where the information entered can be verified. The word "source" as used in the Family Tree is ambiguous in that it is used to refer both to a historical document giving information about an entry and to anything showing where the information was obtained. Because of this ambiguity, an entry such as "Personal Information" is acceptable but useless for verification purposes. But as the example above illustrates, there are a huge number of entries that have no "sources." From my standpoint, I am forced to treat these entries as nonsense.

If the Family Tree had this kind of zone, it would then be appropriate to require any entries past the fourth generation to have a source. Presently, the Family Tree has a notice when there is no source attached, but that notice apparently has no effect on those adding unsupported entries. I would further suggest, as I have begun writing in other posts, that an AI assistant could help both new and experienced users and help to eliminate the vast number of duplicate and unsupported entries.
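The rule suggested above can be stated as a very small check. The following Python sketch is only an illustration of the proposal, not a description of how the Family Tree works today. The function name, the constant, and the convention that generation 1 is the new user are all assumptions made for the example.

```python
# Hypothetical constant: the size of the suggested "No required source zone".
NO_SOURCE_GENERATIONS = 4


def entry_allowed(generation: int, source_count: int) -> bool:
    """Return True if an entry could be saved under the proposed rule.

    generation   -- 1 for the new user, 2 for parents, 3 for grandparents, ...
    source_count -- number of sources attached to the entry
    """
    if generation <= NO_SOURCE_GENERATIONS:
        # Inside the zone, an entry from memory is accepted without a source.
        return True
    # Past the zone, at least one source is required.
    return source_count > 0


print(entry_allowed(generation=3, source_count=0))  # prints: True
print(entry_allowed(generation=5, source_count=0))  # prints: False
print(entry_allowed(generation=5, source_count=1))  # prints: True
```

Even inside the zone, the website could still show its reminder about sources; the check only decides whether the entry is accepted.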


Winners of the Free RootsTech 2024 Passes

 

Back in October 2023, I posted a link to a sweepstakes offering a free pass to RootsTech 2024. Here are the winners as determined by RootsTech. Each of these people should have been contacted by RootsTech about their free pass. 

The early registration discount will expire on November 19, 2023. Here is the link to register. 

RootsTech 2024 Registration


Thursday, November 2, 2023

RootsTech 2024 Early Bird Pricing Ends November 19, 2023

 

Click here to register: https://www.familysearch.org/rootstech/

AI could help the FamilySearch.org new user experience

 

Over the years, I have probably helped a few thousand new users of the FamilySearch.org website. I have never viewed the website as even slightly "user friendly." The recent addition of a new startup screen for those logging in for the first time is a major first step towards rectifying this situation.


However, the screen does not show again if your login is still in your computer's cache. You can compare this to the still-existing startup page that has been around for quite some time.


Artificial Intelligence (AI) has recently been in the news. However, AI has been actively used by computer programmers for many years. The earliest successful AI program was written in 1951 by Christopher Strachey, later director of the Programming Research Group at the University of Oxford. See https://www.britannica.com/technology/artificial-intelligence/Alan-Turing-and-the-beginning-of-AI

Here is one of perhaps hundreds of quotes that could easily be found online about using AI to improve the user experience. 

AI has the potential to greatly improve website personalization, providing users with a more personalized and engaging online experience. By collecting and analyzing data about user behavior and preferences, AI algorithms can make personalized recommendations and tailor the website experience in real-time. See The Role of Artificial Intelligence in Website Personalization.

The FamilySearch.org website has a huge database of information. For example, the Research Wiki has over 100,000 helpful articles about how and where to find information for new and experienced users. When I talk with someone who is just starting out using the website, I almost always guide them to the Research Wiki. Almost uniformly, the new users are amazed, overjoyed, and mostly surprised that instructions about how to do genealogical research are free and readily available. When time permits and when someone shows interest in actually doing research, I also send them to The Family History Guide.  

The challenge of the Research Wiki and the rest of the FamilySearch.org website is that they are "static." The user must supply all the initiative for finding the resources in the website. My wife and I are tasked with teaching all the new missionary volunteers at the BYU Library Family History Center. We start with how to log in to the website. Immersing the website in AI would allow new and experienced users to actually use the website without fighting with it. For example, I have to log onto the website sometimes dozens of times each day. Every time, I have to go through the process of logging on the same way. What if, when I opened the program, it recognized me (like my Subaru does)? What if the website then asked me if I wanted to return to the last page of the website I was using or wanted to do something else? What if there was also a way to opt out of the entire process and go directly to the Family Tree? What if the FamilySearch.org website tailored the website experience to my use of the website in real time?

Another issue with the FamilySearch.org website is the lack of internationalization. Some modest attempts have been made so far, but the website is essentially the same English-speaking experience even when I am looking at it in Spanish, and it is usually even harder to use when I elect to view the website in Spanish. AI is quickly becoming a way to translate text faster and more accurately. Why is the Research Wiki in Spanish a weak experience compared to the English version?

Let's suppose that the website not only asked me what I wanted to do, but had a way for me to start a conversation about my research goals. The expertise to help new and experienced users exists in the pool of genealogists, such as those at the BYU Library Family History Center, who have years of experience. Why not transfer this pool of experience to a website? Guess what: that is already available and done. That is essentially the basis for The Family History Guide website. By the way, The Family History Guide website is now linked directly from FamilySearch.org. The Family History Guide is in essence the basis for an AI entry experience with the FamilySearch.org website.

I will probably have a lot more to say about this subject in the near future. Stay tuned.

The New Improved FamilySearch.org Startup Page


Above is a screenshot of the vastly improved FamilySearch.org startup page. You can see from the startup page that a real effort has been made to engage new users of the program. If you sign into the account you will get your own personalized page but if you are a new user you can scroll down on the startup page to find out substantial information about the website. If you are registered, you will get the older startup page after you sign in. 

Here is an additional screenshot of the next scroll screen.




The links on the additional scroll pages provide navigational access to the website. In this screenshot, you will learn about the archive of historical documents that are available on the website. 

The next screenshot shows the next scroll down section of the startup page.


Here is the last segment of the startup page. 


I guess my comment would be, it's about time.

Tuesday, October 31, 2023

Lynne M. Jackson to be RootsTech 2024 Keynote Speaker

 

From an email notice from RootsTech 2024:

RootsTech by FamilySearch is honored to announce its first keynote speaker, president and founder of the Dred Scott Foundation and great-great-granddaughter of Dred and Harriet Scott, Lynne M. Jackson.

A remarkable woman, Lynne Jackson will take the RootsTech main stage on Friday, March 1, 2024, to speak on the importance of remembering and connecting with ancestors, touching upon the story of her great-great-grandparents, Dred and Harriet Scott, and how their legacy has shaped her life.

Also remember:


 Click this link to register: https://www.familysearch.org/rootstech/

Friday, October 20, 2023

Artificial Intelligence: Is the cat out of the bag?

 

By PawełMM - Own work, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=76953571

The full question here includes whether the bag was already empty. There seems to be an invalid assumption that the current developments in artificial intelligence are somehow revolutionary rather than evolutionary. If you or anyone understands and knows about the history of the development of computers and programming over the past 100 years or so, you would not be surprised by the current developments or unduly concerned. Yes, there are concerns, but these concerns are also not "new." They are merely repeating what has been said about industrialization from the time it began in about 1830. Here is a short history of the basis for the evolutionary developments in AI. The problem lies with the people who use the information.

The basis for the ability of artificial intelligence to create text and images comes from large language models (LLMs). Here is a short summary of the history of LLMs from Bing, “A Summary of the History of Large Language Models”, 2023. Retrieved from Bing on October 19, 2023. Note the list of sources provided by the search.

Large language models (LLMs) are neural networks that can process and generate natural language using massive amounts of data and computational resources. They have evolved from the early attempts to create rule-based systems that could mimic human conversation, such as Eliza in the 1960s, to the modern models that can perform a wide range of tasks, such as GPT-4 and Google Bard. 

The development of LLMs has been driven by several key innovations in natural language processing (NLP), such as:

  • The introduction of Long Short-Term Memory (LSTM) networks in 1997, which enabled deeper and more complex neural networks that could handle longer sequences of data.
  • The creation of Stanford’s CoreNLP suite in 2010, which provided a set of tools and algorithms for various NLP tasks, such as sentiment analysis and named entity recognition.
  • The emergence of Transformer architectures in 2017, which improved the efficiency and performance of neural networks by using attention mechanisms to focus on relevant parts of the input and output.
  • The adoption of self-supervised and semi-supervised learning methods, which allowed LLMs to learn from large amounts of unlabeled text data without requiring human annotations.
LLMs have demonstrated remarkable abilities to understand and generate human-like text, as well as to acquire general knowledge and facts from the data they are trained on. However, they also face several challenges and limitations, such as:

  • The ethical and social implications of using LLMs for various purposes, such as misinformation, bias, privacy, and accountability.
  • The scalability and efficiency issues of training and deploying LLMs, which require huge amounts of data, parameters, and computational power.
  • The generalization and robustness problems of LLMs, which may fail to handle novel or out-of-distribution inputs, or to explain their reasoning and decisions.

LLMs are one of the most impressive and influential achievements in AI technology, but they are also constantly evolving and improving. The future of LLMs may involve new architectures, algorithms, datasets, applications, and challenges that will push the boundaries of natural language understanding and generation.

Here is a list of sources and citations to sources for the summary of the history of large language models:

  • A Comprehensive Overview of Large Language Models by Humza Naveed, Asad Ullah Khan, Shi Qiu, Muhammad Saqib, Saeed Anwar, Muhammad Usman, Naveed Akhtar, Nick Barnes, Ajmal Mian. This is a review article that covers a broad range of topics related to LLMs, such as architectures, datasets, benchmarks, efficiency, and challenges. You can cite this source as follows:

Naveed, H., Khan, A. U., Qiu, S., Saqib, M., Anwar, S., Usman, M., Akhtar, N., Barnes, N., & Mian, A. (2023). A Comprehensive Overview of Large Language Models. arXiv preprint arXiv:2307.06435.

Scribble Data. (2023). Large Language Models 101: History, Evolution and Future. Retrieved from Scribble Data on October 19, 2023.

Snorkel AI. (2023). Large language models: their history, capabilities and limitations. Retrieved from Snorkel AI on October 19, 2023.

Zhang, Y., & Liang, P. (2023). Studying Large Language Model Generalization with Randomized Training Data. arXiv preprint arXiv:2308.03296. 

None of these developments could have happened without the initial development of super fast computers, huge memory storage capabilities, and the internet. Which came first, artificial intelligence or computers? The concept of AI came from the earlier concept of thinking machines. The earliest idea of a "thinking machine" came in the 1830s when British mathematician Charles Babbage envisioned what he called the analytical engine. Viewed in the context of history, AI as it exists today was inevitable. 

What does all this mean? Essentially, the current notoriety of AI is based on developments that started more than a hundred years ago. The current handwringing and predictions about the end of the world have been going on since before Karel Čapek's play R.U.R., which introduced the word robot in 1921, and can be glimpsed in Mary Shelley's Frankenstein (published in 1818). See Wikipedia: AI Takeover.

What will happen to genealogy as soon as one genealogy company works out the details of using AI to analyze the information in its database and family trees? You can see a glimmer of what is already happening in the suggestions now being made when you add a new ancestral line to an Ancestry.com family tree, with its record hints and suggestions for parents. With the constant and accelerating development of AI programs, it is certain that how we do genealogy today will be different tomorrow.

Thursday, October 19, 2023

Challenges of the FamilySearch.org Family Tree, Now and in the Future


For whatever reasons, both the FamilySearch.org Family Tree and the entire website face some serious challenges now and in the future. These challenges can be divided into two separate but related general categories: technological changes and data related issues. For the purpose of this post, I am not including issues that arise solely in the context of the temple ordinances performed by members of The Church of Jesus Christ of Latter-day Saints. 

The first and most serious challenge is data related and can be summarized by the old computer admonition: "garbage in - garbage out." The issue is how to prevent the Family Tree from having so many unsupported, inaccurate, and duplicated entries that it becomes so unreliable and full of errors as to be unusable. This issue arises in the dichotomy between encouraging new users to enter their basic family information and the need to put some reasonable controls on both the format and content of all "new" entries. Behind this particular issue is the ever-present problem of duplication of effort which I will explain next. 

Genealogical duplication occurs at two levels: when a new individual is added although that individual is already present in the Family Tree, and when research is done by those who do not use the Family Tree to determine if the information they are researching is already available and documented there. Let me give an example of each of these duplication issues.

The most common cause of duplication in the Family Tree occurs when a person who is unaware of or ignoring possible duplicates adds a name to a family or adds an entire family that is already recorded in the Family Tree. In many cases the new duplicate entry lacks supporting information such as dates and places. Because of the lack of complete information, the FamilySearch search program will not identify the new entry as a duplicate. Of course, the website can look for duplicates, but the system as it now exists often fails to "see" that the newly added individual or family is a duplicate entry until some additional information about the new individual or family is added by other users' research.

Here is one way this duplication can occur. Let's suppose I add a name such as "John Smith" from my own personal records with limited supporting information, such as that he was born "about 1800" in the "United States." There is a good possibility that the "John Smith" I am entering will be a duplicate, but there is no way for either the person entering the information or the computer program to determine which of the thousands of John Smiths is the duplicate. When this happens, the program can offer possible duplicates when the user submits the limited information, but because the user does not know who their person is and cannot match the name to an existing entry, the user likely chooses to create a new person. This works fine if the user goes on to do additional research, finds the duplicate or duplicates, and merges the entries. However, this is not the case when the user does not know how to do the subsequent research or is ignorant of, or avoiding, the possibility of a duplicate. Unfortunately, the website is designed to accept vague entries such as the "about 1800" and "United States" entries in my example above.
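A small Python sketch shows why a vague entry defeats duplicate detection. It is not FamilySearch's actual matching logic, which is not described here; the Person class, the possible_duplicates function, the five-year window, and the sample people are all hypothetical and exist only to illustrate the John Smith example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    birth_year: Optional[int] = None
    birth_place: Optional[str] = None


def possible_duplicates(new: Person, tree: list[Person], year_window: int = 5) -> list[Person]:
    """Return every tree entry that cannot be ruled out as a duplicate of `new`."""
    matches = []
    for person in tree:
        if person.name.lower() != new.name.lower():
            continue
        # "About 1800" is modeled as a window of years around the stated year.
        if new.birth_year is not None and person.birth_year is not None:
            if abs(person.birth_year - new.birth_year) > year_window:
                continue
        # A vague place such as "United States" is contained in every more
        # specific United States place, so it rules out almost nothing.
        if new.birth_place and person.birth_place:
            if new.birth_place.lower() not in person.birth_place.lower():
                continue
        matches.append(person)
    return matches


# Made-up sample entries for illustration only.
tree = [
    Person("John Smith", 1798, "Boston, Massachusetts, United States"),
    Person("John Smith", 1801, "Richmond, Virginia, United States"),
    Person("John Smith", 1803, "Albany, New York, United States"),
    Person("John Smith", 1850, "Ohio, United States"),
]

vague = Person("John Smith", 1800, "United States")
specific = Person("John Smith", 1801, "Richmond, Virginia, United States")

print(len(possible_duplicates(vague, tree)))     # prints: 3
print(len(possible_duplicates(specific, tree)))  # prints: 1
```

With only four sample entries the vague search already returns three candidates and gives the user no way to choose among them, while the specific entry narrows the list to one. In a tree with thousands of John Smiths the same effect is far larger.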

From my own experience, this problem of initial duplication is extensive in Latin America and other areas where the Family Tree has a large number of "new" users who are adding information about their immediate ancestors but are choosing to ignore the suggested duplicates, either because they don't know what to do about them or because they think that they are creating "their own" family tree. This issue can be resolved to some extent by education, as I will explain below.

Duplication becomes a more serious issue when the person entering "new" information is extracting individuals or adding families from census or other records without systematically verifying family connections. A prime example of this is the early extraction program in England, where baptism, marriage, and burial records were individually extracted and showed up as duplicate individuals in the Family Tree, with three or more for each person entered. These duplicates are still being found regularly by researchers. Not only are these duplicates common, but ongoing individual and institutional extraction programs are currently adding hundreds of thousands more.

Another example of the wholesale addition of duplicates comes from allowing old and new GEDCOM data to be added directly to the Family Tree. There are some people who deny that this is happening, but experienced researchers who are watching their own entries find this occurring regularly. Those who are adding the entries do not look for duplicates and assume that they can add their "own" information to the Family Tree. Currently, the process for adding entries to the Family Tree from a GEDCOM file requires the user to review whether or not the entries are duplicates, but some users ignore the process and mark all their entries as new, thereby flooding the Family Tree with up to thousands of duplicate entries.

I could go on practically indefinitely about the duplicate issue, but I think that I have given enough examples to illustrate the problem. This brings up one of the other major issues, one that contributes to the duplicate issue I mentioned above: entry-level training or education. Although FamilySearch offers several different pathways to learning about the Family Tree, users can always choose to skip the training and start adding names directly. There are presently no requirements to learn anything at all about the website before entering information into the Family Tree. You can make an entry in the Family Tree with nothing more than a name. The website will note that dates and places are missing, but it still allows the name to be entered. Why does FamilySearch resist the need to train people how to use the website before making entries?

I must digress here to explain why I sometimes add "just a name." This occurs when I am entering information from another research source such as Ancestry.com. I need a "place holder" in FamilySearch so that I can immediately start transferring information I already have, with sources, from my Ancestry.com family tree. Any name I add is always connected to a family where I am doing ongoing research.

Back to training. There is no lack of training available. Again, referring to my extensive experience in helping new Spanish-speaking users from Latin America and around the world, I find that they cannot use the website simply because they lack some really basic information about how to use it. Once I explain the relationship between records and entries and how to find the records, they are relieved to know what to do. The lack of available and required training is the single biggest obstacle to these new users having a discovery experience. Over the years of working on and helping to develop websites, I have found that adding some "Getting Started" buttons does not work when the user is supposed to know how to properly enter names, dates, and places. Warning messages that you haven't done a certain task correctly are useless unless the website provides the information needed to do the task properly. Failing to have some introductory information and notice of the standards for entering information guarantees garbage in and garbage out.

What else? The technology that is called artificial intelligence has recently progressed to the point where the FamilySearch.org website is simply old and out of date. There is no reason now that new and experienced users could not enter valid information using a conversational interface. The website should be asking users what they want to do (and adding an option for experienced users to opt out of everything except data entry and correction). This technology already exists. The technology to construct family trees from valid sources with more accuracy than almost all potential users also exists, but there is apparently a perceived problem at FamilySearch that this will end up cutting the user out of the data entry experience. It is interesting that worrying about a new user entering his or her personal family information takes precedence over the accuracy of the entire website. It is possible for anyone beginning to use the website to find that information about their dead ancestors is already in the Family Tree. I commonly find that for Spanish-speaking users who are struggling to find an ancestor, someone else in their family has already entered the information they are looking for and, as illustrated above, the new user is duplicating the research. The Family Tree is supposed to be universal, so why should any new user be forced to rediscover their part of the universal family tree by doing duplicate research? An AI interface could help this new user have a good experience without later finding out that all the work had already been done.

Moving on, record hints are helpful, but new users, without training, do not know how or why to use them. With an AI interface, the website could simply say, "I see you have record hints, do you need help in adding sources to your part of the Family Tree?" Why not have the website itself help to keep the information accurate instead of leaving it to experienced genealogists to waste their research time correcting the bad entries of others?

The issue of researchers who duplicate research that is already in the Family Tree occurs both through lack of knowledge about the existence of the FamilySearch.org Family Tree and through specific avoidance of the Family Tree because of its existing reputation for inaccuracy and duplication. One fear of using artificial intelligence is that people will be replaced and lose their jobs. Those of us who are now spending an inordinate amount of time maintaining and correcting the Family Tree do not need this job. We would gladly turn it over to AI unless, of course, FamilySearch wants to start paying us to maintain the Family Tree; then we might also worry about losing our jobs.

I have a lot more to say about these issues and will probably keep writing until I pass on to whatever reward I get for doing all this work in the first place.