Some people eat, sleep and chew gum, I do genealogy and write...

Friday, November 14, 2025

Do LLMs violate copyright law?


As an introduction, I should note that I am not currently registered as an attorney in any jurisdiction. I am a fully retired attorney with 39+ years of trial experience. I have been, and presently am, very involved in intellectual property legal issues, with an emphasis on copyright law. I have also been a partner in a graphic design business for more than forty years and have always had to be aware of copyright law. I might add that I am very opinionated about copyright law.

The development of AI and the use of LLMs (Large Language Models) has set off a flurry of copyright infringement lawsuits against the AI companies.

As of November 2025, the U.S. Supreme Court has not issued a decision directly resolving copyright claims specifically involving large language models (LLMs) or generative AI works.

There is currently a long list of major lawsuits alleging copyright infringement against companies developing large language models. Here is a sample:

  • Andersen v. Stability AI, Midjourney, DeviantArt (US, 2023–2025): Artists Sarah Andersen, Karla Ortiz, and Kelly McKernan sued these firms, alleging unauthorized use of their artwork to train generative models like Stable Diffusion. Most DMCA claims were dismissed, but the copyright infringement claims are progressing.
  • Thomson Reuters v. Ross Intelligence (US, ongoing): Thomson Reuters sued Ross Intelligence for allegedly using copyrighted legal headnotes to train an AI legal assistant. As of 2025, key questions of copyrightability and fair use are on appeal in US federal courts.
  • Multiple Cases Against OpenAI and Microsoft (US, 2024–2025): Over 30 copyright lawsuits are active against OpenAI and Microsoft, with authors, publishers, and news organizations claiming unauthorized ingestion of their works by LLMs such as GPT-4. Some lawsuits are in early discovery; twelve U.S. cases were consolidated for pre-trial proceedings in late 2024.

LLMs are massive AI systems that are trained on vast amounts of data and use billions, and in some cases trillions, of parameters to understand and generate human language. Unlike older technologies covered by copyright law, LLMs store what they learn as numerical values (parameters) rather than as copies of text. AI developers argue that during training the original copyrighted texts are transformed beyond recognition, making direct comparison impossible. They contend that the LLM functions as a "black box" that generates new text based on learned patterns, not by storing or copying the original material.

Copyright holders counter that LLMs can regurgitate copyrighted content. Because the trained model contains no copy of a work, plaintiffs must focus on intermediate copies made during training or on specific outputs triggered by user prompts. The traditional "substantial similarity" legal test and its side-by-side comparisons are therefore increasingly irrelevant for LLMs.

Current copyright law is poorly equipped to handle LLMs, their use of training data, and their generated outputs. The outcome of these cases may reshape copyright standards for the digital age and require new legal tests. A recently decided federal court case illustrates the fallacy of nearly all the presently pending lawsuits. The U.S. federal courts have exclusive original jurisdiction over copyright claims, which means copyright infringement cases must be brought in federal court.

The case is Zlozower v. The Rock and Roll Hall of Fame and Museum, Inc., No. 1:24-cv-01817-CAB (N.D. Ohio Nov. 10, 2025).

Here is a summary of this lawsuit:

Background: Neil Zlozower, a noted music photographer, sued the Rock and Roll Hall of Fame, alleging copyright infringement after the museum used his black-and-white photograph of Van Halen in a late 1970s recording studio as part of a prominent exhibit display. Zlozower claimed the museum had not licensed the image, nor credited him, and sought statutory damages for alleged willful infringement.

Central Issue: Whether the museum’s use of Zlozower’s photograph constituted copyright infringement or was protected by the legal doctrine of fair use.

Decision: On November 10, 2025, Judge Christopher A. Boyko of the Northern District of Ohio ruled in favor of the Rock and Roll Hall of Fame. The court found that although the museum made a “nearly exact” copy of Zlozower's photo, its use was transformative—shifting the photo’s purpose from promotion of the band to illustrating the historical significance of Eddie Van Halen’s guitar in rock history. The display contributed educational context and did not simply reproduce the image for commercial or promotional gain.

Outcome: The court determined the museum’s use of the photo was legal “fair use” under copyright law, and Zlozower’s claims were dismissed. The museum’s argument that its use served a broader educational purpose rather than merely exploiting the photographer’s work was accepted by the judge.

In summary: The Rock and Roll Hall of Fame won the lawsuit, with the display of Zlozower’s photograph deemed transformative and fair use under copyright law, resulting in dismissal of the copyright infringement claim.

Pursuing a copyright infringement case in the United States federal court is a complex and costly process. Most cases begin with the filing of a complaint and can continue through pre-trial motions, discovery, trial, and even appeals. The typical expenses incurred include attorney fees, costs for expert witnesses, document production during discovery, court filing fees, and sometimes appeals to higher courts.

On average, litigating a federal copyright case from start to finish is expensive. Recent studies and legal reports estimate that the total cost often exceeds $100,000, with some sources placing the median in the range of $150,000 to $300,000 for a typical case. These figures reflect the cumulative costs of legal representation, expert analysis, and necessary trial resources. For complex or high-value cases, especially those involving millions of dollars in dispute, costs can run far higher, sometimes reaching or exceeding $1 million per party.

Some disputes end earlier in the process, for example through settlement before trial, which typically costs significantly less. For smaller claims of $30,000 or less, the Copyright Claims Board (CCB) now offers a more affordable and streamlined alternative, often reducing the cost to under $1,000.

In summary, federal court copyright litigation is a significant financial undertaking, most feasible for parties prepared to spend well into six figures on legal fees, unless they qualify for lower-cost alternatives like the CCB. Before suing, it is wise to assess not only the merits of a case but also the financial resources required for what can be a lengthy and expensive legal journey.

My personal opinion is that copyright infringement lawsuits are almost exclusively a "big corporation" affair, in which one corporation, for reasons other than protecting a copyright, bashes another less well-financed corporation or individual. For the most part, the current "copyright" issues involving AI and LLMs fall within the arguments presented in Zlozower. The high-value settlements currently in the news are not being made on legal grounds; they are merely examples of the high cost of copyright litigation.

The argument that an LLM can regurgitate a novel or other book is simply bogus. Don't believe everything in the news. I might be proven wrong by a decision of the current Supreme Court, but I doubt it.

Here are a few commentaries on the present copyright law that apply to the AI issues. 

Colson, Abby. “Copyright Litigation 101.” Thomson Reuters Law Blog, December 16, 2022. https://legal.thomsonreuters.com/blog/copyright-litigation-101/.
“COPYRIGHT—N.D. Ohio: Rock and Roll Hall of Fame Is off the Hook for Copyright Infringement Claim.” Accessed November 14, 2025. https://www.vitallaw.com/news/copyright-n-d-ohio-rock-and-roll-hall-of-fame-is-off-the-hook-for-copyright-infringement-claim/ipm01960f12f0d85547d080943c1332ec70d7.
Fertig, Rachel, Oscar Orozco-Botello, and J. Kevin Fee. “What We’ve Learned from Ten Months of Copyright Claims Board Proceedings: Eight Things for Companies to Consider.” DLA Piper. Accessed November 14, 2025. https://www.dlapiper.com/en/insights/publications/intellectual-property-news/2023/what-weve-learned-from-ten-months-of-copyright-claims-board-proceedings.
Gesmer, Lee. “Copyright and the Challenge of Large Language Models (Part 1).” Mass Law Blog, July 1, 2024. https://www.masslawblog.com/copyright/copyright-and-the-mechanics-of-large-language-models/.
IPISC. “Understanding the Cost of Copyright Infringement Lawsuits.” IPISC, June 28, 2024. https://ipisc.com/understanding-the-cost-of-copyright-infringement-lawsuits/.
Justia Law. “Zlozower v. Rock and Roll Hall of Fame and Museum, Inc., No. 1:2024cv01817 - Document 30 (N.D. Ohio 2025).” Accessed November 14, 2025. https://law.justia.com/cases/federal/district-courts/ohio/ohndce/1:2024cv01817/310903/30/.
“Need-to-Knows of the New Copyright Claims Board for Small-Value Copyright Claims.” Accessed November 14, 2025. https://www.fr.com/insights/thought-leadership/blogs/need-to-knows-of-the-new-copyright-claims-board-for-small-value-copyright-claims/.
Scheland, Nora. “How the Copyright Claims Board Can Lighten the Federal Court Docket.” The Library of Congress, December 20, 2024. https://blogs.loc.gov/copyright/2024/12/how-the-copyright-claims-board-can-lighten-the-federal-court-docket.

Thursday, November 13, 2025

Do we need specialized prompts when using AI for genealogical research?


Artificial intelligence has generated its own jargon: special words or expressions used by a particular profession or group that are difficult for others to understand. Jargon separates insiders from outsiders and can evolve into almost a complete language, such as the ones used by medical doctors and lawyers. Here is a list generated by Google Gemini showing some of the jargon terms that are regularly used in conjunction with AI:

  • AI (Artificial Intelligence) - The simulation of human intelligence processes by machines.

  • Machine Learning (ML) - A subset of AI where systems "learn" from data and make decisions.

  • Deep Learning (DL) - A subset of ML using deep Artificial Neural Networks to analyze complex data.

  • Artificial Neural Network (ANN) - A computing system inspired by the human brain's interconnected neurons.

  • Algorithm - A set of rules a computer follows to complete a task.

  • Training Data - The dataset used to teach an AI model.

  • Bias - Errors in AI output resulting from skewed training data.

  • Alignment - Ensuring an AI model's goals and actions match human values.

  • Generative AI (GenAI) - AI that creates new content (text, images, video).

  • Large Language Model (LLM) - A deep learning model trained on massive text for understanding and generating language.

  • Prompt - The user's text input to an AI system.

  • Prompt Engineering - The skill of crafting effective prompts for desired AI output.

  • Hallucination - When an AI confidently generates false information.

  • Token - The smallest unit of text an LLM processes.

  • Temperature - A parameter controlling the randomness or creativity of an AI's response.

  • Transformer - A neural network architecture effective for language tasks, using self-attention.

  • Multimodal Model - An AI capable of processing and generating multiple data types (text, images, audio).

  • Retrieval-Augmented Generation (RAG) - A method connecting an LLM to external sources for improved accuracy.

  • Artificial General Intelligence (AGI) - Theoretical AI with human-level intelligence across all tasks.

  • Artificial Narrow Intelligence (ANI) - AI designed for a single, specific task (current AI).

  • Natural Language Processing (NLP) - AI focusing on computers understanding and generating human language.

  • Computer Vision - AI that enables computers to interpret visual information.

  • Chatbot - A software application that mimics human conversation.
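To make the "Temperature" entry above a little more concrete, here is a minimal Python sketch of the common textbook formulation: a model's raw scores for candidate next tokens are divided by the temperature and then converted to probabilities with the softmax function. The scores are made-up numbers for illustration, and the function name is hypothetical; actual chatbots may implement sampling differently.

```python
import math


def softmax_with_temperature(scores: list[float], temperature: float) -> list[float]:
    """Convert raw scores into probabilities, scaled by temperature.

    Lower temperatures sharpen the distribution (more predictable choices);
    higher temperatures flatten it (more varied, "creative" choices).
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [s / temperature for s in scores]
    # Subtract the largest value before exponentiating for numerical stability.
    largest = max(scaled)
    exps = [math.exp(s - largest) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
scores = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 0.5 the top-scoring token takes most of the probability; at 2.0 the alternatives get a much larger share, which is why higher settings produce more varied answers.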

One of these jargon terms, "prompt," is used extensively for the way we communicate with chatbots. There is a fast-growing industry of people "developing" and selling prompt systems, and some of these specialized prompts are being characterized as "secret" or "perfect." There is already a huge number of YouTube videos promoting and selling different "prompt solutions," and yes, a number of them tout genealogy AI prompts. Some of the suggested prompts run to multiple pages of single-spaced text.

So I asked Google Gemini whether specialized prompts get a better or more complete response. The answer was yes. Here is what the summary said:
While I can certainly answer simple questions, providing me with specific instructions, context, and constraints acts like giving me a focused searchlight instead of a floodlight, resulting in a more precise, detailed, and complete output tailored exactly to your needs.
The question at the end of Gemini's full response was this:
Is there a specific topic or type of information you are currently working on where I could help you formulate a more specialized prompt?
Here is one of Gemini's suggestions:

Components of an Effective Specialized Prompt

  • Goal/Task: What you want me to do. Example phrase: "Generate a summary..."

  • Context/Topic: The specific subject matter. Example phrase: "...about the impact of the printing press on 16th-century literacy rates."

  • Audience: Who the information is for. Example phrase: "Explain this as if you are talking to a middle school student."

  • Format/Length: How the answer should be structured. Example phrase: "Use a bulleted list, starting with a strong topic sentence."

  • Tone: The desired feeling or style. Example phrase: "Adopt a persuasive and optimistic tone."

The initial key word is "generate." Using prompts is almost second nature to me because I have been "talking" to computers for years using instructions, programming, queries, and search terms. I guess my answer to the question in the title of this post would be a qualified yes. Would any of the videos help you get better responses from a chatbot? I suggest you might get better feedback from the chatbot itself by simply asking it whether there is a better way to ask your question. I asked Gemini, and it gave me a long, free answer:

Yes, absolutely! Here is a versatile, high-detail prompt template designed to maximize the completeness and quality of my response, especially for complex or multi-step tasks.


📝 High-Detail Prompt Template (The R-T-C-F Method)

This method breaks your request into four key areas: Role, Task, Context/Constraints, and Format.

  • 1. 🎭 Role (R): Sets my persona and knowledge base. Example instruction: "Act as a senior financial analyst specializing in emerging markets."

  • 2. 🎯 Task (T): Defines the core action required. Example instruction: "Your task is to evaluate the risk profile of investing in lithium mining operations in South America."

  • 3. 🛑 Constraints (C): Imposes limitations or necessary details. Example instruction: "Focus only on Chile and Argentina. Cite data from 2020-2024. Do not discuss regulatory risk."

  • 4. 🛠️ Format (F): Specifies the desired output structure. Example instruction: "Present the analysis as a comparison table with three columns: Country, Production Volume, and Key Risk Factor. Conclude with a two-sentence summary."

➡️ Template Structure for You to Copy and Fill In

**[1. ROLE]** Act as a [Specific Role/Expertise].
**[2. TASK]** Your primary task is to [Specific Action Verb and Goal].
**[3. CONTEXT & CONSTRAINTS]**
* **Topic Focus:** [The exact subject matter.]
* **Scope Limits:** [What should be included and, crucially, what should be excluded.]
* **Depth/Timeframe:** [Level of detail required or historical period to cover.]
**[4. FORMAT]**
* **Structure:** [e.g., Use a numbered list, a table, a 5-paragraph essay.]
* **Tone/Style:** [e.g., Technical, persuasive, conversational, academic.]
* **Length:** [e.g., Limit to 400 words, provide exactly 5 steps.]

✨ Example of a Completed Specialized Prompt

[1. ROLE] Act as a university-level computer science professor.

[2. TASK] Your primary task is to explain the concept of time complexity in algorithms.

[3. CONTEXT & CONSTRAINTS]

  • Topic Focus: Big O notation, specifically $O(n^2)$ and $O(\log n)$.

  • Scope Limits: Do not discuss space complexity or linked lists. Use Python code examples.

  • Depth/Timeframe: Explain the concepts using a simple, real-world analogy before presenting the code.

[4. FORMAT]

  • Structure: Use three distinct sections: Analogy, Code Example $O(n^2)$, and Code Example $O(\log n)$.

  • Tone/Style: Educational and clear.

  • Length: Keep the entire response under 500 words.
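For readers comfortable with a little programming, the four R-T-C-F parts above can be kept separate and assembled into a single prompt with a short script. Here is a minimal Python sketch (not something Gemini provided); the class name RTCFPrompt and its field names are hypothetical, and the result is simply plain text you would paste into a chatbot. The usage example reuses the completed prompt above.

```python
from dataclasses import dataclass, field


@dataclass
class RTCFPrompt:
    """Hypothetical container for the Role, Task, Constraints, and Format parts."""
    role: str
    task: str
    constraints: dict[str, str] = field(default_factory=dict)
    format_rules: dict[str, str] = field(default_factory=dict)

    def render(self) -> str:
        """Assemble the parts into one plain-text prompt, in R-T-C-F order."""
        lines = [
            f"[1. ROLE] Act as {self.role}.",
            f"[2. TASK] Your primary task is to {self.task}.",
            "[3. CONTEXT & CONSTRAINTS]",
        ]
        # Each constraint becomes one bulleted "Label: text" line.
        lines += [f"* {label}: {text}" for label, text in self.constraints.items()]
        lines.append("[4. FORMAT]")
        # Each format rule is rendered the same way.
        lines += [f"* {label}: {text}" for label, text in self.format_rules.items()]
        return "\n".join(lines)


prompt = RTCFPrompt(
    role="a university-level computer science professor",
    task="explain the concept of time complexity in algorithms",
    constraints={
        "Topic Focus": "Big O notation, specifically O(n^2) and O(log n).",
        "Scope Limits": "Do not discuss space complexity or linked lists. Use Python code examples.",
    },
    format_rules={
        "Tone/Style": "Educational and clear.",
        "Length": "Keep the entire response under 500 words.",
    },
)
print(prompt.render())
```

Keeping the parts separate makes it easy to reuse the same role and format rules while changing only the task.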

Do I think all this would improve my interaction with chatbots? Maybe, but I already get the answers I need by asking in my own computer-speak. Hmm, maybe I should have tried to sell all the information I got from asking Gemini?

The whole idea here is that if you have a question and don't get the answer you expected, ask your chatbot to tell you how to ask the question. You don't really need to watch YouTube videos or pay for prompt systems; the only cost is whatever you may already be paying for your chatbot.