Fair Use of Copyrighted Works: U.S. Court Rules on Protected Books Used to Train Claude

A recent U.S. federal court decision has shed important light on the complex copyright issues raised when artificial intelligence companies use vast libraries of books to train large language models (LLMs). In the seminal case of Bartz et al. v. Anthropic PBC, Judge William Alsup of the Northern District of California issued a partial summary judgment on whether the unauthorized copying of books by AI firm Anthropic qualified as “fair use” under Section 107 of the Copyright Act.

The decision reveals a key tension between innovation and intellectual property rights and invites a comparison with EU copyright law, especially Article 4 of the Copyright in the Digital Single Market Directive (CDSM Directive).

What Happened Leading Up To The Lawsuit?

Anthropic, the creator of the Claude AI system, was found to have copied over seven million books from pirated and purchased sources, using them in two principal ways:

- To build a permanent internal research library.
- To train its LLMs, including Claude, by selecting subsets of these books.

The court assessed whether these uses qualified as “fair use” under Section 107 of the U.S. Copyright Act, analyzing four distinct uses:

- Training LLMs on books (including “transformative use” of content).
- Digitizing legally purchased print books.
- Creating a centralized digital library from pirated copies.
- Retaining unused pirated books for possible future use.

What Did The Court Find?

Generally, training LLMs with books is “transformative” and “fair use”
The court found that training Claude on copyrighted books did not produce or distribute infringing output. Instead, training transformed the books into new statistical relationships and generative capabilities, a new use that does not substitute for the originals. The court therefore ruled this use fair under U.S. law.
Digitizing legally purchased hard copy books is also “fair use”
Anthropic destroyed the physical copies after scanning them and used the digital versions internally. The court likened this to prior rulings in which format-shifting (e.g., analog to digital) was accepted when it did not lead to redistribution. No surplus copies were created, and accordingly no new market was harmed.
Storing pirated books for library purposes is not “fair use”
The court rejected the idea that simply maintaining a comprehensive research library of pirated books is protected. The judge emphasized that building a “library of all the books in the world” through piracy is inherently infringing, even if some of the books are later used in fair-use contexts such as training.

How Would EU Law Assess This? CDSM Directive Article 4 Compared

Key Contrasts:

Aspect	U.S. “Fair Use”	EU Article 4 CDSM
Purpose	Open-ended (“transformative use”)	Open to any purpose, but rights may be reserved
Opt-out	No opt-out; “fair use” applies ex post	Explicit opt-out mechanism for rightsholders
Pirated copies	Pirated sources never allowed	Pirated sources never allowed
Retention for future use	Possibly infringing if not justified	“as long as is necessary for the purposes of text and data mining” but only with lawful access
Licensing obligation	No requirement if use is fair	Permissible only where no opt-out exercised

In the EU, Anthropic’s use of pirated copies would also have clearly fallen outside Article 4 of the CDSM Directive. The text and data mining exception requires lawful access, so content taken from pirate libraries such as Books3 or LibGen would not qualify. Furthermore, EU publishers can opt out by marking their content with machine-readable reservations, which bars AI firms from using that content unless they obtain a license beforehand.

Takeaway

In the U.S., “fair use” remains flexible but is increasingly context-sensitive. “Transformative use”, especially in AI training, is a strong defense, but it does not excuse initial unlawful acquisition.
In the EU, the lawful-access requirement and rightsholders’ opt-out reservations place stronger limits on AI training datasets, even when the ultimate use is innovative.
Companies developing AI systems should:
- Audit datasets for provenance and licensing.
- Build mechanisms to comply with opt-out signals under CDSM.
- Avoid reliance on pirated content, even for internal or non-public uses.
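
As a rough illustration of how the first two recommendations might be put into practice, the sketch below filters a dataset manifest before training. It is a minimal example under stated assumptions, not a compliance tool: the `WorkRecord` fields, the acquisition labels, and the `audit` function are all hypothetical, and whether a given record actually satisfies U.S. fair use or Article 4 CDSM remains a legal question that code cannot settle.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical acquisition labels; a real audit would define its own taxonomy.
LAWFUL_ACQUISITIONS = {"purchased", "licensed"}


@dataclass(frozen=True)
class WorkRecord:
    """One entry in an assumed dataset manifest."""
    title: str
    acquisition: str                   # how the copy was obtained (assumed label)
    tdm_reserved: Optional[bool]       # True = opt-out found, None = never checked
    license_id: Optional[str] = None   # set when a license covers the use


def audit(record: WorkRecord) -> Tuple[bool, str]:
    """Return (eligible, reason) for including a work in a training set."""
    # Provenance comes first: an undocumented or pirated copy is excluded
    # regardless of how it would later be used.
    if record.acquisition not in LAWFUL_ACQUISITIONS:
        return False, "no documented lawful acquisition"
    # A license is treated here as covering the use even where rights are reserved.
    if record.license_id is not None:
        return True, "covered by license"
    # An unchecked opt-out status is handled conservatively.
    if record.tdm_reserved is None:
        return False, "opt-out status not checked"
    if record.tdm_reserved:
        return False, "rights reserved and no license"
    return True, "lawful access, no reservation recorded"


if __name__ == "__main__":
    manifest = [
        WorkRecord("Purchased print scan", "purchased", tdm_reserved=False),
        WorkRecord("Shadow-library download", "pirated", tdm_reserved=False),
        WorkRecord("Reserved e-book", "purchased", tdm_reserved=True),
    ]
    for record in manifest:
        eligible, reason = audit(record)
        print(f"{record.title}: {'keep' if eligible else 'exclude'} ({reason})")
```

Checking provenance before anything else mirrors the court's reasoning that a later fair use does not cure an unlawful acquisition. Treating an unchecked opt-out as a reason to exclude is a design choice of this sketch, not a legal requirement stated in the ruling or the Directive.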

The ruling in Bartz v. Anthropic signals growing judicial awareness of the nuanced ways AI interacts with copyright law. While the court supported innovation through “fair use”, it firmly drew the line at piracy, a position that aligns with EU law’s more formalized safeguards.

Disclaimer:

The content of this blog is provided for general informational purposes only and does not constitute legal advice. While we strive to ensure that the information is accurate and up to date, it may not reflect the most current legal developments or the specific circumstances of your organization. Readers should not act upon any information contained in this blog without first seeking professional legal counsel. No attorney-client relationship is created through your use of or reliance on the information provided herein.
