Fair Use of Copyrighted Works: U.S. Court Rules on Protected Books Used to Train Claude

A recent U.S. federal court decision has shed important light on the complex copyright issues raised when artificial intelligence companies use vast libraries of books to train large language models (LLMs). In Bartz et al. v. Anthropic PBC, Judge William Alsup of the U.S. District Court for the Northern District of California ruled, on Anthropic’s motion for summary judgment, on whether the AI firm’s unauthorized copying of books qualified as “fair use” under Section 107 of the Copyright Act.

The decision reveals a key tension between innovation and intellectual property rights and invites a comparison with EU copyright law, especially Article 4 of the Copyright in the Digital Single Market Directive (CDSM Directive).

What Happened Leading Up to the Lawsuit?

According to the court, Anthropic, the creator of the Claude AI system, copied millions of books from pirated and purchased sources (including more than seven million pirated copies) and used them in two principal ways:

  1. To build a permanent internal research library.
  2. To train its LLMs, including Claude, on selected subsets of these books.

The court assessed whether these uses qualified as “fair use” under Section 107 of the U.S. Copyright Act, analyzing four distinct uses:

  • Training LLMs on the books (and whether that use is “transformative”).
  • Digitizing legally purchased print books.
  • Creating a centralized digital library from pirated copies.
  • Retaining unused pirated books for possible future use.

What Did The Court Find?

  1. Generally, training LLMs on books is “transformative” and therefore fair use
    The court found that training Claude on copyrighted books did not reproduce or distribute infringing output. Instead, training transformed the books into statistical relationships and generative capabilities, a new use that does not substitute for the originals. The court therefore ruled that this use was fair under U.S. law.
  2. Digitizing legally purchased print books is also fair use
    Anthropic destroyed the physical copies after scanning them and used the digital versions internally. The court likened this to prior rulings that accepted format-shifting (e.g., from print to digital) where it did not lead to redistribution. Because no surplus copies were created, no market for the books was harmed.
  3. Storing pirated books for library purposes is not fair use
    The court rejected the idea that simply maintaining a comprehensive research library with pirated books is protected. The judge emphasized that building a “library of all the books in the world” by piracy is inherently infringing, even if some books are later used in fair-use contexts like training.

How Would EU Law Assess This? CDSM Directive Article 4 Compared

Key Contrasts:

| Aspect | U.S. “fair use” | EU Article 4 CDSM |
| --- | --- | --- |
| Purpose | Open-ended (“transformative use”) | Open to any purpose, but rights may be reserved |
| Opt-out | None; “fair use” is assessed ex post | Explicit opt-out mechanism for rightsholders |
| Pirated copies | Pirated library copies held not to be fair use | Not covered; lawful access is required |
| Retention for future use | Possibly infringing if not justified | Permitted “as long as is necessary for the purposes of text and data mining,” but only with lawful access |
| Licensing obligation | None if the use is fair | None unless the rightsholder has opted out |

In the EU, Anthropic’s use of pirated copies would also have clearly fallen outside Article 4 of the CDSM Directive. The text and data mining exception requires lawful access, so content taken from pirate sources like Books3 or LibGen would not qualify. Furthermore, EU rightsholders can opt out by marking their content with machine-readable reservations, effectively barring AI firms from using it unless they first obtain a license.

Takeaway

  • In the U.S., “fair use” remains flexible but is increasingly context-sensitive. “Transformative use”, especially in AI training, is a strong defense, but it does not excuse initial unlawful acquisition.
  • In the EU, the lawful-access requirement and rightsholders’ opt-out reservations place stronger limits on AI training datasets, even when the ultimate use is innovative.
  • Companies developing AI systems should:
    • Audit datasets for provenance and licensing.
    • Build mechanisms to comply with opt-out signals under the CDSM Directive.
    • Avoid reliance on pirated content, even for internal or non-public uses.
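
As a rough illustration of complying with opt-out signals, the sketch below checks one common kind of machine-readable reservation, a site’s robots.txt file, before a URL is queued for text and data mining. It uses Python’s standard urllib.robotparser module and assumes a hypothetical crawler user agent named ExampleAIBot and a hypothetical publisher site. Whether a robots.txt entry alone satisfies Article 4(3) of the CDSM Directive is not settled, and reservations can also appear in terms of use, metadata, or HTTP headers, none of which this check reads.

```python
from urllib import robotparser

# Hypothetical crawler name: replace with the user agent your system actually announces.
USER_AGENT = "ExampleAIBot"


def tdm_allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return False if the site's robots.txt disallows this user agent for the URL.

    Assumption: robots.txt is treated as one opt-out signal among several.
    This function does not detect reservations expressed in site terms,
    page metadata, or HTTP headers.
    """
    parser = robotparser.RobotFileParser()
    # Parse robots.txt text that was already fetched, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt in which the publisher reserves its /books/ section
# from this crawler while leaving the rest of the site open.
example_robots = """\
User-agent: ExampleAIBot
Disallow: /books/

User-agent: *
Allow: /
"""

print(tdm_allowed(example_robots, "https://publisher.example/books/title-1"))  # False
print(tdm_allowed(example_robots, "https://publisher.example/blog/post"))  # True
```

In practice, the result of such a check would be recorded alongside each document’s source, so that the dataset audit can show which opt-out signals were consulted.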

This ruling in Bartz v. Anthropic signals growing judicial awareness of the nuanced ways AI interacts with copyright law. While the court supported innovation through “fair use”, it drew a firm line at piracy, a position that aligns with the more formalized safeguards of EU law.

Disclaimer:
The content of this blog is provided for general informational purposes only and does not constitute legal advice. While we strive to ensure that the information is accurate and up to date, it may not reflect the most current legal developments or the specific circumstances of your organization. Readers should not act upon any information contained in this blog without first seeking professional legal counsel. No attorney-client relationship is created through your use of or reliance on the information provided herein.