Copyright Office weighs generative AI training against fair use
In May 2025, the U.S. Copyright Office issued the latest in a series of reports examining copyright and artificial intelligence (AI), this time considering the so-called training required for generative AI. As the report notes, the training draws on “massive troves of data,” including copyrighted works. The report focuses heavily on whether such use of copyrighted material falls under the fair use doctrine — a question currently at issue in dozens of lawsuits.
Note: Shortly after the report’s release, the Trump administration terminated the director of the Copyright Office. As of this writing, it’s unclear whether the replacement director might distance the Office from, or reject, this report.
Intelligence or infringement?
The report provides an overview of how generative AI models are developed and deployed. This includes the acquisition of training data, which is often done without authorization from the materials’ authors.
The Copyright Act gives copyright owners certain exclusive rights, including the right to reproduce, distribute, publicly perform and publicly display their works, along with the right to prepare derivative works. The question is whether use of these works to train generative AI violates any of these rights. The report finds that several steps in the AI development and deployment processes may, absent a license or other defense, infringe one or more of these rights.
Specifically, data collection, training and retrieval-augmented generation (the process of retrieving content from outside of a model’s training data when responding to a specific request) might implicate a copyright owner’s reproduction right. And generative AI outputs may infringe the right to prepare derivative works, in addition to reproduction, public display and public performance rights.
Is it fair use?
The primary defense available to claims of infringement in AI training is fair use. To determine whether a use is fair, courts evaluate four factors. The report notes that two factors in particular are likely to have considerable weight in a court’s analysis:
- The purpose and character of the use. Under this factor, courts emphasize how transformative and how commercial the use is. A high degree of transformativeness generally weighs in favor of fair use, as does a low degree of commerciality.
- The effect of the use on the potential market for, or value of, the copyrighted work. The report notes that the U.S. Supreme Court has twice described this as the single most important factor of fair use. Courts generally consider actual or potential market substitution, market dilution, lost licensing opportunities and, on occasion, public benefits from the use.
The report thoroughly analyzes these factors in the context of generative AI training. It refers to previous court cases and some of the thousands of public comments the Copyright Office has received in response to a series of questions published in August 2023 about copyright and AI.
The Copyright Office recognizes that some uses of copyrighted works in AI training will be more transformative than others. It also acknowledges that the impact on the markets for copyrighted works could be of “unprecedented scale” given the volume, speed and sophistication with which AI systems can generate outputs, as well as the vast number of works that might be used in training.
The Office expects that some uses of copyrighted works for generative AI will qualify as fair use, while others won’t. Uses for noncommercial research or analysis that don’t allow portions of the works to be reproduced in the outputs will likely be deemed fair. But what about copying expressive works from “pirate sources” (such as shadow libraries with large collections of full, published books) to generate unrestricted content that competes in the marketplace, even though licensing is readily available? Such uses are unlikely to qualify as fair use. Many uses, the report says, will fall somewhere in between.
Now what?
Despite its findings regarding fair use, the report doesn’t advocate for new laws. Rather, it endorses the continued development, without government intervention, of the voluntary licensing market.
Sidebar: District court enters the discussion
A recent case tackled the issue of the Copyright Act’s fair use doctrine and artificial intelligence (AI) training. In Bartz v. Anthropic PBC, an AI firm downloaded millions of copyrighted books in digital form, for free, from pirate sites on the internet. It also bought copyrighted books (some overlapping with those acquired from the pirate sites), tore off the bindings, scanned the pages and stored them in searchable digital files.
From this central library, the AI firm selected various sets and subsets of digitized books to train various large language models (LLMs). Some of these books’ authors sued for copyright infringement.
The trial court found that the use of the books at issue to train the LLMs was transformative and, therefore, fair. The digitization of the books the defendant had purchased in print form was also fair use, but for a different reason than the one that applies to the training copies: all the AI firm did was replace the print copies it had bought for its central library with more convenient, space-saving and searchable digital copies, without adding new copies, creating new works or redistributing existing copies.
However, the court found that the AI firm had no entitlement to use pirated copies for its central library. Creating a permanent, general-purpose library wasn’t itself a fair use excusing the firm’s piracy. Expect an appeal.
© 2025