Liability of AI Platforms for Copyright Infringement: What Every Business Should Know Before Using Generative AI

A recent decision from a federal district court in New York provides important guidance on the liability of AI platforms (and of their users, including you) for third-party copyrighted content that these platforms generate. This decision should inform every business’s use of AI, particularly generative AI, and especially its use in creating marketing materials and other content.

This post summarizes the decision and concludes with important takeaways and action items for every business using or planning to use AI to create content.

The New York District Court Decision

In The New York Times v. Microsoft Corporation, a New York federal district court found that OpenAI (in which Microsoft is a major investor) could be liable for “contributory” infringement as a result of third-party generation of “outputs” from OpenAI’s models that allegedly infringed the copyrights of The New York Times (the “Times”). OpenAI’s products are built on a “large language model” (“LLM”) which, as the court noted in its decision, “can receive text prompts as inputs by users and generate natural language responses as outputs, which result from the LLM’s prediction of the most likely string of text to follow the inputted string of text based on its training on billions of written works.”

The Daily News, The Center for Investigative Reporting, and the Times (“Plaintiffs”) argued that when OpenAI users input prompts into the platform, they generate text that is substantially similar to (and therefore infringes) the Plaintiffs’ copyrighted material. That would make those users potentially liable for “direct” infringement. Plaintiffs also claimed that OpenAI could be held liable for “contributory” infringement because it allegedly “materially contributed to and directly assisted with the direct infringement by [its] end users” by building its AI model and training it on copyrighted content owned by the Plaintiffs; deciding what content its models output through specific training techniques; and developing AI models capable of distributing the copyrighted content to end users without the Plaintiffs’ permission.

The defendants, who comprise Microsoft and multiple OpenAI entities, claimed that:

there was no direct infringement by users (a predicate to contributory infringement); and
defendants did not contributorily infringe because they did not know of third-party infringement (by OpenAI users).

Acknowledging a split among the circuit courts, the court said actual knowledge was not necessary to find OpenAI contributorily liable for its users’ copyright infringement. Instead, it determined that in the Second Circuit, where it sits, the standard is whether the defendant investigated or would have had reason to investigate the infringement. It then found that defendants might be found to have knowledge based on “widely publicized” instances of copyright infringement after the release of defendants’ products, including ChatGPT, Browse with Bing, and Bing Chat. Additionally, Plaintiffs provided multiple examples of infringing outputs in their Complaint. The Court therefore found that the fact-finding portion of the case could reveal additional instances of third-party infringement.

Accordingly, the Court concluded that Plaintiffs had adequately alleged third-party infringement. The Court next found defendants could be found to have had “constructive, if not actual, knowledge” of this end-user infringement. In addition to the widely publicized infringements, the Court looked to statements made by OpenAI representatives about internal company disagreements regarding copyright issues. The Times also informed defendants that “their [defendants’] tools infringed its copyrighted works.” Accordingly, defendants “at a minimum had reason to investigate and uncover end-user infringement.” Finally, the Court found that the defendants’ LLMs could be found to have facilitated the third-party infringement. The fact that the LLMs were capable of substantial non-infringing uses did not relieve defendants from liability.

Important Takeaways and Action Items

In contrast to contributory infringement, where actual or constructive knowledge is necessary, a user who generates infringing AI outputs can be liable for direct infringement without being aware of the copyright status of third-party content or even that an AI output has copied copyrighted content. Because businesses risk inadvertently committing copyright infringement when they generate AI outputs to advertise or promote their goods and services, we recommend engaging counsel to take, at a minimum, the following actions:

draft an AI Policy applicable to all employees, and incorporate it into employee manuals;
advise how to minimize legal risk to employers and employees when using AI to generate content;
draft contracts commissioning the creation of AI tools for proprietary use, which, if due diligence is conducted, can minimize exposure to the business for copyright infringement;
draft agreements with marketing and advertising agencies including the use of AI tools by the agency;
counsel as to how to conduct due diligence in selecting an AI tool to minimize legal risk, including close attention to the terms of use for the tool;
counsel as to how to protect work generated with AI; and
for companies that are located in or operate in the EU, engage US counsel experienced in guiding EU agents as to the special considerations that apply with respect to use of and laws around AI there.

Lawsuits such as The New York Times v. Microsoft Corporation bring to the forefront the need to deploy AI thoughtfully. Before producing any materials with generative AI, it is more important than ever to consult an intellectual property attorney fluent in the legal implications of AI to ensure that your content is not inadvertently infringing.

¹The Court did not reach the question of whether the training of the AI platform was an unlawful reproduction under the copyright law.
