OpenAI Sued by Authors Alleging ChatGPT Trained on Their Writing

It's the latest legal challenge to training AI and machine learning on content.

David Lumb Mobile Reporter
Image: OpenAI logo on a phone. Photo illustration by Nikolas Kokovlis/NurPhoto via Getty Images.

Two authors have sued ChatGPT creator OpenAI for allegedly using their works of fiction to train the machine learning models behind the chatbot, as Reuters reported.

The copyright lawsuit was filed on behalf of science fiction and horror author Paul Tremblay and novelist Mona Awad in San Francisco federal court on Wednesday. The suit argues that because ChatGPT can give summaries of their works, those works must have been fed into the machine learning models used by ChatGPT.

The suit, which seeks class action status, accuses OpenAI of training ChatGPT on works "without consent, without credit and without compensation" to the authors, according to a copy of the filing uploaded by Reuters.

The filing alleges that the authors' works likely came from a pair of online book datasets referenced in the 2020 paper in which OpenAI introduced GPT-3, a predecessor of the large language models that power ChatGPT. The plaintiffs claim that these datasets likely source their material from "shadow library" websites like Library Genesis and Sci-Hub, which use torrent downloads to illegally distribute copyrighted works, according to Bloomberg Law.

"These flagrantly illegal shadow libraries have long been of interest to the AI-training community," the filing alleges.

OpenAI didn't immediately respond to a request for comment.

Other AI lawsuits and struggles

Soon after generative AI tools emerged last year, lawsuits began challenging what the tools were trained on and how they could be used.

Photo service Getty Images banned AI-generated images back in September, and then in February, it sued Stability AI, the maker of AI art generator Stable Diffusion, for allegedly copying over 12 million images from its database without permission or compensation.

Separately, three artists sued Stability AI, art generator Midjourney and art hosting site DeviantArt in January for allegedly using their work to train AI models without consent or compensation, claiming that "millions of artists" have been similarly victimized, according to The Verge.

Amid these disputes, software maker Adobe released Firefly in March, a generative AI toolset trained on the company's own library of stock images, so it can create images without relying on artwork scraped without permission. Adobe is gearing up to integrate Firefly into other products in its software lineup, like Photoshop.

Creators have hit other speed bumps while integrating AI into the modern publishing process. The US Copyright Office denied copyright protection to the AI-generated art in a graphic novel, though it did grant protection to the human-created writing. And short story publications have been swamped with AI-generated submissions, to the point where the celebrated outlet Clarkesworld banned anything even partially created with AI.