ChatGPT Detectors Are Biased and Easy to Fool, Research Shows

Stanford researchers find GPT detection software routinely misclassifies writing from non-native English speakers and can be duped by "literary language."

Jackson Ryan Former Science Editor

The text you're reading right now was typed into a Google Doc by a human being. But that may not be the case with text you encounter elsewhere. With the rise of generative AI programs the public can access for free, like ChatGPT for text and Midjourney for images, it's becoming harder to tell text created by humans from text generated by an AI.

Artificial intelligence -- automated computer systems, algorithms and machine learning -- has long been used in social media, scientific research, advertising, agriculture and industry, mostly unnoticed. But the rise of OpenAI's ChatGPT has ignited an arms race in places like the classroom, where students have turned to the program to cheat, authoring entire human-sounding essays. Teachers have deployed detection software hoping to catch plagiarists in the act. 

In a new study, published in the journal Patterns on Monday, researchers from Stanford University examined how reliable these generative AI detectors are at determining whether text was written by a human or an AI. The research team was surprised to find that some of the most popular GPT detectors, which are built to spot text generated by apps like ChatGPT, routinely misclassified writing by non-native English speakers as AI-generated, highlighting limitations and biases users need to be aware of.

The team took 91 TOEFL (Test of English as a Foreign Language) essays from a Chinese forum and 88 essays written by US eighth-graders. They ran these through seven off-the-shelf GPT detectors, including OpenAI's detector and GPTZero, and found that only 5.1% of the US student essays were classified as "AI generated." The human-written TOEFL essays, on the other hand, were misclassified 61% of the time. Nearly all of the TOEFL essays, 97.8%, were flagged as AI-generated by at least one detector.

All seven detectors flagged 18 of the 91 TOEFL essays as AI-generated. When the researchers drilled deeper into these 18 essays, they noted that a lower "text perplexity" was likely the reason. Perplexity measures how predictable a piece of text is to a language model: The easier each next word is to guess, the lower the perplexity. Non-native English writers have previously been shown to use a less varied vocabulary and simpler grammar, which makes their writing more predictable. To the GPT detectors, that predictability makes the text look like it was written by an AI.
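
To make the idea of perplexity concrete, here is a minimal Python sketch. It is an illustration only, not the method used by any detector in the study: real detectors estimate word probabilities with a large language model, while this toy assumes a simple word-frequency (unigram) model with add-one smoothing. The function names and the sample sentences are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def unigram_perplexity(text, reference_corpus):
    """Perplexity of `text` under an add-one-smoothed unigram model.

    The model is fit on `reference_corpus`. Lower values mean the words
    in `text` are more predictable to the model.
    """
    ref_tokens = tokenize(reference_corpus)
    counts = Counter(ref_tokens)
    total = len(ref_tokens)
    # One extra slot reserves probability mass for words never seen in the corpus.
    vocab_size = len(counts) + 1

    tokens = tokenize(text)
    if not tokens:
        raise ValueError("text must contain at least one token")

    log_prob_sum = 0.0
    for tok in tokens:
        prob = (counts[tok] + 1) / (total + vocab_size)
        log_prob_sum += math.log(prob)

    # Perplexity is the exponential of the average negative log probability.
    return math.exp(-log_prob_sum / len(tokens))


# Toy data, invented purely for illustration.
reference = "the cat sat on the mat the dog sat on the rug"
plain_text = "the cat sat on the mat"
literary_text = "an ocelot lounged upon that tapestry"

plain_score = unigram_perplexity(plain_text, reference)
literary_score = unigram_perplexity(literary_text, reference)
print(f"plain: {plain_score:.2f}, literary: {literary_score:.2f}")
```

In this toy setup, the sentence built from common words scores a lower perplexity than the one built from unusual words. A detector that treats low perplexity as a sign of machine authorship would therefore be more suspicious of the plainer sentence, which mirrors the bias the researchers describe.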

Basically, if you're using verbose and literary text, you're less likely to be classified as an AI. But this shows a worrying bias and raises concerns that non-native English speakers could be adversely affected in, for instance, job hiring or school exams, where their text could be flagged as AI-generated.

ChatGPT and "literary language"

The researchers ran a second experiment, essentially flipping their first on its head. This time, they started with AI-generated text to see whether the detection software correctly identified it as such.

The team used ChatGPT to generate responses to the 2022-2023 US college admission essay prompts. They ran the ChatGPT-generated essays through their seven detectors and found that, on average, the detectors spotted AI-generated essays 70% of the time. But they went back to ChatGPT with another prompt to augment the essays: "Elevate the provided text by employing literary language."

This prompt generated essays that bamboozled the GPT detectors -- they were able to correctly classify text as AI-generated only 3.3% of the time. Similar results were seen when the team had ChatGPT write scientific abstracts.

"We didn't expect these commercial detectors to do so poorly on text from non-native speakers or to be so easy to fool by GPT," said James Zou, a biomedical data scientist at Stanford University and co-author of the new study.

Because the detectors are so easy to fool, non-native English speakers may begin to use ChatGPT more often, prompting it to make their work sound like it was written by a native English speaker.

Ultimately, the two experiments raise a pivotal question, according to the researchers: If it's so easy to fool the detectors and human text is frequently misclassified, then what good are the detectors at all? 

My own GPT detection experiment

I ran my own experiment after reading the paper, using the same freely available GPT detection software used in the Stanford study.

I wrote a completely nonsensical passage: "The elephant parkour cat flew on his pizza bicycle to a planet that only existed in the brain of a purple taxi driver. 'Now that's a sour meatball!' he said. The sun, delightful though it tastes, is battery-powered and contains a startling toxin: Wolf teeth."

A major GPT detector suggested there was "a moderate likelihood of being written by AI." I then ran the passage through five of the freely available online detectors used by the Stanford team. Two determined it was AI-written, two said it was human-written and one said I hadn't used enough words to reach its threshold.

I then used ChatGPT to write a summary of nuclear scientist J. Robert Oppenheimer's life with the prompt, "Please write a character summary of Oppenheimer's life." I put the summary through detection software, but it wouldn't be fooled, determining it was written by AI. Good.

Then I went back to ChatGPT and used the same prompt the researchers used in the paper: "Elevate the provided text by employing literary language." This time, the summary of Oppenheimer's life fooled the detector, which said it was likely written entirely by a human. It also fooled three of the other five detectors. 

How to get to a better place

Whether it's misclassifying human text as AI generated or simply being fooled, the detectors clearly have a problem. Zou mentions that a promising mechanism to strengthen the detectors could be to compare multiple writings on the same topic, including both human and AI responses in the set, and then see if they can be clustered. This might enable a more robust and equitable approach.

And the detectors may be helpful in ways we're yet to see. The researchers mention that if a GPT detector is built to highlight overused phrases and structures, it might actually lead to more creativity and originality in writing. 

However, to date, the generation and detection arms race has been a little bit Wild Westworld, with improvements in AI followed by improvements in the detectors, with little oversight in development. The team advocates for further research and emphasizes that all of the parties affected by generative AI models like ChatGPT should be involved in the conversations about their acceptable use. 

Until such a time, the team "strongly caution against the use of GPT detectors in evaluative or educational settings, particularly when assessing the work of non-native English speakers."
