Times-Call (Longmont)

How can a teacher identify AI-written prose?

- Dave Taylor

Q >> Hi Dave, I’m a high school teacher and am curious about all of the AI writing tools now available. Is there a way to identify when text is written by a program rather than a person? How accurate is it today?

A >> The beginning of 2023 has been all about OpenAI and its ChatGPT tool. And no wonder; you can ask it to produce any type of prose and within a few seconds it will create something that's surprisingly decent.

Unfortunately, there are a small handful of students who will see this as an opportunity to "let the AI" do the work so they don't have to produce discussion posts, essays, or reports. It's copy and paste versus the critical thinking required to produce something thoughtful and on-topic.

To be fair, this isn't any different from plagiarism. Copying content from a book, research paper or magazine article without credit is egregious, but the student is still pulling it all together. Unless, that is, they utilize a website that offers "only A papers" on thousands of topics, from Shakespeare to organic chemistry. This unoriginal material is identified by testing multiword phrases against enormous datasets of known content. Plagiarism tests from companies like Turnitin tend to identify it with considerable accuracy.

AI tools like ChatGPT produce unique content every single time they're invoked, so how can they be detected?

Detecting AI writing with perplexity

The current analytic measure is "perplexity," a metric that assesses how predictable a sequence of words is. Word sequences that are highly predictable earn a lower perplexity score and therefore a greater likelihood that the text was produced programmatically. Why? Because AI language models tend to pick the most statistically likely next word, while we humans write with more variety and idiosyncrasy.

For example, the phrase “cup and saucer” is low perplexity, but “cup and ocelot” is high perplexity as it never appears in writing.
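For the technically curious, the idea can be sketched in a few lines of code. This is only a toy illustration, not how any real detector works: the tiny reference corpus, the `surprise` function, and the add-one smoothing are all assumptions made up for the example. It scores a phrase by how unusual its two-word sequences are relative to a reference text:

```python
from collections import Counter
import math

# Toy reference corpus standing in for the enormous datasets
# real detectors are trained on (hypothetical example text).
corpus = (
    "she put the cup and saucer on the table "
    "he picked up the cup and saucer again"
).split()

# Count how often each two-word sequence (bigram) appears,
# and how often each individual word appears.
bigrams = Counter(zip(corpus, corpus[1:]))
unigrams = Counter(corpus)

def surprise(phrase):
    """Average negative log-probability of the phrase's bigrams.
    Higher value = more unusual word sequences."""
    words = phrase.split()
    total = 0.0
    for a, b in zip(words, words[1:]):
        # Add-one smoothing so unseen bigrams get a small,
        # nonzero probability instead of breaking the math.
        prob = (bigrams[(a, b)] + 1) / (unigrams[a] + len(unigrams))
        total += -math.log(prob)
    return total / max(len(words) - 1, 1)

# "cup and saucer" appears in the corpus; "cup and ocelot" never does,
# so the second phrase earns the higher (more surprising) score.
print(surprise("cup and saucer") < surprise("cup and ocelot"))
```

Real detectors replace the toy bigram counts with a large language model's own probability estimates, but the principle is the same: the rarer the word sequences, the higher the score.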

There are a number of tools you can utilize for these tests, from GPTZero to GPT Radar. OpenAI itself has also just released a tool to assess text that its own ChatGPT tool has produced.

OpenAI warns: "In our evaluations on a challenge set of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as 'likely AI-written,' while incorrectly labeling human-written text as AI-written 9% of the time (false positives)." It's a tough problem, and it's best not to rely on these tools. Instead, use them to open up a discussion with students whose content is flagged as suspicious. Don't forget that a student's individual writing style can also skew the score, so human work will sometimes be flagged.
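Those two percentages matter more than they might seem. A quick back-of-the-envelope calculation shows why a flag alone proves little (the class of 100 essays with 10 AI-written ones is a purely hypothetical assumption; the rates are OpenAI's published figures):

```python
# Hypothetical class: 100 essays, 10 of them actually AI-written.
ai_essays, human_essays = 10, 90

true_positive_rate = 0.26   # OpenAI: 26% of AI-written text correctly flagged
false_positive_rate = 0.09  # OpenAI: 9% of human-written text wrongly flagged

flagged_ai = ai_essays * true_positive_rate          # 2.6 correct flags
flagged_human = human_essays * false_positive_rate   # 8.1 false flags

# Of all flagged essays, what fraction were actually AI-written?
precision = flagged_ai / (flagged_ai + flagged_human)
print(round(precision, 2))  # 0.24: most flagged essays are human work
```

Under these assumptions, roughly three-quarters of the flagged essays would be honest human writing, which is exactly why a flag should start a conversation rather than settle one.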

It’s critical to understand the limitations of these tools and realize that even as they seek to be more accurate, the AI language models will become more sophisticated, creating a technological cat-and-mouse game.

The real conclusion, however, is that teachers are going to have to change their approach to teaching so that in-person, non-technologically assisted recitation and writing assignments become a part of student evaluation and assessment at any grade level.
