Los Angeles Times

App seeks to thwart AI plagiarism in schools, online media

GPTZero detects machine writing with an 85% success rate.

By Diana Li

Journalists, screenwriters and college professors are among widening groups of people who are concerned about eventually losing their livelihoods to artificial intelligence programs like ChatGPT, which can produce copy faster and possibly better than humans. But one entrepreneur is pursuing technology to make it easier to distinguish between text written by people and that composed by a machine.

Edward Tian, 22, a Princeton University student studying computer science and journalism, developed an app called GPTZero to deter the misuse of the viral chatbot ChatGPT in classrooms. The app has racked up 1.2 million registered users since January.

He’s now launching a new program called Origin aimed at “saving journalism” by distinguishing AI-generated disinformation from fact in online media.

Tian has secured $3.5 million in funding co-led by Uncork Capital and Neo Capital, with tech investors including Emad Mostaque, chief executive of Stability AI Ltd, and Jack Altman.

GPTZero analyzes the randomness of text, known as perplexity, and the uniformity of this randomness within the text — called burstiness — to identify when AI is being used. The tool has an accuracy rate of 99% for human text and 85% on AI text, according to the company.
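The two measures can be illustrated with a toy example. The sketch below is not GPTZero's implementation, which the company does not describe here. It assumes a simple word-frequency model in place of a large language model, and it treats burstiness as the spread of per-sentence perplexity. The function names `train_unigram`, `perplexity` and `burstiness` are hypothetical.

```python
import math
from collections import Counter
from statistics import pstdev


def train_unigram(corpus_tokens):
    """Build a toy word-probability function with add-one smoothing.

    Assumption: a real detector would score tokens with a large
    language model; a unigram model stands in for it here.
    """
    counts = Counter(corpus_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot for unseen words

    def prob(token):
        return (counts.get(token, 0) + 1) / (total + vocab_size)

    return prob


def perplexity(tokens, prob):
    """Exponential of the average negative log-probability per token."""
    if not tokens:
        raise ValueError("perplexity needs at least one token")
    avg_neg_log = -sum(math.log(prob(t)) for t in tokens) / len(tokens)
    return math.exp(avg_neg_log)


def burstiness(sentences, prob):
    """Spread (population standard deviation) of per-sentence perplexity."""
    scores = [perplexity(s.lower().split(), prob) for s in sentences]
    return pstdev(scores)


# Usage: train on a tiny reference text, then score two samples.
prob = train_unigram("the cat sat on the mat the dog sat on the rug".split())
uniform_sample = ["the cat sat", "the dog sat"]
varied_sample = ["the cat sat", "quantum marmalade yodels"]
print(burstiness(uniform_sample, prob), burstiness(varied_sample, prob))
```

In this toy setup, low perplexity means the words are predictable to the model, and low burstiness means that predictability barely changes from sentence to sentence. Both are the kind of pattern the article says is associated with machine writing.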

The 10-person team now wants to empower journalism and is talking with large media organizations, such as the BBC, and industry executives, including former New York Times Chief Executive Mark Thompson, to discuss partnerships for AI detection and analysis. The company also sees uses for its technology in trust and safety, government, copyright, finance, law and other fields.

“We believe we can get the smartest people working on AI detection in a room together,” Tian said. “The field of detection is so new, and we believe it deserves more attention and support.”

OpenAI, the company behind ChatGPT, has launched an AI text classifier to detect machine-generated content, but it’s far from foolproof. The tool correctly identifies only 26% of AI-written text as “likely AI-written,” while incorrectly labeling human-written text as AI-written 9% of the time. The classifier also works “significantly worse” in languages other than English and is “unreliable” on code and shorter texts. For inputs that are very different from text in the tool’s training set, the classifier could also be wrong, according to OpenAI.

“Our classifier has a number of important limitations,” the company acknowledges on the website. “It should not be used as a primary decision-making tool, but instead as a complement to other methods of determining the source of a piece of text.”
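A short calculation shows why those two rates limit what a flag can prove. The sketch below applies Bayes' rule to the 26% and 9% figures reported for OpenAI's classifier. The share of submissions that are actually AI-written is not given in the article, so the values passed as `ai_share` are assumptions for illustration only, and the function name `flag_precision` is hypothetical.

```python
def flag_precision(true_positive_rate, false_positive_rate, ai_share):
    """Probability that a flagged text is really AI-written (Bayes' rule).

    ai_share is the assumed fraction of all texts that are AI-written;
    the article does not report this number.
    """
    for value in (true_positive_rate, false_positive_rate, ai_share):
        if not 0.0 <= value <= 1.0:
            raise ValueError("rates and shares must be between 0 and 1")
    flagged_ai = true_positive_rate * ai_share
    flagged_human = false_positive_rate * (1.0 - ai_share)
    if flagged_ai + flagged_human == 0.0:
        raise ValueError("no text would be flagged under these inputs")
    return flagged_ai / (flagged_ai + flagged_human)


# Reported rates: 26% of AI text caught, 9% of human text mislabeled.
for assumed_share in (0.10, 0.50):
    precision = flag_precision(0.26, 0.09, assumed_share)
    print(f"assumed AI share {assumed_share:.0%}: {precision:.1%} of flags correct")
```

Under the assumption that one in 10 texts is AI-written, only about a quarter of flagged texts would actually be machine-written. The result depends heavily on the assumed share, which is one reason a flag alone is weak evidence.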

The unreliability of detection tools poses a dilemma for educators. Even if a teacher finds a suspicious article from a student that’s flagged with a 70% likelihood of being AI-generated, as long as the accuracy of those detection tools isn’t 100%, it’s very hard for the teacher to take decisive action.

“I don’t think we know what to do with a flag that says there might be an issue,” said Jack Cushman, director of the Harvard Library Innovative Lab, which explores topics such as the effects of the internet. “All you can do at that point is talk with a student and say you might have committed academic dishonesty according to this tool.”

Meanwhile, the definition of plagiarism is also evolving with the emergence of AI. “It is going to challenge the whole notion of academic honesty because sometimes having a tool that recommends a sentence or two or help with citations is going to be legitimate in the same way as using a calculator to do math work,” he said. “The best answer is you shouldn’t let it write the whole thing.”

Nick Loui, co-founder and CEO of PeakMetrics, a startup that helps governments and large companies combat disinformation, said his clients are less concerned about the threat of AI-generated text because its potential for harm is lower than that of deepfake videos, for example, where there have been more malicious instances of manipulated content.

The technical limitations so far of any detection technology and the lack of a clear path to monetization have made it difficult to attract investment. The current detection tools are transitory products, said Sheila Gulati, managing director at Tola Capital, a venture capital firm that focuses on AI startups, as blocking new and emergent technology is generally not a great way for people to leverage it. “I think the eventual state of this will just be much more sophisticated.”

Some industry observers say open sourcing, which makes software’s source code publicly available and allows users to view, modify and distribute it freely, is good for large language model products because it reduces costs, increases transparency and promotes innovation. However, open-source software is also easier to hack and can make detection tools more prone to exploits. “It’s a bit like showing a burglar the blueprint for how your home surveillance network is set up,” said Alex Cui, chief technology officer and a co-founder of GPTZero.
