The Boston Globe

Amid a ‘plagiarism arms race,’ here’s how the detection software works

- Hiawatha Bray

After the spectacular fall of Harvard University’s former president Claudine Gay amid allegations of plagiarism, critics of higher education and the media plan to publicize more such cases involving prominent scholars and journalists. It’s no idle threat, because there are internet-based services that can scour billions of documents to quickly identify possible examples of plagiarism.

They’re called plagiarism detectors, but more accurately they’re similarity detectors: computer programs that can compare one document with millions of others and spot similar words, phrases, sentences, or paragraphs. The similarities are sometimes coincidental, but when humans review the results, they may find evidence of carelessness, or deliberate theft of another writer’s work.

It’s unclear how many of the dozens of plagiarism charges against Gay may have been identified using such software. But copyright consultant Jonathan Bailey, who operates the online publication Plagiarism Today, said that “given the volume of the work to check and the types of overlaps found, it’s almost certain that at least some software was used.”

And it’s a safe bet that similar tools will be used by those who have vowed to keep up the pressure.

Conservative activist Christopher Rufo, who helped lead the attack on Gay, is raising funds to investigate possible acts of plagiarism by other high-profile academics. Meanwhile, billionaire Bill Ackman, a major Harvard donor who called for Gay’s ouster, appears to be furious about press reports that his wife, former Massachusetts Institute of Technology professor Neri Oxman, committed plagiarism.

In a message posted Sunday on the social network X, Ackman wrote, “We have new information that strongly suggests” that someone at MIT was responsible for leaking the allegations about his wife. Though Ackman admitted he cannot prove this, he said he will launch an investigation into possible plagiarism spanning the entire MIT faculty, the school’s president, and other top officials. Ackman said he’d do the same for journalists working at Business Insider, the outlet that first publicized the allegations about Oxman.

“We’ve now entered this sort of plagiarism arms race,” warned Ivan Oransky, co-founder of Retraction Watch, a website that tracks cases of academic malfeasance. “If we’re not careful, we could end up with mutually assured destruction.”

The plagiarism-hunting business dates back at least as far as the pre-internet year of 1990, when Glatt Plagiarism Services was launched in Chicago. Founder Barbara Glatt, a psychologist, came up with a written test she could use to determine whether an author suspected of plagiarism had really composed the work.

In 1997 came Turnitin, founded by a graduate student at the University of California, Berkeley. Turnitin was created to crack down on students who would purchase essays from online term paper mills and submit the papers as their own work. Turnitin is now one of the leaders in the field. It’s used by a host of major universities, not only to check up on undergraduates, but also to ferret out possible plagiarism in the work of graduate students and faculty members.

The company’s service checks documents against a vast and ever-growing database. Turnitin operates its own web crawler that indexes billions of web pages that are likely to be used by students and academic researchers. It retains millions of documents submitted by its customers. And it offers a service called iThenticate, which has access to huge amounts of academic material that’s unavailable on the public internet, such as scientific journals. iThenticate works with a nonprofit called Crossref, which partners with top publishers of academic journals to enable rapid checking for possible plagiarism. Turnitin claims that iThenticate can check documents against 97 percent of the world’s leading academic journals.

Meanwhile, the rise of artificial intelligence systems like ChatGPT has opened a new frontier in academic fraud. Glatt, Turnitin, and many of their competitors have added services that promise to identify articles generated by AI programs rather than people.

Another plagiarism checker, Copyleaks, uses AI technology to spot plagiarism. The company’s co-founder Alon Yamin worked in Israeli military intelligence, where he and a colleague, Yehonatan Bitton, specialized in analyzing text-based information. In 2015, Yamin and Bitton launched Connecticut-based Copyleaks to commercialize their plagiarism detection approach.

Their system goes beyond the traditional method, which compares a document word-by-word against a huge database of previously published articles and books. Yamin said that Copyleaks uses an AI model that comprehends the meaning of a document and even recognizes the writing style of its author. This makes it harder for a plagiarist to fool the software by changing a few words or phrases of a copied document.
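For readers curious about what a word-by-word comparison might look like, here is a minimal sketch in Python. It is an illustration only, not the method used by Turnitin, Copyleaks, or any other vendor: the function names `word_ngrams` and `similarity`, the three-word window, and the sample sentences are all assumptions made for this example. It measures what share of a document's three-word sequences also appear in a source text.

```python
import re


def word_ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of overlapping n-word sequences found in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def similarity(candidate: str, source: str, n: int = 3) -> float:
    """Fraction of the candidate's n-word sequences that also occur in source."""
    cand = word_ngrams(candidate, n)
    if not cand:
        return 0.0
    return len(cand & word_ngrams(source, n)) / len(cand)


# Hypothetical sample texts, invented for this illustration.
source = "the quick brown fox jumps over the lazy dog near the river bank"
copied = "the quick brown fox jumps over the lazy dog near the river bank"
reworded = "the fast brown fox leaps over the sleepy dog near the river shore"

print(similarity(copied, source))    # verbatim copy scores 1.0
print(similarity(reworded, source))  # a few swapped words lower the score sharply
```

In this toy example, swapping four words drops the score from 1.0 to roughly 0.18, which shows why a matcher that relies only on exact word sequences can be fooled by light rewording.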

“We are looking for a combination of identical matches, similar sentence structure, similar meaning and style,” Yamin said. “We’re even doing it across languages.” (Thus, a US student who plagiarized an article translated from French could still be spotted by Copyleaks.)

Yamin said that inquiries from potential customers have spiked amid the Gay scandal. But he noted that Copyleaks isn’t a push-button solution to detect plagiarism. It only identifies possible trouble spots in a document, leaving it up to the client to decide if it’s plagiarism, carelessness, or coincidence.

“We’re letting you know what parts of the text are identical,” he said. “It’s up to you to decide whether you consider it as plagiarism or not.”

Sarah Elaine Eaton, editor in chief of the International Journal for Educational Integrity, agreed that software alone will never be sufficient to prove plagiarism.

“It can help, I think, to facilitate or accelerate an investigation,” said Eaton, an associate professor at the University of Calgary. But she added, “A human analysis is required to make a finding of plagiarism.” One reason is that there’s no hard-and-fast definition of the term. A failure to use quotation marks around some borrowed text is considered unthinkable by some experts but forgivable by others.

Eaton said that Springer, the academic publisher that produces her journal, now runs all incoming submissions through iThenticate. “I think any journal editor worth his salt would do the same,” she said. And although her journal is all about academic ethics, she said that some article submissions have been flagged for possible plagiarism.

Many academic institutions offer anti-plagiarism software to students and faculty, so they can make sure they’ve followed the rules before handing in an assignment or submitting a paper for publication. For instance, Harvard’s T.H. Chan School of Public Health gives students and faculty access to Turnitin, while the Harvard Kennedy School provides access to Copyleaks.

Oransky believes that all academic papers should be checked for possible plagiarism. “Do that before you hire people,” said Oransky, “before maybe you make somebody a president.”

But Oransky fears that the recent obsession over plagiarism overlooks other trends that he considers far more troubling.

For instance, the science magazine Nature reported that last year 10,000 papers were retracted by the world’s scientific journals for being essentially worthless. About 8,000 of them were created by “paper mills,” companies that produce plausible-looking but useless research papers. These bad papers weren’t caught in the peer review process, said Oransky.

“There is far more bad behavior and corruption in academia than anyone wants to admit,” he said.

Former MIT professor Neri Oxman has been accused of plagiarism. Her husband, Bill Ackman, had pushed for Claudine Gay’s ouster.

BRENDAN SMIALOWSKI/AFP VIA GETTY IMAGES: Software used to detect plagiarism in academic settings is widely popular and widely accessible, though it functions more as a similarity detector than a plagiarism detector, and experts say that evidence produced by a checker should be analyzed by humans.
