National Post (National Edition)

Literary kleptomaniacs beware

A software program called Turnitin is making life exceedingly difficult for plagiarists at university

Calum Marsh

In 1980, a friend gave Martin Amis a new novel by a young American writer: Wild Oats, by Jacob Epstein, then just 24. Right away, Amis noticed certain similarities, “several phrases and similes,” lifted from his own first novel, The Rachel Papers, published a decade before.

Amis’s hero, for example, “could feel, gradually playing on my features, a look of queasy hope.” Epstein has it that his hero “could feel, playing across his face, a look of queasy hope.” Amis writes of legs “at first spastically shooting out in all directions, then coordinating into a groovy shuffle.” Epstein writes of legs “spastically shooting out in all directions at first, then coordinating into a groovy shuffle.” Where Amis invokes “Dear-Marje wisdom with no results,” Epstein conjures “Ann Landers wisdom, but with no result.”

In all, Amis found 50 instances of this kind of theft. “The boundary between influence and plagiarism will always be vague,” Amis wrote of the case in an essay for The Observer. “Reading Wild Oats, it soon became clear to me that the boundary, however hazy, had been decisively breached.”

Epstein, faced with this accusation, was contrite. “It is the most awful mistake, which happened because I made notes from various books as I went along and then lost the notebook telling where they came from,” he explained to a reporter at the time. The offending passages had been excised from subsequent editions. The first edition “should never have been published.”

If Epstein were a student, and Wild Oats not a novel but an essay, he would have been found out the moment he submitted the manuscript. What he’d appropriated from The Rachel Papers, even the material he’d nominally reworked or reworded, would be flagged, immediately, by computer software designed to identify plagiarism in academic work.

A professor responsible for grading three hundred term papers no longer needs to sniff out suspect sentences or paragraphs that seem vaguely out of place. Most colleges and universities, and many high schools, use programs such as Turnitin, which detect plagiarized content the way magnetic wands detect metal. Students submit assignments through an online portal, the program scans the text, and when the teacher signs on to look at the batch of work, they can see what percentage of each paper contains recycled material and where every flagged line has been taken from.

Turnitin, the first and most popular plagiarism-detection service, was founded in 1998 by four students at Berkeley, who intended it to be an online peer-review system. In the early 2000s, it launched as a web service designed to help schools curb the growing trend of copying and pasting research from the internet without citation, and it is this specialized purpose that has made it ubiquitous in academia since.

Turnitin uses a “proprietary search algorithm” that “crawls and indexes current and archived web pages, and is comparable to major search engines,” as its About page puts it. It aggregates content from scholarly databases that might not be archived by Google, including “periodicals, biographies, brochures, encyclopedias, magazines, journals, books and abstracts,” as well as medical resources, tens of millions of articles from the academic research publisher Gale, and textbooks both new and out of print from Pearson and McGraw-Hill. If someone legitimate published it, Turnitin most likely has it on its servers.

Most ingeniously, Turnitin archives every essay students submit. Like the Borg in Star Trek, the Turnitin database gets smarter and more adept over time, growing with every paper fired its way. This instant-archive feature is most useful in preventing collusion: two or more students handing in papers with any appreciable overlap would be flagged. More broadly, it contributes to the vast scale of Turnitin’s resources.

The database has been gathering new material for nearly 20 years now, and the company boasts on its website that its “unparalleled index” contains 929 million archived student papers, a Borgesian library of academic content that makes it extraordinarily difficult for would-be plagiarists to steal anything, anywhere. It’s hard to imagine the obscure content a student would have to unearth for their pilfering to elude the sensors. It would involve more laborious research and drudgery, certainly, than simply writing an original paper.

Plagiarism seems straightforward enough: a writer uses words that aren’t their own. But Turnitin clarifies how many kinds of theft fall under the plagiarism heading, and how sophisticated, and therefore difficult to catch, some of those kinds of theft can be. Turnitin refers to what it calls the Plagiarism Spectrum, an educational tool which “identifies 10 types of plagiarism based on findings from a worldwide survey of nearly 900 secondary and higher education instructors.”

The Plagiarism Spectrum includes basic forms, such as the Clone (“submitting another’s work, word-for-word, as one’s own”) and the CTRL-C (“contains significant portions of text from a single source without alterations”), as well as more elaborate cons, like the Remix (“paraphrases from multiple sources made to fit together”) and the 404 Error (“includes citation to non-existent or inaccurate information about sources”). Simple clones and CTRL-Cs are easy for humans to root out using the internet: you can plug phrases from an essay into Google and find their original source yourself. But with key words changed and sentence structures altered, it becomes trickier to nail the hybrid-plagiarism fakes. So the Turnitin software scours papers for patterns and structural similarities rather than merely picking out blocks of stolen words.

Read a few college term papers, or just read a few news articles on the web, and you will notice something that looks a lot like plagiarism but isn’t quite. It’s cliché.

Imagine a student in a film studies class assigned to write about Psycho. If they write, at the beginning of their essay, of “director Alfred Hitchcock’s seminal psychological horror movie from 1960,” they will, totally unintentionally, have happened on a sentence strikingly similar to those in thousands of other film studies essays about Psycho, as well as probably a few hundred movie review websites, its IMDb and Wikipedia pages, and any number of other sources that default to familiar, slightly hackneyed writing when talking about this film.

Is it plagiarism? Not in academic terms. But it’s difficult for a computer program to know the difference between writing that’s lazy and writing that’s stolen.

“We don’t exclude common phrases and cliché expressions from the algorithm,” a representative from Turnitin explains to me about the process. “We check student work against our database, and if there are instances where student writing is similar to, or matches against, one of our sources, we will flag this for an instructor to review. Ultimately, human judgement is required to make a determination about plagiarism, and it’s likely that, if a commonly used phrase is flagged, an instructor would make the distinction.”

This is typical of the company’s broader view of its role as a kind of policing service. Turnitin isn’t there to mechanically find fault and punish students for infractions. It aims to be a “conversation starter,” and it emphasizes the need, in the face of student error or lapses of judgement, for “a larger teaching moment around the importance of original writing, proper citation, and academic integrity.”

Turnitin’s own data points out that “the odds of writing the same 16 words in the same order by chance are one in a trillion.” The software is very good at catching instances where words appear in the same order and a coincidence is, statistically, virtually impossible. But the main function is more philosophical. Turnitin gets people thinking about what it means to plagiarize and, the hope is, gives them a better understanding of how to write.
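To make the same-words-in-the-same-order idea concrete, here is a minimal sketch of one common way to measure it: break each text into overlapping runs of n words and count how many runs the two texts share. This is an illustration only. Turnitin's algorithm is proprietary, and nothing here is claimed to reflect how it works. The function names `shingles` and `shared_fraction` are hypothetical, and the two sample sentences are the Amis and Epstein lines quoted earlier in this article.

```python
import re


def shingles(text, n):
    """Return the set of n-word runs (in order) found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(submission, source, n=16):
    """Fraction of the submission's n-word runs that also appear in the source.

    Assumption for illustration: n defaults to 16 only because that is the
    figure quoted in the article, not because any real product uses it.
    """
    sub = shingles(submission, n)
    if not sub:
        # Text shorter than n words has no runs to compare.
        return 0.0
    return len(sub & shingles(source, n)) / len(sub)


amis = "could feel, gradually playing on my features, a look of queasy hope"
epstein = "could feel, playing across his face, a look of queasy hope"

# Compare four-word runs, since both sentences are shorter than 16 words.
print(shared_fraction(epstein, amis, n=4))
```

With four-word runs, two of the eight runs in the Epstein sentence also occur in the Amis sentence. A sketch like this only catches exact runs; the reworded material described above would need something looser, and a human would still have to judge whether a match is theft or cliché.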

The internet makes it possible for Turnitin to crack down on most forms of plagiarism, most of all the kinds of plagiarism that involve copying and pasting. It’s ironic, because the internet and the computer’s copy-paste function created a plagiarism boom in the late 1990s and early 2000s, when computer literacy was low among educators and before Turnitin had taken hold.

An article in The New York Times from 2001 warned that, “in this era of cut and paste,” “a new generation of students is faced with an old temptation made easier than ever,” as several high-profile cases of academic plagiarism at the time “painted in sharpest relief how easy cheating had become.” A contemporary survey cited in the article found that “more than half” of high school students across the United States “admitted either downloading a paper from a website or copying a few sentences from a website without citation.”

As teachers became more computer savvy, and indeed as schools began making conscious efforts to fight plagiarism, this Wild West copy-and-paste abandon was brought under control. It would take a tremendously lucky student, and an exceptionally careless teacher, for an essay downloaded from the internet to be passed off as original work today. Enforcement, when it comes to plagiarism, is largely a matter of deterrence. In other words, if you know your school has the ability to spot stolen material with flawless accuracy, you are significantly less likely to try; and if you are stupid enough to try anyway, and you get caught and disciplined, you will almost definitely not try a second time. Once proven effective, just the threat of Turnitin does the work.

A study conducted last year on the program’s behalf found that, among students submitting essays using its software, “levels of unoriginal content” and “rates of similarity” had “dropped significantly by their second paper.” Noticing their tendencies to cite improperly or borrow too generously, students tended to “correct their practices” and be more conscious of the importance of proper citation and original work. “This study found that these effects are long-lasting, occur in both secondary and higher education institutions, and appear across the globe regardless of the country in which the students were studying.”

Of course, there will always be students who want to cheat. And students being savvy, there will always be ways to game the system, to thwart the software, to elude capture by the robots there to ferret thieves out. One of the last frontiers for academia is the ghostwritten essay, the essay for hire, what’s known as “contract cheating,” defined as “the practice of students engaging a third party individual or service to complete their written assessments.” Turnitin has developed a new program, called Authorship Investigate, designed to target ghostwriters and those who would hire them in lieu of writing their own work. It will “use a combination of machine learning algorithms and forensic linguistic best practices to detect major differences in students’ writing style between papers.”

What’s remarkable about the Wild Oats scandal, in retrospect, is how far along it managed to get before someone realized anything was wrong. Epstein’s editors never noticed he was stealing. The fact-checkers and copy editors at Little, Brown and Company, Epstein’s publisher, didn’t catch the crime. Once the novel was actually printed and bound, on bookshelves and in shop windows, it was widely read, discussed, celebrated, even effusively reviewed, by many people who’d either never read or didn’t remember a successful novel by Martin Amis.

Epstein’s extensive cribbing from Amis went unobserved until Amis himself read Epstein, which is alarming when you consider he might never have gotten around to it. If nothing else, this situation demonstrates how easy it was, circa 1980, for anyone, published novelists included, to plagiarize. Epstein almost got away with it. With Turnitin he would have been caught.

“The psychology of plagiarism is fascinatingly perverse,” Amis wrote of Epstein, when the case broke. “It risks, or invites, a deep shame, and there must be something of the death wish in it.” That death wish clearly remains, for writers of the Turnitin era no less than for writers of the 1980s, as evidenced by the revelations and exposés of plagiarism that have lately been in the news. Technology can help us detect theft: it can cross-reference infinite databases, trawl staggering libraries of pre-existing text, to nose out the culprits, often as soon as they’ve committed the crime. But it can’t extinguish the impulse to steal, the literary kleptomania that compels writers to take from another.

Turnitin curbs it. But it can never be stopped.
