Language gives fake news away

Have you ever read something online and shared it among your networks, only to find out it was false? As a software engineer and computational linguist who spends most of her work and even leisure hours in front of a computer screen, I am concerned about what I read online. In the age of social media, many of us consume unreliable news sources. We’re exposed to a wild flow of information in our social networks – especially if we spend a lot of time scanning our friends’ random posts on Twitter and Facebook.

My colleagues and I at the Discourse Processing Lab at Simon Fraser University have conducted research on the linguistic characteristics of fake news.

A study in the United Kingdom found that about two-thirds of the adults surveyed regularly read news on Facebook, and that half of those had the experience of initially believing a fake news story. Another study, conducted by researchers at the Massachusetts Institute of Technology, focused on the cognitive aspects of exposure to fake news and found that, on average, news readers believe a false news headline at least 20 per cent of the time.

False stories are now spreading 10 times faster than real news, and the problem of fake news seriously threatens our society.

For example, during the 2016 election in the United States, an astounding number of U.S. citizens believed and shared a patently false conspiracy claiming that Hillary Clinton was connected to a human trafficking ring run out of a pizza restaurant. The owner of the restaurant received death threats, and one believer showed up at the restaurant with a gun. This – and a number of other fake news stories distributed during the election season – had an undeniable impact on people’s votes.

It’s often difficult to find the origin of a story after partisan groups, social media bots and friends of friends have shared it thousands of times. Fact-checking websites such as Snopes and BuzzFeed can only address a small portion of the most popular rumours. The technology behind the internet and social media has enabled this spread of misinformation; maybe it’s time to ask what this technology has to offer in addressing the problem.

Recent advances in machine learning have made it possible for computers to instantaneously complete tasks that would have taken humans much longer. For example, there are computer programs that help police identify criminals’ faces in a matter of seconds. This kind of artificial intelligence trains algorithms to classify, detect and make decisions.

When machine learning is applied to natural language processing, it is possible to build text classification systems that distinguish one type of text from another.
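
To make that concrete, here is a minimal sketch of such a classifier in Python, assuming the scikit-learn library. The tiny “dataset” and the example headline are invented placeholders, not the data or the system our lab uses.

```python
# A minimal text-classification sketch, assuming scikit-learn.
# The toy examples below are invented for illustration only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled examples: 1 = fake, 0 = genuine.
texts = [
    "SHOCKING secret cure they don't want you to know",
    "Council approves budget for new transit line",
]
labels = [1, 0]

# TF-IDF turns each text into word-frequency features; logistic
# regression learns which features signal each class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["Miracle pill melts fat overnight, doctors shocked"]))
```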

During the past few years, natural language processing scientists have become more active in building algorithms to detect misinformation; this helps us to understand the characteristics of fake news and develop technology to help readers.

One approach finds relevant sources of information, assigns each source a credibility score and then integrates them to confirm or debunk a given claim. This approach is heavily dependent on tracking down the original source of news and scoring its credibility based on a variety of factors.
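
As a rough illustration of the idea (the scoring scheme below is a made-up stand-in, not any particular fact-checking system), a credibility-weighted vote over sources might look like this:

```python
# A toy sketch of credibility-weighted aggregation. Each source has a
# credibility score in [0, 1] and a stance on the claim: +1 supports,
# -1 refutes. All names and numbers are hypothetical.
def verdict(sources):
    """Sum each source's stance, weighted by its credibility."""
    score = sum(s["credibility"] * s["stance"] for s in sources)
    return "confirmed" if score > 0 else "debunked"

sources = [
    {"name": "wire_service",   "credibility": 0.9, "stance": -1},
    {"name": "anonymous_blog", "credibility": 0.2, "stance": +1},
]
print(verdict(sources))  # debunked: the credible source outweighs the blog
```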

A second approach examines the writing style of a news article rather than its origin. The linguistic characteristics of a written piece can tell us a lot about the authors and their motives. For example, specific words and phrases tend to occur more frequently in a deceptive text compared to one written honestly.

Our research identifies linguistic characteristics to detect fake news using machine learning and natural language processing technology. Our analysis of a large collection of fact-checked news articles on a variety of topics shows that, on average, fake news articles use more expressions that are common in hate speech, as well as words related to sex, death and anxiety. Genuine news, on the other hand, contains a larger proportion of words related to work (business) and money (economy).
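
A rough sketch of how such word-category rates can be computed follows; the two tiny category lists are placeholders for illustration, not our lab’s actual lexicons.

```python
# Lexicon-based style features: the share of an article's words that
# fall into each category. The word lists are invented placeholders.
CATEGORIES = {
    "anxiety": {"fear", "panic", "threat", "worried"},
    "money":   {"economy", "market", "budget", "profit"},
}

def category_rates(text):
    """Return the fraction of words in each category for one text."""
    words = text.lower().split()
    total = max(len(words), 1)
    return {cat: sum(w in vocab for w in words) / total
            for cat, vocab in CATEGORIES.items()}

print(category_rates("Panic grows as the threat spreads and fear takes hold"))
```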

This suggests that a stylistic approach combined with machine learning might be useful in detecting suspicious news.

Our fake news detector is built on linguistic characteristics extracted from a large body of news articles. It takes a piece of text and shows how similar it is to the fake news and real news items that it has seen before.
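
One concrete way to read that “similarity”, offered purely as a sketch with invented data rather than our detector’s actual implementation, is to compare a new text’s feature vector to the average (centroid) vector of the fake and real examples seen so far:

```python
# Compare a new text's TF-IDF vector to the centroid of each class.
# Example sentences are invented; assumes scikit-learn and NumPy.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

fake = ["shocking hoax you won't believe", "secret plot exposed by insiders"]
real = ["council passes annual budget", "markets steady as economy grows"]

vec = TfidfVectorizer().fit(fake + real)
fake_centroid = np.asarray(vec.transform(fake).mean(axis=0))
real_centroid = np.asarray(vec.transform(real).mean(axis=0))

new = vec.transform(["insiders expose a shocking secret"])
print("similarity to fake:", cosine_similarity(new, fake_centroid)[0, 0])
print("similarity to real:", cosine_similarity(new, real_centroid)[0, 0])
```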

The main challenge, however, is to build a system that can handle the vast variety of news topics and the rapid turnover of online headlines. Computer algorithms learn from samples, and if those samples are not sufficiently representative of online news, the model’s predictions will not be reliable.

One option is to have human experts collect and label a large quantity of fake and real news articles. This data enables a machine-learning algorithm to find common features that keep occurring in each collection, regardless of other variation. Ultimately, the algorithm will be able to distinguish with confidence between previously unseen real and fake news articles.
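
A sketch of that training-and-evaluation setup, with a held-out split to check that the model generalizes to articles it has never seen (the “corpus” below is a hypothetical placeholder, assuming scikit-learn):

```python
# Train on expert-labeled articles, then score the model on a held-out
# test set it never saw. The corpus is a hypothetical stand-in.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

articles = ["shocking hoax spreads fear online",
            "council approves the annual budget"] * 50
labels = [1, 0] * 50  # 1 = fake, 0 = real, assigned by human experts

# Hold out 20 per cent of articles to measure accuracy on unseen ones.
X_train, X_test, y_train, y_test = train_test_split(
    articles, labels, test_size=0.2, random_state=0)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X_train, y_train)
print("accuracy on unseen articles:", model.score(X_test, y_test))
```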

— Fatemeh Torabi Asr is a postdoctoral research fellow at Simon Fraser University. This article first appeared in The Conversation.
