Cod­ing a mur­derer

Dis­cover how al­go­rithms put se­rial killers at the cen­tre of the map – Tam­sin Ox­ford in­ves­ti­gates the Mur­der Ac­count­abil­ity Project

Linux User & Developer

Thomas Hargrove is a journalist and the mind behind MAP, the Murder Accountability Project

How the Mur­der Ac­count­abil­ity Project is help­ing to catch killers

One journalist, a story on prostitution, a passion for open source software, and an old yet powerful database led to the creation of the Murder Accountability Project (MAP, www.murderdata.org). The MAP involves finding serial killers, preventing murders and connecting the statistical dots. It’s also a project that has a fascinating back story.

Thomas Har­grove – the jour­nal­ist – had pur­chased a Uni­form Crime Re­port from the Univer­sity of Mis­souri while do­ing re­search on a story about pros­ti­tu­tion in 2004. The univer­sity threw in the Sup­ple­men­tal Homi­cide Re­port at no ex­tra cost, and this free, data-heavy doc­u­ment changed the course of Thomas’s life.

“The doc­u­ment con­tained row af­ter row of in­for­ma­tion about in­di­vid­ual mur­ders that cov­ered ev­ery­thing from the month, the year the mur­der hap­pened, and the ju­ris­dic­tion,” says Thomas. “The file also con­tained data around the age, sex, method of killing, race and the po­lice the­ory around the killing, plus the of­fender’s de­tails if the in­for­ma­tion was avail­able.

“The mo­ment I saw this file I was ask­ing my­self one very im­por­tant ques­tion: would it be pos­si­ble to use this data to teach a com­puter to de­tect se­rial mur­ders?,” he con­tin­ues. “Could I use open source tools to build a plat­form that en­ables peo­ple to ac­cess this data and un­der­stand it in ways that al­low for these mur­ders to be solved more ef­fec­tively?”

To­day, MAP is a vi­able open source, non-profit or­gan­i­sa­tion with an al­go­rithm that is ‘ca­pa­ble of de­tect­ing se­rial killers who tar­get mul­ti­ple vic­tims us­ing sim­i­lar meth­ods of killing within a spe­cific ge­o­graphic re­gion’. The plat­form takes a de­press­ingly end­less list of vi­o­lent deaths and trans­forms it into a vis­ual tool that high­lights pat­terns and trends which may have been pre­vi­ously un­de­tected by the FBI or the po­lice. It also neatly helps to solve one of the big­gest chal­lenges fac­ing po­lice de­tec­tives – ‘link­age blind­ness’ – where they don’t recog­nise the link be­tween one case and an­other, or that there may be a com­mon of­fender in­volved. It’s un­der­stand­able: dif­fer­ent peo­ple work­ing dif­fer­ent cases re­ported in dif­fer­ent ways would make it hard to pick up sim­i­lar­i­ties. It’s some­thing MAP can sup­port de­tec­tives in over­com­ing, be­cause it con­nects the data dots to cre­ate vis­ual pat­terns and high­light sim­i­lar­i­ties.

The project has un­der­gone sev­eral it­er­a­tions since Thomas first held the data in his hands back in 2004. It wasn’t easy find­ing the al­go­rithm and teach­ing the code to trans­late the data in mean­ing­ful ways. To­day, MAP has evolved thanks to in­ven­tive code, de­ter­mined de­vel­op­ment and a pas­sion for solv­ing the un­solved…

Once Thomas discovered the data in 2004, he spent the next six years trying to persuade his editor to allow him to test his theory. He knew that the information the university had given him could help to resolve murders; he just needed the time and the budget. After years of pushing the right buttons, he finally got the chance to use the data while working on a project known as Murder Mysteries in 2010.

The project won awards and raised just enough awareness to allow Thomas to take his work to the next level. The University of Missouri offered him the support of a talented Master’s candidate, Liz Lucas, and so started the hunt for the perfect algorithm.

“We found hun­dreds of things that didn’t work,” says Har­grove. “We tested to see if an el­e­vated rate of mur­ders would in­di­cate the pres­ence of a se­rial killer, and it failed. We tested to see if there was an el­e­vated rate of fe­male mur­ders, and this didn’t de­liver re­sults. We tested to see if there was an el­e­vated rate of mur­der for a par­tic­u­lar type, such as stran­gu­la­tion, and this was also a no.”

The team tried a lot of variations, but ultimately it was a serial killer who taught them what would work: the Green River Killer, Gary Ridgway. One of the most prolific killers in the United States, he left a series of bodies behind him. He strangled 49 women before he was caught and confessed to the crimes.

“We knew there had been a se­rial killer in Seat­tle and we wanted to know if we could craft a com­puter pro­gram that would alert us to this pat­tern in Seat­tle at that time,” says Har­grove. “We tried all these dif­fer­ent com­bi­na­tions to see if Seat­tle would ap­pear and noth­ing worked. Then we tried the rates of un­solved mur­ders, which also didn’t re­ally im­pact on the re­sults. Fi­nally, what worked was clus­ter analysis.”

It was a blend of the victim’s gender, location and method of murder that finally worked. Originally, the data also used age, but this metric has subsequently been discarded because it had a minimal impact on results. The murders were then allocated a murder group number based on these four categories, age included, and the system used this structure to create around 100,000 groups.

“Then we told the computer to calculate how often, in each of these groups, the murders were solved, and to alert us to any large groups of murder clusters that had low rates of resolution,” Hargrove says. “It worked like a dream. We had finally hit upon the algorithm. We then applied it to the area where the Green River Killer had been active and there it was – a huge bubble over Seattle. Another serial killer came right after Ridgway and the algorithm detected him easily.”
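The grouping-and-counting idea Hargrove describes can be illustrated with a short Python sketch. This is not MAP's code: the field names (`location`, `victim_sex`, `method`, `solved`), the thresholds `min_size` and `max_clearance`, and the sample records are all assumptions made up for illustration.

```python
from collections import defaultdict


def flag_clusters(records, min_size=20, max_clearance=0.33):
    """Group murders by (location, victim sex, method) and flag
    large groups whose share of solved cases is low.

    min_size and max_clearance are illustrative thresholds, not MAP's.
    """
    groups = defaultdict(lambda: [0, 0])  # key -> [total, solved]
    for rec in records:
        key = (rec["location"], rec["victim_sex"], rec["method"])
        groups[key][0] += 1
        if rec["solved"]:
            groups[key][1] += 1

    flagged = []
    for key, (total, solved) in groups.items():
        rate = solved / total
        if total >= min_size and rate <= max_clearance:
            flagged.append((key, total, rate))

    # Largest clusters first
    flagged.sort(key=lambda item: item[1], reverse=True)
    return flagged


# Invented sample data: one large, poorly solved group and one well-solved group.
records = (
    [{"location": "County A", "victim_sex": "F", "method": "strangulation",
      "solved": i < 5} for i in range(25)]
    + [{"location": "County B", "victim_sex": "M", "method": "firearm",
        "solved": i < 24} for i in range(30)]
)

clusters = flag_clusters(records)
for key, total, rate in clusters:
    print(key, total, f"{rate:.0%}")
```

In this invented sample only the County A group is reported, because it is both large and mostly unsolved; the well-solved County B group is ignored however big it is.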

The al­go­rithm was now ca­pa­ble of find­ing dozens of clus­ters all over the coun­try, a plethora of deadly bub­bles that in­di­cated an ex­cess of mur­der, which may or may not in­di­cate that a se­rial killer is at work. Some are known, some are not.

The tech be­hind the al­go­rithm

“We have used a variety of different solutions to create MAP,” explains Hargrove. “We have proprietary software called Tableau that enables us to visually display the data on the website, but as we’re committed to ensuring all the data and tools are open source and accessible, we also make the raw data available to anyone to use on any platform they wish.”

Vis­i­tors keen to turn their eye to un­lock­ing pat­terns and po­ten­tially sav­ing lives can use a free copy of Tableau, down­load the work­book created by MAP and ma­nip­u­late the data. It can’t be saved onto a lo­cal hard drive or server, but it does en­able any­one to work with the tech­nol­ogy and the data us­ing the same sys­tem as MAP. The team of the non-profit or­gan­i­sa­tion, now ex­tended to in­clude a vice chair­man, trea­surer and board of di­rec­tors, uses Tableau be­cause it pro­vides them with the tech they need to dis­play large sets of data on the web­site.

“We have ensured that the base data and everything that we have built onto the platform is not proprietary,” says Hargrove. “We are dedicated to making homicide data more readily available to the world, which will hopefully encourage the public to review murder occurrence data and easily access the information. Our target audience is, of course, homicide detectives and we wanted a system that would allow them to call up records on their own and share them with other departments.”

Open source

MAP is designed to be as open a platform as possible, so that it can be accessed across jurisdictions and US counties by people of varying technical ability. The team isn’t interested in secrets or mystery or obfuscation – they take the data that the FBI has openly provided and use it to enhance and expand their capabilities and the efficiency of the algorithm.

“The report provided by the FBI accounts for all the homicides in each jurisdiction, along with case-level detail that includes around 30 variables for each individual,” says Hargrove. “We have also combined this with the data gathered by regional police departments. Many of these don’t collaborate with the FBI, so we have added their information to the data that the FBI has given us to create an incredibly detailed database. We currently have case-level details on 752,000 murders from 1976 to the present. You won’t find that level of insight anywhere else.”

This data is also freely accessible to the public and can be downloaded as UCR (Uniform Crime Reporting) reports. These reports are assembled using the IBM program SPSS Statistics, which is capable of managing sophisticated operations and delving into the database to impressive depths. Because this is a proprietary system, MAP has opted to use PSPP (ware/ pspp) as the open source alternative. This GNU project enables access across multiple platforms, which underpins the MAP philosophy: everyone should be able to play a part in detecting serial murderers.

“We advise people to rather use PSPP because it can be downloaded at no extra cost – many people have done just that as they want to work with the datasets we have created,” says Hargrove. “Making the entire project open source is incredibly important to us as there’s still so much more work to be done. The murder resolution rate has been declining and we believe that by encouraging people to come to our site and download the workbooks, raw data and data dictionaries they can call us with questions and suggestions. This is the entire principle that drives the open source community and can help us save lives.”

The data collected by MAP from the FBI is decidedly user-unfriendly. Written in Cobol, it comes with some oddities and complexities that impact on its accessibility, but this was the language that was common when the FBI started assembling the data.

To resolve the challenges of shifting the data from complex to accessible, MAP has written sophisticated syntax commands that take these wads of data and assemble them into PSPP files. These can be further split out into CSV files which, Hargrove admits, may not be as useful as PSPP but are a lingua franca that every statistical package can open.
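For readers who have not met Cobol-era data, the kind of conversion involved can be sketched in Python. MAP does this with PSPP syntax commands, so what follows is a swapped-in illustration rather than the project's method. It assumes the records are fixed-width text, and the column layout in `LAYOUT` and the sample records are invented; the real layout would come from the data dictionary.

```python
import csv
import io

# Hypothetical layout: (field name, start, end), zero-based, end exclusive.
# The real positions would come from the data dictionary, not from here.
LAYOUT = [
    ("state", 0, 2),
    ("year", 2, 6),
    ("month", 6, 8),
    ("victim_sex", 8, 9),
    ("weapon", 9, 11),
    ("solved", 11, 12),
]


def parse_record(line):
    """Slice one fixed-width record into named fields."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


def fixed_width_to_csv(lines, out):
    """Write fixed-width records to `out` as CSV with a header row."""
    writer = csv.DictWriter(out, fieldnames=[name for name, _, _ in LAYOUT])
    writer.writeheader()
    for line in lines:
        if line.strip():  # skip blank lines
            writer.writerow(parse_record(line))


# Invented sample records that follow the hypothetical layout above.
sample = ["WA198207F80N", "WA198309F80N"]
buffer = io.StringIO()
fixed_width_to_csv(sample, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Keeping the layout in a single table means a change to the record format only has to be made in one place.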

“These syn­tax com­mands are also avail­able to any­one who wants them,” says Har­grove. “They have made a la­bo­ri­ous process far eas­ier. They also en­able us to ex­pand the searches more widely – from only get­ting in­sights from a spe­cific ju­ris­dic­tion to in­sights across bor­ders, be­cause crim­i­nals don’t re­spect geopo­lit­i­cal bound­aries. This en­tire process is made very sim­ple with the PSPP plat­form.”

The records are as­sem­bled and put into mul­ti­ple for­mats so they can be down­loaded on de­mand. This is the data that is then loaded into Tableau to show­case all the in­for­ma­tion in a vis­ual for­mat – age, gen­der, method, lo­ca­tion, type of mur­der and more – and al­lows for quick cross-tab­u­la­tion. For de­tec­tives who work with lim­ited time and plenty of in­for­ma­tion, Tableau helps them to test a the­ory or de­ter­mine if their type of crime has oc­curred in dif­fer­ent ju­ris­dic­tions.
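A cross-tabulation of this kind is simple to reproduce outside Tableau. The Python sketch below counts cases by method and jurisdiction; the field names and records are invented for illustration and are not taken from MAP's data.

```python
from collections import Counter


def crosstab(records, row_field, col_field):
    """Count records for every (row value, column value) pair."""
    return Counter((rec[row_field], rec[col_field]) for rec in records)


# Invented records; the field names are assumptions.
records = [
    {"jurisdiction": "County A", "method": "strangulation"},
    {"jurisdiction": "County A", "method": "strangulation"},
    {"jurisdiction": "County B", "method": "strangulation"},
    {"jurisdiction": "County B", "method": "firearm"},
]

table = crosstab(records, "method", "jurisdiction")

# Print a small text table: one row per method, one column per jurisdiction.
rows = sorted({r for r, _ in table})
cols = sorted({c for _, c in table})
print("method".ljust(15) + "".join(c.ljust(10) for c in cols))
for r in rows:
    print(r.ljust(15) + "".join(str(table[(r, c)]).ljust(10) for c in cols))
```

Reading along a row shows at a glance whether the same method turns up in more than one jurisdiction.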

“Over the past year we have added further refinement to the system, where we now visually show murder clusters identified by the algorithm we’ve developed,” adds Hargrove. “There’s a kind of magic to carefully counting things and uncovering answers. Our algorithm counts through the three-quarters of a million records looking for clusters that have an elevated probability of being serial murders and puts them on this visual map that reveals serial murder potential.”

So the sys­tem works – and now the team is over­whelmed with cases. Cases that need to be solved be­cause, as Har­grove con­cludes, “Bud­gets, lim­ited re­sources, in­creased death tolls – all these fac­tors are in­flu­enc­ing the abil­ity of the au­thor­i­ties to solve crimes. What makes this more of a con­cern is that if mur­ders go un­solved it in­spires even more mur­ders. When­ever a killer gets away with a crime, he be­comes a liv­ing tes­ta­ment to a sys­tem that isn’t work­ing. Mur­der begets mur­der, and clear­ance re­duces mur­der.”


SPSS Statistics ketplace/spss-statistics

Supplemental Homicide Report brs/addendum-for-submittingcargo-theft-data/shr

University of Missouri https://mis

Uniform Crime Reports

Above: Big graph, bigger problems: murder is on the rise in the US

Thomas underscoring the value of murder accountability at the Investigative Reporters and Editors (IRE) annual conference
