The problem with evidence-based policies

Financial Mirror (Cyprus) - FRONT PAGE

Many organisations, from government agencies to philanthropic institutions and aid organisations, now require that programmes and policies be “evidence-based.” It makes sense to demand that policies be based on evidence and that such evidence be as good as possible, within reasonable time and budgetary limits.

But the way this approach is being implemented may be doing a lot of harm, impairing our ability to learn and improve on what we do.

The current so-called “gold standard” of what constitutes good evidence is the randomised controlled trial, or RCT, an idea that started in medicine two centuries ago, moved to agriculture, and became the rage in economics during the past two decades. Its popularity rests on the fact that it addresses key problems in statistical inference.

For example, rich people wear fancy clothes. Would distributing fancy clothes to poor people make them rich? This is a case where correlation (between clothes and wealth) does not imply causation.

Harvard graduates get great jobs. Is Harvard good at teaching – or just at selecting smart people who would have done well in life anyway? This is the problem of selection bias.

RCTs address these problems by randomly assigning those participating in the trial to receive either a “treatment” or a “placebo” (thereby creating a “control” group). By observing how the two groups differ after the intervention, the effectiveness of the treatment can be assessed. RCTs have been conducted on drugs, micro-loans, training programmes, educational tools, and myriad other interventions.

Suppose you are considering the introduction of tablets as a way to improve classroom learning. An RCT would require that you choose some 300 schools to participate, 150 of which would be randomly assigned to the control group that receives no tablets. Prior to distributing the tablets, you would perform a so-called baseline survey to assess how much children are learning in school. Then you give the tablets to the 150 “treatment” schools and wait. After a period of time, you would carry out another survey to find out whether there is now a difference in learning between the schools that received tablets and those that did not.
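The design just described can be sketched as a small simulation. All numbers below – school counts, score scale, the size of the effect – are illustrative assumptions, not figures from any actual trial.

```python
import random
import statistics

def run_rct(n_schools=300, true_effect=5.0, seed=42):
    """Simulate the tablet RCT sketched above (illustrative numbers only).

    Each school has a baseline learning score; half are randomly assigned
    tablets ("treatment"), half are not ("control"). Returns the difference
    in mean score *gains* between the two groups after the intervention.
    """
    rng = random.Random(seed)
    schools = list(range(n_schools))
    rng.shuffle(schools)                          # random assignment
    treatment = set(schools[: n_schools // 2])

    gains_treated, gains_control = [], []
    for s in range(n_schools):
        baseline = rng.gauss(50, 10)              # baseline survey score
        noise = rng.gauss(0, 5)                   # everything else that changed
        followup = baseline + noise + (true_effect if s in treatment else 0.0)
        gain = followup - baseline
        (gains_treated if s in treatment else gains_control).append(gain)

    return statistics.mean(gains_treated) - statistics.mean(gains_control)
```

Because assignment is random, the estimated difference converges on the true effect as the sample grows, which is exactly the inferential guarantee that makes RCTs attractive.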

Suppose there are no significant differences, as has been the case with four RCTs that found that distributing books also had no effect. It would be wrong to conclude that tablets (or books) do not improve learning. What you have shown is that that particular tablet, with that particular software, used with that particular pedagogical strategy to teach those particular concepts, did not make a difference.

But the real question we wanted to answer was how tablets should be used to maximise learning. Here the design space is truly huge, and RCTs do not permit testing of more than two or three designs at a time – and even then only at a snail’s pace. Can we do better?

Consider the following thought experiment: We include some mechanism in the tablet to inform the teacher in real time about how well his or her pupils are absorbing the material being taught. We free all teachers to experiment with different software, different strategies, and different ways of using the new tool. The rapid feedback loop will make teachers adjust their strategies to maximise performance.

Over time, we will observe some teachers who have stumbled onto highly effective strategies. We then share what they have done with other teachers.

Notice how radically different this method is. Instead of testing the validity of one design by having 150 out of 300 schools implement the identical programme, this method is “crawling” the design space by having each teacher search for results. Instead of having a baseline survey and then a final survey, it is constantly providing feedback about performance. Instead of having an econometrician do the learning in a centralised manner and inform everybody about the results of the experiment, it is the teachers who are doing the learning in a decentralised manner and informing the centre of what they found.

Clearly, teachers will sometimes confuse correlation with causation when adjusting their strategies; but these errors will be revealed soon enough, as their wrong assumptions fail to yield better results. Likewise, selection bias may occur (some places may be doing better than others because they differ in other ways); but if different contexts require different strategies, the system will find them sooner or later. This strategy more closely resembles the social implementation of a machine-learning algorithm than a clinical trial.
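The decentralised trial-and-error described above behaves much like a simple explore/exploit (“bandit”) algorithm. The sketch below is a loose analogy with invented payoff numbers: at each step a teacher either experiments with a random strategy or reuses the best one seen so far, and noisy classroom feedback updates a running average per strategy.

```python
import random

def decentralised_search(payoffs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy search over teaching strategies – a stand-in for the
    article's rapid-feedback loop. `payoffs` holds the (unknown to the
    searcher) true average learning gain of each strategy; these numbers
    are made up for illustration.
    """
    rng = random.Random(seed)
    n = len(payoffs)
    counts = [0] * n
    means = [0.0] * n
    for _ in range(steps):
        if rng.random() < epsilon:
            k = rng.randrange(n)                        # experiment
        else:
            k = max(range(n), key=lambda i: means[i])   # exploit best so far
        reward = payoffs[k] + rng.gauss(0, 1)           # noisy feedback
        counts[k] += 1
        means[k] += (reward - means[k]) / counts[k]     # running average
    return max(range(n), key=lambda i: means[i])        # strategy to share
```

Given enough steps, the noisy averages sort themselves out and the best strategy surfaces – even though no single observation is a clean causal estimate, which mirrors the article’s point that individual teachers’ errors wash out over time.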

In economics, RCTs have been all the rage, especially in the field of international development, despite critiques by the Nobel laureate Angus Deaton, Lant Pritchett, and Dani Rodrik, who have attacked the inflated claims of RCTs’ proponents. One serious shortcoming is external validity. Lessons travel poorly: if an RCT finds that giving micronutrients to children in Guatemala improves their learning, should you give micronutrients to Norwegian children?

My main problem with RCTs is that they make us think about interventions, policies, and organisations in the wrong way. As opposed to the two or three designs that get tested slowly by RCTs (like putting tablets or flipcharts in schools), most social interventions have millions of design possibilities, and outcomes depend on complex combinations among them. This leads to what the complexity scientist Stuart Kauffman calls a “rugged fitness landscape.”
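A toy model can make the ruggedness point concrete. The sketch below is a caricature in the spirit of Kauffman’s NK landscapes, not his actual construction: every combination of binary design choices gets an independent random fitness, and a search that tweaks one parameter at a time stalls on a local peak.

```python
import itertools
import random

def rugged_landscape(n_bits=10, seed=0):
    """Toy 'rugged' landscape: each combination of n_bits binary design
    choices receives an independent random fitness value."""
    rng = random.Random(seed)
    return {bits: rng.random()
            for bits in itertools.product((0, 1), repeat=n_bits)}

def hill_climb(fitness, start):
    """Greedy one-change-at-a-time search: flip a single design choice
    whenever doing so improves fitness; stop at a local peak."""
    current = start
    improved = True
    while improved:
        improved = False
        for i in range(len(current)):
            neighbour = current[:i] + (1 - current[i],) + current[i + 1:]
            if fitness[neighbour] > fitness[current]:
                current, improved = neighbour, True
                break
    return current
```

On such a landscape, single-parameter tinkering typically halts far below the global optimum, which is why comparing two or three fixed designs says little about the best achievable combination.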

Getting the right combination of parameters is critical. This requires that organisations implement evolutionary strategies that are based on trying things out and learning quickly about performance through rapid feedback loops, as suggested by Matt Andrews, Lant Pritchett, and Michael Woolcock at Harvard’s Center for International Development.

RCTs may be appropriate for clinical drug trials. But for a remarkably broad array of policy areas, the RCT movement has had an impact equivalent to putting auditors in charge of the R&D department. That is the wrong way to design things that work. Only by creating organisations that learn how to learn, as so-called lean manufacturing has done for industry, can we accelerate progress.
