A dangerous statistical quirk

Anyone with an interest in how research forms public policy should pay attention to p-values

Bloomberg Businessweek (Asia)

Decisions affecting millions of people should be made using the best possible information. That’s why researchers, public officials, and anyone with views on social policy should pay attention to a controversy in statistics. The lesson: Watch out if you see a claim of the form “x is significantly related to y.”

At issue is a statistical test that researchers in a wide range of disciplines, from medicine to economics, use to draw conclusions from data. Let’s say you have a pill that’s supposed to make people rich. You give it to 30 people, and they wind up 1 percent richer than a similar group that took a placebo.

Before you can attribute this difference to your magic pill, you need to test your results with a narrow and dangerously subtle question: How likely would you be to get a result at least this large if your pill had no effect whatsoever? If this probability, the so-called p-value, is less than a stated threshold (often set at 5 percent), the result is deemed “statistically significant.”
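One way to make that question concrete is a permutation test, sketched below in Python. It is an illustration rather than the specific test any given study uses: the function name, the income figures, and the number of shuffles are all invented for the example. The idea is that if the pill does nothing, the labels "pill" and "placebo" are arbitrary, so reshuffling them shows how often chance alone produces a gap as large as the observed one.

```python
import random

def permutation_p_value(treated, control, n_shuffles=10_000, seed=0):
    """Share of random relabelings whose mean difference is at least
    as large as the one actually observed (a one-sided p-value)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    n_treated = len(treated)
    n_control = len(control)
    observed = sum(treated) / n_treated - sum(control) / n_control

    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(n_shuffles):
        # If the pill does nothing, the group labels are arbitrary,
        # so shuffling them shows what chance alone can produce.
        rng.shuffle(pooled)
        fake_treated = pooled[:n_treated]
        fake_control = pooled[n_treated:]
        diff = sum(fake_treated) / n_treated - sum(fake_control) / n_control
        if diff >= observed:
            hits += 1
    return hits / n_shuffles

# Made-up yearly incomes, in thousands of dollars (hypothetical data).
pill = [52, 48, 61, 55, 47, 58]
placebo = [50, 49, 57, 53, 46, 56]
p_value = permutation_p_value(pill, placebo)
print(f"Estimated p-value: {p_value:.3f}")
```

Note what the number does and does not say: it is the chance of seeing such a gap if the pill is useless, not the chance that the pill is useless.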

The problem is, people tend to place great weight on this declaration of statistical significance without understanding what it really means. A low p-value doesn’t, for example, mean that the pill almost certainly works. Any such conclusion would need more information, starting with some reason to think the pill could make you richer.

In addition, statistical significance isn’t policy significance. The size of the estimated effect matters. It might be so small as to lack practical value, even though it’s statistically significant. The converse is also true: An estimated effect might be so strong as to demand attention, even though it fails the p-value test.

These reservations apply even to statistical investigation done right. Unfortunately, it very often isn’t. Researchers commonly engage in “p-hacking,” tweaking data in ways that generate low p-values but actually undermine the test. Absurd results can be made to pass the p-value test, and important findings can fail. Despite all this, a good p-value tends to be a prerequisite for publication in scholarly journals. As a result, only a small and unrepresentative sample of research ever sees the light of day.
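A little arithmetic shows why one common form of p-hacking, running many tests and reporting whichever one passes, undermines the threshold. The sketch below assumes the tests are independent and that no real effect exists in any of them; the function name is invented for the example, and real analyses are rarely this tidy.

```python
def chance_of_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n_tests independent tests
    falls below the threshold alpha when no real effect exists."""
    return 1 - (1 - alpha) ** n_tests

for n_tests in (1, 5, 20):
    print(n_tests, round(chance_of_false_positive(n_tests), 2))
```

Under those assumptions, a researcher who tries 20 different comparisons has roughly a 64 percent chance of finding at least one “significant” result in pure noise.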

Why aren’t bad studies rooted out? Sometimes they are, but academic success depends on publishing novel results, so researchers have little incentive to check the work of others. Journals that publish research, and institutions that fund it, should demand more transparency. Require researchers to document their work, including any negative or “insignificant” results. Insist on replication. Supplement p-values with other measures, such as confidence intervals that indicate the size of the estimated effect. Look at the evidence as a whole, and beware of results that haven’t been repeated or that depend on a single method of measurement. And hold findings to a higher standard if they conflict with common sense.
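For readers who want to see what a confidence interval adds, here is a minimal Python sketch. It uses a normal approximation for the difference between two group averages, which is an assumption made for simplicity (with small samples a t-distribution would give a somewhat wider interval), and the function name and income figures are invented.

```python
from statistics import NormalDist, mean, variance

def mean_difference_interval(treated, control, confidence=0.95):
    """Approximate confidence interval for the difference in group means,
    using a normal approximation (an assumption of this sketch)."""
    diff = mean(treated) - mean(control)
    # Standard error of the difference between two independent means.
    std_error = (variance(treated) / len(treated)
                 + variance(control) / len(control)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return diff - z * std_error, diff + z * std_error

# Made-up yearly incomes, in thousands of dollars (hypothetical data).
pill = [52, 48, 61, 55, 47, 58]
placebo = [50, 49, 57, 53, 46, 56]
low, high = mean_difference_interval(pill, placebo)
print(f"Estimated gain: between {low:.1f} and {high:.1f} thousand dollars")
```

Unlike a bare p-value, the interval shows both how large the effect might be and how uncertain the estimate is.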
