Bloomberg Businessweek (Asia)

A dangerous statistical quirk • Lax bank security in developing nations

Anyone with an interest in how research informs public policy should pay attention to p-values


Decisions affecting millions of people should be made using the best possible information. That’s why researchers, public officials, and anyone with views on social policy should pay attention to a controversy in statistics. The lesson: Watch out if you see a claim of the form “x is significantly related to y.”

At issue is a statistical test that researchers in a wide range of disciplines, from medicine to economics, use to draw conclusions from data. Let’s say you have a pill that’s supposed to make people rich. You give it to 30 people, and they wind up 1 percent richer than a similar group that took a placebo.

Before you can attribute this difference to your magic pill, you need to test your results with a narrow and dangerously subtle question: How likely would you be to get this result if your pill had no effect whatsoever? If this probability, or so-called p-value, is less than a stated threshold—often set at 5 percent—the result is deemed “statistically significant.”
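To make the test concrete, here is a minimal sketch in Python (not from the article; the group sizes, means, and spreads are invented for illustration). It simulates a pill group and a placebo group and asks how likely a difference this large would be if the pill did nothing.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical data: percentage gain in wealth for 30 people in each group.
placebo = rng.normal(loc=0.0, scale=5.0, size=30)   # no true effect
pill = rng.normal(loc=1.0, scale=5.0, size=30)      # assume a 1-point true effect

# Two-sample t-test: the p-value is the probability of a difference at least
# this large arising if both groups really came from the same population.
t_stat, p_value = stats.ttest_ind(pill, placebo)

print(f"observed difference: {pill.mean() - placebo.mean():.2f} percentage points")
print(f"p-value: {p_value:.3f}")
print("deemed 'statistically significant' at 5%" if p_value < 0.05
      else "not 'statistically significant' at 5%")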

The problem is, people tend to place great weight on this declaration of statistical significance without understanding what it really means. A low p-value doesn’t, for example, mean that the pill almost certainly works. Any such conclusion would need more information—including, for a start, some reason to think the pill could make you richer.

In addition, statistical significance isn’t policy significance. The size of the estimated effect matters. It might be so small as to lack practical value, even though it’s statistically significant. The converse is also true: An estimated effect might be so strong as to demand attention, even though it fails the p-value test.
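A quick numerical sketch (again with invented numbers) shows why the two notions come apart: with a very large sample, a trivially small effect can clear the 5 percent bar, while a large effect measured in a handful of people can fail it.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# A tiny effect (0.1 points) measured on 200,000 people per group will
# usually clear the 5 percent threshold despite being practically negligible.
tiny = stats.ttest_ind(rng.normal(0.1, 5, 200_000), rng.normal(0.0, 5, 200_000))

# A large effect (5 points) measured on only 8 people per group can easily
# miss the threshold despite being worth attention.
large = stats.ttest_ind(rng.normal(5.0, 10, 8), rng.normal(0.0, 10, 8))

print(f"tiny effect, 200,000 per group: p = {tiny.pvalue:.4f}")
print(f"large effect, 8 per group: p = {large.pvalue:.3f}")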

These reservations apply even to statistical investigation done right. Unfortunately, it very often isn’t. Researchers commonly engage in “p-hacking,” tweaking data in ways that generate low p-values but actually undermine the test. Absurd results can be made to pass the p-value test, and important findings can fail. Despite all this, a good p-value tends to be a prerequisite for publication in scholarly journals. As a result, only a small and unrepresentative sample of research ever sees the light of day.

Why aren’t bad studies rooted out? Sometimes they are, but academic success depends on publishing novel results, so researchers have little incentive to check the work of others. Journals that publish research, and institutions that fund it, should demand more transparency. Require researchers to document their work, including any negative or “insignificant” results. Insist on replication. Supplement p-values with other measures, such as confidence intervals that indicate the size of the estimated effect. Look at the evidence as a whole, and beware of results that haven’t been repeated or that depend on a single method of measurement. And hold findings to a higher standard if they conflict with common sense.
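On the point about confidence intervals, the sketch below (hypothetical data again, in the same spirit as the earlier examples) computes a standard 95 percent interval for the estimated effect, which conveys its likely size rather than a bare pass/fail verdict.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
pill = rng.normal(1.0, 5.0, 30)      # hypothetical treatment group
placebo = rng.normal(0.0, 5.0, 30)   # hypothetical control group

n1, n2 = len(pill), len(placebo)
diff = pill.mean() - placebo.mean()

# Pooled-variance standard error and a 95% t-interval for the difference in means.
pooled_var = ((n1 - 1) * pill.var(ddof=1) + (n2 - 1) * placebo.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(pooled_var * (1 / n1 + 1 / n2))
margin = stats.t.ppf(0.975, n1 + n2 - 2) * se

print(f"estimated effect: {diff:.2f} points, 95% CI [{diff - margin:.2f}, {diff + margin:.2f}]")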

