National Post (National Edition)

Women on boards


Welcome to the 15th Anniversary Junk Science Week, dedicated to exposing the scientists, NGOs, activists, politicians, journalists, media outlets, cranks and quacks who manipulate science data to achieve their objectives. Our standard definition remains in place: Junk science occurs when scientific facts are distorted, risk is exaggerated and the science is adapted and warped by politics and ideology to serve another agenda.

It’s a definition that applies to the social sciences, especially economics, where all manner of statistical manipulations and random juxtapositions are routinely produced to support some policy or decision. Daily business journalism, though it is no science, plays a big role in disseminating junk economic science. “Stocks rose 25 points today on news that unemployment fell to 7.6% from 7.8%,” despite the fact that a 25-point change in a stock average cannot be anything more than a random move.
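How small is a 25-point move? Here is a back-of-the-envelope sketch in Python; the index level and daily volatility are assumptions invented only for illustration, not figures from any market report.

```python
# Rough arithmetic, with assumed numbers, for why a 25-point move is noise.
index_level = 12_500        # assumed index level, in points
daily_volatility = 0.01     # assumed typical daily swing of about 1%

typical_daily_move = index_level * daily_volatility   # about 125 points
move = 25
print(f"a {move}-point move is {move / typical_daily_move:.0%} of a typical daily swing")
# -> about 20% of an ordinary day's fluctuation: far too small to pin on one news item.
```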

More complicated and misleading forms of economic junk science are marshaled to achieve ideological and political objectives. Business and government are both guilty. Later this week we’ll have a piece that tracks the non-existent evidence for the claim that Canada has a skills shortage that requires major government programs and initiatives.

But we begin with a look at the growing political pressure in Ottawa and Ontario to pass some form of regulation to accelerate the appointment of women to the boards of Canadian corporations. Rona Ambrose, federal Status of Women Minister, says she is leading a new committee of business executives to study the issue of women directors. Aiming for bold action, “I [told the committee] I wanted action-oriented recommendations for the government to immediately act on,” she said. “We’ve had enough studies and enough reports.”

Out ahead of Ms. Ambrose in action-oriented actions for a women-on-boards policy is Laurel Broten, Ontario’s Minister Responsible for Women’s Issues. “The statistics are very clear,” Ms. Broten told a CBC Radio audience recently. “Improved financial performance is what you see in a company that has more women on their board. You know, 53% better return on equity, return on sales, return on invested capital. We’ve seen studies over and over again…”

To increase the diversity of boards, Ontario will mandate the Ontario Securities Commission to develop a “comply or explain” regime where corporations will have to explain why they have not put a certain percentage of women in directors’ seats. This is not a quota-based system, “absolutely not,” said Ms. Broten. She supports “evidence-based” information on the benefits of women on boards.

It is this “evidence-based” information that Ms. Broten was citing when she claimed, with absolute certainty, that women on boards generate higher return on equity and higher return on sales.

For some years, Catalyst—which has a Canadian branch—has claimed that organizations with women on boards have “stronger financial performance.” Back in 2001, the Conference Board of Canada circulated an unpublished report stating that corporations “with two or more women on the board in 1995 were far more likely to be industry leaders in revenues and profits six years later in 2001.”

The source of Ms. Broten’s claim of a 53% boost in return on equity is Catalyst, an international advocacy group for women in business. In a 2007 study of companies on the Fortune 500 list and their performance between 2001 and 2004, Catalyst claimed women on boards produced 42% higher return on sales (13.7% compared with 9.7%) and 66% higher return on invested capital (7.7% versus 4.7%).

Catalyst said this “link between women on board of directors and corporate performance holds across industries,” including consumer staples, financial services, industrials, technology and materials.

Only in the very, very fine print, however, does Catalyst note that “correlation does not prove or imply causation.” In a 2011 report on corporate data from 2004 to 2008, Catalyst again hedges on cause and effect. “Catalyst designed the Bottom Line report series to establish whether an empirical link exists between gender diversity in corporate leadership and financial performance. These studies have examined historical data and revealed significant statistical correlations. The studies do not, however, establish or imply causal connection.”

All those “empirical links” and “significant statistical correlations” but no causation: what can it all mean? As Stephen Ziliak highlights in his Unsignificant Statistics commentary elsewhere on this page, where there is statistical significance there is mostly junk science, with links and correlations that essentially signify nothing.

It may well be, for example, that corporations that are initially highly profitable simply have more women on their boards. Do profitable firms appoint more women after the fact? Or other factors may be at play. Mr. Ziliak, commenting on the Catalyst data, said “one needs to put other economic, structural, and demographic variables into a multiple regression model. Comparisons of simple averages hides important information, most of which might not have anything at all to do with the gender of board members.”
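To see what Mr. Ziliak means, consider a minimal simulation sketch in Python. The data are invented: a confounding variable (firm size) drives both profitability and board composition, so a comparison of simple averages manufactures a “women effect” that a multiple regression controlling for the confounder does not find. Nothing here models the actual Catalyst sample.

```python
# Hypothetical simulation: simple averages vs. a multiple regression with a confounder.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 2000

firm_size = rng.lognormal(mean=0.0, sigma=1.0, size=n)                    # confounder
women_on_board = (firm_size + rng.normal(0, 1, n) > 1.0).astype(float)    # larger firms appoint more women
roe = 5.0 + 2.0 * np.log(firm_size) + rng.normal(0, 2, n)                 # profitability depends on size only

# 1) Comparison of simple averages -- the advocacy-style statistic
raw_gap = roe[women_on_board == 1].mean() - roe[women_on_board == 0].mean()
print(f"raw difference in mean ROE: {raw_gap:.2f} points")                # looks like a sizable 'effect'

# 2) Multiple regression that includes the confounder
X = sm.add_constant(np.column_stack([women_on_board, np.log(firm_size)]))
fit = sm.OLS(roe, X).fit()
print(f"coefficient on women_on_board, holding size constant: {fit.params[1]:.2f}")  # near zero
```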

In other words, Ms. Broten and Catalyst are drawing on the fallacies of statistical significance to promote a political objective.

Unsignificant statistics

I want to believe as much as the next person that particle physicists have discovered a Higgs boson, the so-called “God particle,” one with a mass of 125 gigaelectron volts (GeV). But so far I do not buy the statistical claims being made about the discovery. Since the claims about the evidence are based on “statistical significance” – that is, on the number of standard deviations by which the observed signal departs from a null hypothesis of “no difference” – the physicists’ claims are not believable. Statistical significance is junk science, and its big piles of nonsense are spoiling the research of more than particle physicists.

I’m an economist. So don’t trust me with newfangled junk bonds or the fate of the world financial system. But here is something you can believe, and will want to: Statistical significance stinks. In statistical sciences from economics to medicine, including some parts of physics and chemistry, the ubiquitous “test” for “statistical significance” cannot, and will not, prove that a Higgs boson exists, any more than it can prove the reality of God, the existence of a good pain pill, or the validity of loose monetary policy.

A statistically significant departure from an assumed-to-be-true null hypothesis is by itself no proof of anything. Likewise, failure to achieve statistical significance at the .05 or other stipulated level is not proof that nothing of importance has been discovered.

It sounds too simple to be true, but in fact the two most fundamental problems with the test of statistical significance stem from bits of faulty logic.

The test of significance customarily begins with the stipulation of a “null hypothesis.” Expressed in algebraic terms, a new object A is on average assumed to be no different from a familiar object B, where the objects could be types of weight loss pills, tax schemes, or differently named physical particles. Data are collected, experimentally or otherwise, and then a calculation is made to determine the likelihood that data greater than that which we see on average could have occurred if in fact there is no observable difference between the objects under study.

The formal name for this odd calculation is the “p-value.” If the p-value is low – in the social sciences and business, if p falls below .05, or a 1-in-20 chance – the result of the experiment is said to be “statistically significant.” The claim is that the new object A, for example, is statistically significantly different from the old object B, because the chance of seeing a bigger difference between A and B – bigger than the difference you have seen – is small. It’s a strange standard.

Likewise, if p exceeds .05 (or whatever arbitrary line the scientists have drawn – lower in physics, higher in business), the result of the experiment is said to be inconclusive, and thus ignorable.
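For readers who want to see the mechanics, here is a minimal Python sketch of the customary procedure, using invented samples. Both samples are drawn from the same distribution, so the null hypothesis of “no difference” is true by construction; any “significant” verdict is pure chance, which is roughly what the 1-in-20 standard guarantees.

```python
# Hypothetical two-sample experiment: object A vs. object B, identical by construction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a = rng.normal(loc=10.0, scale=2.0, size=30)   # measurements of the new object A
b = rng.normal(loc=10.0, scale=2.0, size=30)   # measurements of the familiar object B (same true mean)

t_stat, p_value = stats.ttest_ind(a, b)        # chance of a difference at least this big under the null
print(f"p-value = {p_value:.3f}")
print("verdict:", "statistically significant" if p_value < 0.05 else "inconclusive")
# Rerun with many different seeds and about 1 in 20 of these no-difference experiments
# will still cross the .05 line and be declared "significant."
```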

The null hypothesis test procedure is not the only test of significance but it is the most commonly used and abused of all the tests. From the get-go, the test of statistical significance asks the wrong question. The test asks: “Assuming that the null hypothesis is true – that the Higgs boson (or whatever) does not exist – what is the probability of seeing a result at least as large as the one we have seen in the data?” This probability calculation is the p-value.

In framing the quantitative question the way they do, the significance-testing scientists have unknowingly reversed the fundamental equation of statistics. Believe it or not, they have transposed their hypothesis and data, forcing them to grossly distort the magnitudes of probable events – and here’s why.

They have fallen for a mistaken logic called in statistics the “fallacy of the transposed conditional.” If Mrs. Smith gets a cramp this week, and dies, one could not simply conclude that Mrs. Smith probably died from a cramp. This is because the probability of having a cramp, given that you are dead, is not equal to the probability that you are dead, given that you had a cramp. Mrs. Smith might have died for any number of other reasons. But that is precisely the reversal of hypothesis and data that, however illogical, particle physicists in Geneva – and most other scientists in fields from economics to medicine – continue to make.
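The reversal is easy to put in numbers. Here is a tiny Python sketch of the Mrs. Smith example; every probability in it is an assumption chosen only to show how far apart the two conditionals can sit.

```python
# Transposed conditional with made-up probabilities: P(cramp | dead) vs. P(dead | cramp).
p_cramp = 0.20               # assumed: chance a person has a cramp in a given week
p_dead = 0.001               # assumed: chance a person dies in a given week, all causes
p_cramp_given_dead = 0.50    # assumed: half of those who die happened to have a cramp

# Bayes' rule: P(dead | cramp) = P(cramp | dead) * P(dead) / P(cramp)
p_dead_given_cramp = p_cramp_given_dead * p_dead / p_cramp

print(f"P(cramp | dead)  = {p_cramp_given_dead:.3f}")     # 0.500
print(f"P(dead  | cramp) = {p_dead_given_cramp:.4f}")     # 0.0025 -- two hundred times smaller
```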

I show in a book I wrote on the subject with Deirdre N. McCloskey, The Cult of Statistical Significance (2008), that the null hypothesis test procedure – another name for statistical significance testing – produces many such errors, with tragic results for real-world economies, law, medicine, and even human life.

In a decades-long survey of leading journals, from the American Economic Review to the New England Journal of Medicine, we find that eight or nine of every 10 articles assume that statistical significance demonstrates scientific, economic, or other human importance, and that a lack of statistical significance – statistical “insignificance,” or a p-value greater than .05 – indicates a lack of importance. Statistical significance is neither necessary nor sufficient for proving a physical, economic, or medical result. But the bureaucracies of science – from government grantors to journal referees – continue to insist on demonstrations of statistical significance, regardless of the real economic, medical, physical, or other effects revealed by the total evidence.

Consider again the illogic of the physicists’ procedure. The signal in the data which has been observed over and above background noise (denoted as being at 5 sigma) is possibly a Higgs boson – that is true. But in sober moments – when flash-bulbs fade and fizzy drinks fall flat – those same particle physicists admit that the jury is still out – that the statistically significant bump could be “consistent with” other plausible hypotheses, not specified by their models – just like Mrs. Smith could have died of something other than a cramp, and probably did.

This is self-evident. A statistically significant result might be evidence of some other particle or field – a Jove or Zeus or Prometheus just like the anticipated Higgs. But because the models used by the physicists do not assign probabilistic weights to Higgs and its competing hypotheses, prior and posterior beliefs about all hypotheses remain static, neglected, and unknown.

Thus the reported chance of finding a Higgs boson – measured, the physicists illogically claim, by their super-small p-value – is incorrect. The p-value merely shows the likelihood that data that were not observed did not occur – that is, particles heavier than 125 GeV were with high probability not explained by the null hypothesis of “no Higgs boson.” But it’s impossible to go from “I see something different from the null hypothesis” to “I see my favourite hypothesis” without adding in some new assumptions, taking us from the fallacy of the transposed conditional to clear statements about the probability of the favoured hypothesis, such as Higgs.

Significance testers won’t say what those additional assumptions are, but they seem content to make additional inferences based upon those faulty assumptions.
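To make the missing step concrete, here is a minimal Bayesian sketch in Python. Every number in it is my own assumption, not a figure from the physicists; the point is only that the probability of the favoured hypothesis depends on inputs (priors and likelihoods) that the p-value does not supply.

```python
# What it takes to move from a tiny p-value to a probability for the favoured hypothesis.
prior_h = 0.01                 # assumed prior probability that the favoured hypothesis is true
p_data_given_h = 0.80          # assumed chance of such a signal if the hypothesis is true
p_data_given_not_h = 3e-7      # roughly the 5-sigma tail probability under the null

# Bayes' rule for the posterior probability of the hypothesis given the signal
posterior_h = (p_data_given_h * prior_h) / (
    p_data_given_h * prior_h + p_data_given_not_h * (1 - prior_h)
)
print(f"posterior probability of the hypothesis: {posterior_h:.6f}")
# With these assumptions the posterior is close to 1. Change the prior to 1e-9, though,
# and the very same 5-sigma signal leaves the hypothesis improbable (about 0.3%).
# The p-value alone decides nothing.
```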

There is a second, equally fundamental problem with the theory and practice of significance testing. The test does not tell us how big or small the effect size is; it doesn’t tell us how important (or useful, or dangerous, or surprising) the effect is, in a metric of big and small – what McCloskey and I call the “oomph” – a point the eminent statistician Leonard “Jimmie” Savage made in Foundations of Statistics (1954).

Consider the case of Matrixx Initiatives, maker of the Zicam nasal cold remedy, which caused some users to suffer anosmia, the permanent loss of smell. Matrixx ignored a number of adverse reports it had received from doctors and users since 1999. One doctor told the company that zinc toxicity was discovered by biologists back in the 1930s. When a doctor appeared on Good Morning America in 2004 and spilled the beans on the company, Matrixx stock price plummeted. Again the company hid, this time behind the argument that the adverse effects from taking Zicam up the nose were – wait for it – “not statistically significant.”

The Supreme Court rejected the conventional argument unanimously. The Court concluded: “Matrixx’s argument [about the adverse effects of Zicam] rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed.”

Said Justice Breyer sarcastically during oral arguments: “This statistical significance always works and always doesn’t work.” In other words, the high Court agrees: hard and fast significance rules are junk science and need to be scrapped.
