National Post (National Edition)
Women on boards
Welcome to our 15th Anniversary Junk Science Week event, dedicated to exposing the scientists, NGOs, activists, politicians, journalists, media outlets, cranks and quacks who manipulate scientific data to achieve their objectives. Our standard definition remains in place: Junk science occurs when scientific facts are distorted, risk is exaggerated and the science adapted and warped by politics and ideology to serve another agenda.
It’s a definition that applies to the social sciences, especially economics, where all manner of statistical manipulations and random juxtapositions are routinely produced to support some policy or decision. Daily business journalism, no science itself, plays a big role in disseminating junk economic science. “Stocks rose 25 points today on news that unemployment fell to 7.6% from 7.8%,” despite the fact that a 25-point change in a stock average cannot be anything more than a random move.
More complicated and misleading forms of economic junk science are marshaled to achieve ideological and political objectives. Business and government are both guilty. Later this week we’ll have a piece that tracks the non-existent evidence for the claim that Canada has a skills shortage that requires major government programs and initiatives.
But we begin with a look at the growing political pressure in Ottawa and Ontario to pass some form of regulation to accelerate the appointment of women to the boards of Canadian corporations. Rona Ambrose, federal Status of Women Minister, says she is leading a new committee of business executives to study the issue of women directors. Aiming for bold action, Ms. Ambrose said: “I [told the committee] I wanted action-oriented recommendations for the government to immediately act on. We’ve had enough studies and enough reports.”
Out ahead of Ms. Ambrose in action-oriented actions for a women-on-boards policy is Laurel Broten, Ontario’s Minister Responsible for Women’s Issues. “The statistics are very clear,” Ms. Broten told a CBC Radio audience recently. “Improved financial performance is what you see in a company that has more women on their board. You know, 53% better return on equity, return on sales, return on invested capital. We’ve seen studies over and over again…”
To increase the diversity of boards, Ontario will mandate the Ontario Securities Commission to develop a “comply or explain” regime where corporations will have to explain why they have not put a certain percentage of women in directors’ seats. This is not a quota-based system, “absolutely not,” said Ms. Broten. She supports “evidence-based” information on the benefits of women on boards.
It is this “evidence-based” information that Ms. Broten was citing when she claimed, with absolute certainty, that women on boards generate higher return on equity and higher return on sales.
For some years, Catalyst – which has a Canadian branch – has claimed that organizations with women on boards have “stronger financial performance.” Back in 2001, the Conference Board of Canada circulated an unpublished report stating that corporations “with two or more women on the board in 1995 were far more likely to be industry leaders in revenues and profits six years later in 2001.”
The source of Ms. Broten’s claim of a 53% boost in return on equity is Catalyst, an international advocacy group for women in business. In a 2007 study of companies on the Fortune 500 list and their performance between 2001 and 2004, Catalyst claimed women on boards produced 42% higher return on sales (13.7% compared with 9.7%) and 66% higher return on invested capital (7.7% versus 4.7%).
Catalyst said this “link between women on board of directors and corporate performance holds across industries,” including consumer staples, financial services, industrials, technology and materials.
Only in the very, very fine print, however, does Catalyst note that “correlation does not prove or imply causation.” In a 2011 report on corporate data from 2004 to 2008, Catalyst again hedges on the cause and effect. “Catalyst designed the Bottom Line report series to establish whether an empirical link exists between gender diversity in corporate leadership and financial performance. These studies have examined historical data and revealed significant statistical correlations. The studies do not, however, establish or imply causal connection.”
All those “empirical links” and “significant statistical correlations” but no causation: what can it all mean? As Stephen Ziliak highlights in his Unsignificant Statistics commentary elsewhere on this page, where there is statistical significance there is mostly junk science, with links and correlations that essentially signify nothing.
It may well be, for example, that causation runs the other way: do corporations that are initially highly profitable go on to appoint more women to their boards after the fact? Or other factors entirely may be at play. Mr. Ziliak, commenting on the Catalyst data, said “one needs to put other economic, structural, and demographic variables into a multiple regression model. Comparisons of simple averages hides important information, most of which might not have anything at all to do with the gender of board members.”
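Mr. Ziliak’s point about confounding can be sketched in a few lines of Python. Every number below is invented for illustration: firm size is assumed to drive both profitability and the presence of women directors, so a comparison of simple averages shows a gap that vanishes once size is held constant.

```python
import random

random.seed(42)

# Hypothetical world: firm size (the confounder) drives BOTH return on
# equity AND the odds of having women on the board.  Board composition
# itself has no effect on profitability in this simulation.
firms = []
for _ in range(10_000):
    big = random.random() < 0.5                       # large vs. small firm
    women_on_board = random.random() < (0.6 if big else 0.2)
    roe = random.gauss(12.0 if big else 6.0, 2.0)     # ROE depends only on size
    firms.append((big, women_on_board, roe))

def mean_roe(rows):
    vals = [roe for _, _, roe in rows]
    return sum(vals) / len(vals)

with_w    = [f for f in firms if f[1]]
without_w = [f for f in firms if not f[1]]
# Naive comparison of simple averages shows a large, spurious gap:
print("naive averages:", round(mean_roe(with_w), 2),
      "vs", round(mean_roe(without_w), 2))

# Comparing within each size class (a crude stand-in for the multiple
# regression Mr. Ziliak describes) makes the apparent effect vanish:
for big in (True, False):
    w  = [f for f in firms if f[0] == big and f[1]]
    nw = [f for f in firms if f[0] == big and not f[1]]
    print("size =", big, ":", round(mean_roe(w), 2), "vs", round(mean_roe(nw), 2))
```

The naive gap is a pure artifact of firm size, which is exactly why a correlation of simple averages, however “statistically significant,” proves nothing about what adding women directors would do.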
In other words, Ms. Broten and Catalyst are drawing on the fallacies of statistical significance to promote a political objective.
I want to believe as much as the next person that particle physicists have discovered a Higgs boson, the so-called “God particle,” one with a mass of 125 gigaelectronvolts (GeV). But so far I do not buy the statistical claims being made about the discovery. Since the claims about the evidence are based on “statistical significance” – that is, on the number of standard deviations by which the observed signal departs from a null hypothesis of “no difference” – the physicists’ claims are not believable. Statistical significance is junk science, and its big piles of nonsense are spoiling the research of more than particle physicists.
I’m an economist. So don’t trust me with newfangled junk bonds or the fate of the world financial system. But here is something you can believe, and will want to: Statistical significance stinks. In statistical sciences from economics to medicine, including some parts of physics and chemistry, the ubiquitous “test” for “statistical significance” cannot, and will not, prove that a Higgs boson exists, any more than it can prove the reality of God, the existence of a good pain pill, or the validity of loose monetary policy.
A statistically significant departure from an assumed-to-be-true null hypothesis is by itself no proof of anything. Likewise, failure to achieve statistical significance at the .05 or other stipulated level is not proof that nothing of importance has been discovered.
It sounds too simple to be true, but in fact the two most fundamental problems with the test of statistical significance stem from bits of faulty logic.
The test of significance customarily begins with the stipulation of a “null hypothesis.” Expressed in algebraic terms, a new object A is on average assumed to be no different from a familiar object B, where the objects could be types of weight loss pills, tax schemes, or differently named physical particles. Data are collected, experimentally or otherwise, and then a calculation is made of the likelihood that data showing a difference at least as large as the one observed could have occurred if in fact there were no difference between the objects under study.
The formal name for this odd calculation is “p-value”. If the p-value has a low number – in social sciences and business, if p falls below .05, or a 1-in-20 chance – the result of the experiment is said to be “statistically significant.” The claim is that the new object A, for example, is statistically significantly different from the old object B, because the chance of seeing a bigger difference between A and B – bigger than the difference you have seen – is small. It’s a strange standard.
Likewise, if p exceeds .05 (or whatever arbitrary line the scientists have drawn – lower in physics, higher in business), the result of the experiment is said to be inconclusive, and thus ignorable.
The null hypothesis test procedure is not the only test of significance but it is the most commonly used and abused of all the tests. From the get-go, the test of statistical significance asks the wrong question. The test asks: “Assuming that the null hypothesis is true – that the Higgs boson (or whatever) does not exist – what is the probability of seeing a result at least as large as the one we have seen in the data?” This probability calculation is the p-value.
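What that calculation amounts to can be made concrete with a small simulation. The data below are invented: under the null hypothesis of “no difference” between groups A and B, we shuffle the group labels many times and count how often a gap in averages at least as large as the observed one turns up by chance. That count, as a fraction of trials, is the p-value – and note what it is not: the probability that the null hypothesis is true.

```python
import random
import statistics

random.seed(0)

# Invented measurements for two groups, A and B:
group_a = [5.1, 4.9, 6.2, 5.8, 5.5, 6.0, 5.7, 5.3]
group_b = [4.8, 5.0, 4.6, 5.2, 4.9, 4.7, 5.1, 4.5]
observed_gap = statistics.mean(group_a) - statistics.mean(group_b)

# Permutation test: pretend the null is true and relabel the pooled
# data at random, asking how often a gap this big arises by chance.
pooled = group_a + group_b
trials, n_extreme = 10_000, 0
for _ in range(trials):
    random.shuffle(pooled)
    gap = statistics.mean(pooled[:8]) - statistics.mean(pooled[8:])
    if abs(gap) >= abs(observed_gap):
        n_extreme += 1

p_value = n_extreme / trials
print(f"observed gap = {observed_gap:.4f}, p-value = {p_value:.4f}")
# The p-value answers only: "if there were no difference, how often
# would chance produce a gap this large?"  It says nothing about how
# probable any hypothesis is, nor whether the gap is big enough to matter.
```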
In framing the quantitative question the way they do, the significance-testing scientists have unknowingly reversed the fundamental equation of statistics. Believe it or not, they have transposed their hypothesis and data, forcing them to grossly distort the magnitudes of probable events – and here’s why.
They have fallen for a mistaken logic called in statistics the “fallacy of the transposed conditional.” If Mrs. Smith gets a cramp this week, and dies, one could not simply conclude that Mrs. Smith probably died from a cramp. This is because the probability of having a cramp, given that you are dead, is not equal to the probability that you are dead, given that you had a cramp. Mrs. Smith might have died for any number of other reasons. But that is precisely the reversal of hypothesis and data that, however illogical, particle physicists in Geneva – and most other scientists in fields from economics to medicine – continue to make.
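The Mrs. Smith example can be put in numbers. All the probabilities below are invented for illustration; the only point is that the two conditionals differ wildly, which is exactly the transposition significance testers make.

```python
# Invented weekly probabilities, for illustration only:
p_cramp = 0.10             # chance of having a cramp in a given week
p_dead = 0.001             # chance of dying in a given week
p_cramp_given_dead = 0.20  # of those who die, 20% happened to have a cramp

# Bayes' rule recovers the conditional we actually care about:
p_dead_given_cramp = p_cramp_given_dead * p_dead / p_cramp

print(f"P(cramp | dead)  = {p_cramp_given_dead:.3f}")
print(f"P(dead  | cramp) = {p_dead_given_cramp:.5f}")
# 0.200 versus 0.002: transposing the conditional inflates the
# probability a hundredfold.  A significance test computes
# P(data | null is true), yet it is routinely read as if it gave
# P(null is true | data) -- the same fallacy in algebraic dress.
```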
I show in a book I wrote on the subject with Deirdre N. McCloskey, The Cult of Statistical Significance (2008), that the null hypothesis test procedure – another name for statistical significance testing – produces many such errors, with tragic results for real world economies, law, medicine, and even human life.
In a decades-long survey of leading journals, from the American Economic Review to the New England Journal of Medicine, we find that eight or nine of every 10 articles assume that statistical significance demonstrates scientific, economic, or other human importance, and that a lack of statistical significance – statistical “insignificance,” or a p-value greater than .05 – indicates a lack of importance. Statistical significance is neither necessary nor sufficient for proving a physical, economic, or medical result. But the bureaucracies of science – from government grantors to journal referees – continue to insist on demonstrations of statistical significance, regardless of the real economic, medical, physical, or other effects revealed by the total evidence.
Consider again the illogic of the physicists’ procedure. The signal in the data which has been observed over and above background noise (denoted as being at 5 sigma) is possibly a Higgs boson – that is true. But in sober moments – when flash-bulbs fade and fizzy drinks fall flat – those same particle physicists admit that the jury is still out – that the statistically significant bump could be “consistent with” other plausible hypotheses, not specified by their models – just like Mrs. Smith could have died of something other than a cramp, and probably did.
This is self-evident. A statistically significant result might be evidence of some other particle or field – a Jove or Zeus or Prometheus just like the anticipated Higgs. But because the models used by the physicists do not assign probabilistic weights to Higgs and its competing hypotheses, prior and posterior beliefs about all hypotheses remain static, neglected, and unknown.
Thus the reported chance of finding a Higgs boson – measured, the physicists illogically claim, by their super-small p-value – is incorrect. The p-value merely shows the likelihood that data that were not observed, did not occur – that is, particles heavier than 125 GeV were with high probability not explained by the null hypothesis of “no Higgs boson.” But it’s impossible to go from “I see something different from the null hypothesis” to “I see my favourite hypothesis” without adding in some new assumptions, taking us from the fallacy of the transposed conditional to clear statements about the probability of the favoured hypothesis, such as Higgs.
Significance testers won’t say what those additional assumptions are, but they seem content to make additional inferences based upon those faulty assumptions.
There is a second, equally fundamental problem with the theory and practice of significance testing. The test does not tell us how big or small the effect size is; it doesn’t tell us how important (or useful, or dangerous, or surprising) the effect is, in a metric of big and small – what McCloskey and I call the “oomph.” As the eminent statistician Leonard “Jimmie” Savage noted in Foundations of Statistics (1954), statistical significance by itself tells us nothing about the size or importance of an effect.

Consider the case of Matrixx Initiatives, maker of the cold remedy Zicam, which caused some users to suffer anosmia, the permanent loss of smell. Matrixx ignored a number of adverse reports it had received from doctors and users since 1999. One doctor told the company that zinc toxicity was discovered by biologists back in the 1930s. When a doctor appeared on Good Morning America in 2004 and spilled the beans on the company, Matrixx’s stock price plummeted. The company hid behind the argument that the adverse effects from taking Zicam up the nose were – wait for it – “not statistically significant.”
The Supreme Court rejected the conventional argument unanimously. The Court concluded: “Matrixx’s argument [about the adverse effects of Zicam] rests on the premise that statistical significance is the only reliable indication of causation. This premise is flawed.”
Said Justice Breyer sarcastically during oral arguments: “This statistical significance always works and always doesn’t work.” In other words, the high Court agrees: hard and fast significance rules are junk science and need to be scrapped.
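The “oomph” problem can be illustrated with invented numbers. A standardized test statistic grows with the sample size, so a trivial effect measured on a huge sample sails past the significance bar while a large, humanly important effect measured on a small sample fails it – which is why the asterisk of significance is no measure of importance.

```python
import math

# Standardized gap (a rough t-statistic) for two groups of size n,
# each with common standard deviation sd.  All figures are invented.
def t_stat(gap, sd, n):
    return gap / (sd * math.sqrt(2.0 / n))

# A weight-loss pill that sheds 5 kg, tested on two groups of 10 people:
big_effect_small_n = t_stat(5.0, 8.0, 10)
# A pill that sheds a trivial 0.1 kg, tested on groups of 500,000 people:
tiny_effect_huge_n = t_stat(0.1, 8.0, 500_000)

print("big effect, small n :", round(big_effect_small_n, 2))   # below 1.96: "insignificant"
print("tiny effect, huge n :", round(tiny_effect_huge_n, 2))   # above 1.96: "significant"
```

The 5 kg pill, the one a dieter would actually want, misses the conventional 1.96 cutoff; the useless 0.1 kg pill clears it easily. The test has measured sample size, not oomph.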