Monday 23 January 2012

Fraud and the Road to Abilene

Over the weekend, a Dutch national newspaper published an (anonymized) interview with the three “whistle blowers” who exposed the enormous fraud of Professor Diederik Stapel. Stapel had gained star status in the field of social psychology but, simply put, had been making up his data all along. Two things struck me:

First, in a previous post about the fraud – based on a flurry of newspaper articles and the interim report put together by a committee examining the case – I wrote that it was eventually his clumsiness in faking the data that got him caught. That general picture certainly stands – he wasn’t very good at faking data; I think I could easily have done a better job (although I have never even tried anything like that, honest!) – but it wasn’t as clumsy as the newspapers sometimes made it out to be.

Specifically, I wrote: “eventually, he did not even bother anymore to really make up newly faked data. He used the same (fake) numbers for different experiments and gave those to his various PhD students to analyze, who, slaving away in their adjacent cubicles, discovered in disbelief that their very different experiments led to exactly the same statistical values (a near impossibility). When they compared their databases, there was substantial overlap”. It now seems the “substantial overlap” was merely part of one column of data. Plus, various other things contributed to him getting caught.

I don’t beat myself too hard over the head with my keyboard about repeating this misrepresentation by the newspapers (although I have given myself a small slap on the wrist – after having received a verbal one from one of the whistlers) because my piece focused on the “why did he do it?” rather than the “how did he get caught?”, but it does show that we have to give the three whistle blowers (quite) a bit more credit than I – and others – originally thought.

The second point that caught my attention is that, since the fraud was exposed, various people have come out admitting that they had “had suspicions all the time”. You could say “yeah right”, but there do appear to be quite a few signs that various people had indeed been harbouring doubts for quite some time. For instance, I have read an interview with a former colleague of Stapel at Tilburg University credibly admitting to this, I have spoken directly to people who said there had been rumors for a while, and the article with the whistle blowers suggests that even Stapel’s faculty dean might not have been entirely dumbfounded to learn that it had all been too good to be true after all... All the people who admit to having had private doubts say that they did not feel comfortable raising the issue while everyone else just seemed to applaud Stapel and his Science publications.

This reminded me of the Abilene Paradox, first described by Professor Jerry Harvey of George Washington University. He described a leisure trip that he, his wife, and his parents made in Texas one July, in his parents’ un-airconditioned old Buick, to a town called Abilene. It was a trip they had all agreed to – or at least not disagreed with – but, as it later turned out, none of them had wanted to go on. “Here we were, four reasonably sensible people who, of our own volition, had just taken a 106-mile trip across a godforsaken desert in a furnace-like temperature through a cloud-like dust storm to eat unpalatable food at a hole-in-the-wall cafeteria in Abilene, when none of us had really wanted to go.”

The Abilene Paradox describes the situation where everyone goes along with something, mistakenly assuming that other people’s silence implies that they agree. And the (erroneous) feeling of being the only one who disagrees makes each person shut up as well, all the way to Abilene.

People had suspicions about Stapel’s “too good to be true” research record and findings but did not dare to speak up while no-one else did.

It seems there are two things that eventually made the three whistle blowers speak up and expose Stapel: friendship and alcohol.

They had struck up a friendship and one night, fuelled by alcohol, voiced their suspicions to one another. And, crucially, they decided to do something about it. Perhaps there are some lessons in this for the world of business. For example, Jim Westphal, who has done extensive, thorough research on boards of directors, showed that boards often suffer from the Abilene Paradox, for instance when confronted with their company’s new strategy. Yet Jim and colleagues also showed that friendship ties within top management teams might not be such a bad thing. We are often suspicious of social ties between boards and top managers, fearful that they might cloud directors’ judgment and make them reluctant to discipline a CEO. But such friendship ties – whether fuelled by alcohol or not – might also help to lower the barriers to resolving the Abilene Paradox. So perhaps we should make friendship and alcohol mandatory – religion permitting – during both board meetings and academic gatherings. It would undoubtedly help make them more tolerable as well.

Wednesday 11 January 2012

Bias (or why you can’t trust any of the research you read)

Researchers in Management and Strategy worry a lot about bias – statistical bias. In case you’re not an academic researcher, let me briefly explain.

Suppose you want to find out how many members of a rugby club have their nipples pierced (to pick a random example). The problem is, the club has 200 members and you don’t want to ask them all to take their shirts off. Therefore, you select a sample of 20 of the guys and ask them to bare their chests. After some friendly banter they agree, and it turns out that no fewer than 15 of them have their nipples pierced, so you conclude that the majority of players in the club have likely undergone the slightly painful (or so I am told) aesthetic enhancement.

The problem is, there is a chance that you’re wrong. There is a chance that, by sheer coincidence, you happened to select 15 pierced pairs of nipples even though, among the full set of 200 members, they are very much the minority. For example, if in reality only 30 of the 200 rugby blokes have their nipples pierced, you could by sheer chance happen to pick 15 of them in your sample of 20, and your conclusion that “the majority of players in this club have them” would be wrong.
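For what it’s worth, that chance can actually be computed. Here is a minimal sketch in Python – my own illustration, using the made-up numbers from the example above – of how likely such a fluke sample would be:

    from scipy.stats import hypergeom

    # The hypothetical rugby-club numbers from the example above:
    # 200 members in total, of whom only 30 really have pierced nipples.
    # How likely is it that a random sample of 20 contains 15 or more of them?
    members, pierced, sample_size = 200, 30, 20
    p_fluke = hypergeom.sf(14, members, pierced, sample_size)   # P(X >= 15)
    print(f"Probability of such a fluke sample: {p_fluke:.2e}")

In this particular example the fluke turns out to be vanishingly unlikely, but the probability is never zero – which is exactly what the convention described next is meant to deal with.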

Now, in our research, there is no real way around this. Therefore, the convention among academic researchers is that it is OK to claim a conclusion based on only a sample of observations, as long as the probability that you are wrong is no bigger than 5%. If it isn’t – and one can relatively easily compute that probability – we say the result is “statistically significant”. Out of sheer joy, we then mark that number with a cheerful asterisk * and say amen.
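To make the convention concrete, here is a rough sketch of that computation for the rugby example (again my own illustration, using scipy’s exact binomial test, available in recent versions of scipy). It treats the sample as 20 independent observations, a simplification of drawing 20 members without replacement:

    from scipy.stats import binomtest

    # If in truth only half the members were pierced, how surprising would it be
    # to see 15 or more pierced pairs in a sample of 20?
    result = binomtest(15, n=20, p=0.5, alternative="greater")
    print(f"p-value: {result.pvalue:.3f}")   # roughly 0.02 - below the 5% cut-off, so: asterisk *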

Now, I just said that “one can relatively easily compute that probability” but that is not always entirely true. In fact, over the years statisticians have come up with increasingly complex procedures to correct for all sorts of potential statistical biases that can occur in research projects of various natures. They treat horrifying statistical conditions such as unobserved heterogeneity, selection bias, heteroscedasticity, and autocorrelation. Let me not try to explain to you what they are, but believe me they’re nasty. You don’t want to be caught with one of those.

Fortunately, the life of the researcher is made easy by standard statistical software packages. They offer nice user-friendly menus where one can press buttons to solve problems. For example, if you have identified a heteroscedasticity problem in your data, there are various buttons to press that can cure it for you. Now, it is my personal estimate (but note, no claims of an asterisk!) that about 95 out of 100 researchers have no clue what happens inside their computers when they press one of those magical buttons, but that does not mean the button does not solve the problem. Professional statisticians will frown and smirk at the mere thought, but if you have correctly identified the condition and the way to treat it, you don’t necessarily have to fully understand how the cure works (although I think understanding it would often help in selecting the correct treatment). So far, so good.
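For the curious, here is a rough idea of what one of those “buttons” typically does behind the scenes – a sketch in Python on made-up data, using statsmodels’ heteroscedasticity-robust standard errors (one common treatment; I am not claiming it is the right cure for every case):

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, 500)
    y = 2 + 0.5 * x + rng.normal(0, x)          # error variance grows with x: heteroscedasticity
    X = sm.add_constant(x)

    plain = sm.OLS(y, X).fit()                  # ordinary standard errors
    robust = sm.OLS(y, X).fit(cov_type="HC3")   # the "button": robust (White/HC3) standard errors

    print(plain.bse)    # same coefficients either way, but...
    print(robust.bse)   # ...different standard errors, and hence possibly different asterisks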

Here comes the trick: All of those statistical biases are pretty much irrelevant. They are irrelevant because they are all dwarfed by another bias (for which there is no life-saving cure available in any of the statistical packages): publication bias.

The problem is that if you have collected a whole bunch of data and you don’t find anything – or at least nothing really interesting and new – no journal is going to publish it. For example, the prestigious journal Administrative Science Quarterly proclaims in its “Invitation to Contributors” that it seeks to publish “counterintuitive work that disconfirms prevailing assumptions”. And perhaps rightly so; we’re all interested in learning something new. So if you, as a researcher, don’t find anything counterintuitive that disconfirms prevailing assumptions, you are usually not even going to bother writing it up. And if you are dumb enough to write it up and send it to a journal asking them to publish it, you will swiftly (or less swiftly, depending on which journal you sent it to) receive a reply with the word “reject” firmly embedded in it.

Yet, unintentionally, this publication reality completely messes up the “5% convention”, i.e. that you can only claim a finding as real if there is no more than a 5% chance that what you found is sheer coincidence (rather than a counterintuitive insight that disconfirms prevailing assumptions). In fact, the chance that what you are reporting is bogus is much higher than the 5% you so cheerfully claimed with your poignant asterisk. Because journals will only publish novel, interesting findings – and researchers therefore only bother to write up seemingly intriguing, counterintuitive findings – the chance that what eventually gets published is unwittingly BS is vast.
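A crude way to see why: the little simulation below (my own back-of-the-envelope sketch, with numbers picked purely for illustration – 1,000 studies, only 10% of which test an effect that really exists, and only “significant” results getting published) shows how the publication filter inflates the share of flukes among what actually appears in print:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(42)
    n_per_group, effect, studies = 30, 0.5, 1000
    published_real, published_fluke = 0, 0

    for _ in range(studies):
        is_real = rng.random() < 0.10                 # only 1 in 10 hypotheses is actually true
        a = rng.normal(0, 1, n_per_group)
        b = rng.normal(effect if is_real else 0, 1, n_per_group)
        if ttest_ind(a, b).pvalue < 0.05:             # the publication filter: significant results only
            if is_real:
                published_real += 1
            else:
                published_fluke += 1

    share = published_fluke / (published_real + published_fluke)
    print(f"Share of published findings that are flukes: {share:.0%}")   # far above 5%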

A recent article by Simmons, Nelson, and Simonsohn in Psychological Science (cheerfully entitled “False-Positive Psychology: Undisclosed Flexibility in Data Collection and Analysis Allows Presenting Anything as Significant”) summed it up pretty clearly. If a researcher, running a particular experiment, does not find the result he was expecting, he may initially think “that’s because I did not collect enough data” and collect some more. He can also think “I used the wrong measure; let me use the other measure I also collected” or “I need to correct my models for whether the respondent was male or female” or “examine a slightly different set of conditions”. Yet taking these (extremely common) steps raises the probability that what the researcher finds in his data is due to sheer chance from the conventional 5% to a whopping 60.7%, without the researcher realising it. He will still cheerfully put the all-important asterisk in his table and declare that he has found a counterintuitive insight that disconfirms some important prevailing assumption.
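To get a feel for the mechanism (not to reproduce their exact 60.7% figure), here is a toy simulation of my own of just two of those degrees of freedom – trying two outcome measures and collecting more data when the first look isn’t significant – on data in which there is no real effect at all:

    import numpy as np
    from scipy.stats import ttest_ind

    rng = np.random.default_rng(1)
    runs, false_positives = 2000, 0

    def smallest_p(a, b):
        # try both outcome measures and keep whichever looks "best"
        return min(ttest_ind(a[:, 0], b[:, 0]).pvalue,
                   ttest_ind(a[:, 1], b[:, 1]).pvalue)

    for _ in range(runs):
        a = rng.normal(0, 1, (20, 2))       # group A: 20 people, two outcome measures
        b = rng.normal(0, 1, (20, 2))       # group B: no true difference whatsoever
        p = smallest_p(a, b)
        if p >= 0.05:                       # "not enough data" - collect 10 more per group and look again
            a = np.vstack([a, rng.normal(0, 1, (10, 2))])
            b = np.vstack([b, rng.normal(0, 1, (10, 2))])
            p = smallest_p(a, b)
        if p < 0.05:
            false_positives += 1

    print(f"False-positive rate: {false_positives / runs:.1%}")   # well above the nominal 5%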

In management and strategy research we do highly similar things. For instance, we collect data with two or three ideas in mind of what we want to examine and test with them. If the first idea does not lead to the desired result, the researcher moves on to his second idea, and then one can hear a sigh of relief from behind a computer screen: “at least this idea was a good one”. In fact, you might just keep moving on to “the next good idea” until you have hit on a purely coincidental result: 15 bulky guys with pierced nipples.
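The arithmetic behind that habit is simple enough: if each “next good idea” is an independent test of something that is in fact not there, the chance that at least one of them earns an asterisk purely by luck grows quickly with the number of ideas tried (a back-of-the-envelope illustration, not a claim about any particular study):

    # Chance of at least one "significant" fluke among k independent tests at the 5% level
    for k in (1, 2, 3, 5):
        print(f"{k} idea(s): {1 - 0.95**k:.1%}")   # 5.0%, 9.8%, 14.3%, 22.6%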

Things get really “funny” when one realises that what is considered interesting and publishable differs across fields in Business Studies. For example, in fields like Finance and Economics, academics tend to be fairly skeptical about whether Corporate Social Responsibility is good for a firm’s financial performance. In the subfield of Management, people are much more receptive to the idea that Corporate Social Responsibility should also benefit a firm in terms of its profitability. Indeed, as shown by a simple yet nifty study by Marc Orlitzky, recently published in Business Ethics Quarterly, articles published on this topic in Management journals report a statistical relationship between the two variables that is about twice as large as the one reported in Economics, Finance, or Accounting journals. Of course, who does the research and where it gets printed should have no bearing on what the actual relationship is but, apparently, preferences and publication bias come into the picture with quite some force.

Hence, publication bias vastly dominates any of the statistical biases we get so worked up about, making them pretty much irrelevant. Is this a sad state of affairs? Ehm…. I think yes. Is there an easy solution for it? Ehm… I think no. And that is why we will likely all be suffering from publication bias for quite some time to come.