A few years ago I watched a TED talk by Peter Donnelly dealing with statistics, and in particular the prosecutor’s fallacy. You can view it here. It was a thoroughly enjoyable talk and I recommend that everyone watch it. However, it turns out that not even statisticians are immune to making this basic error, as this recent article by Bernd Beber and Alexandra Scacco about the probabilities of election fraud in Iran shows.
First, let’s review the basic idea of how to compare the probabilities of the cause of a given event, using an example similar to one in the TED talk linked above.
Imagine sitting with a positive HIV test in your hands. The doctors tell you that it’s 95% accurate. What should you do? Call your family? Write your will? This means you have a 95% chance of having HIV, right? Wrong! There are two possible explanations for the result you’re holding in your hands:
- You have HIV, and the test is correct.
- You don’t have HIV, and the test is incorrect.
If we want to know the likelihood of you actually having HIV we have to compare the probabilities of these two alternatives. To do this, we clearly need to know the a priori probability that you have HIV. According to this, it appears that roughly 33 million people have HIV/AIDS in the world, which means that a rough estimate of the a priori probability of a given person having HIV/AIDS is about 33 million / 6.67 billion, or 0.005. In other words, the probabilities of the two possible explanations for the data are:
- 0.005 * 0.95 = 0.00475
- (1-0.005) * (1-0.95) = 0.04975
In other words, it is roughly ten times more likely that you do not have HIV (and the test was wrong) than that you do (and the test was right). Phew! This is despite the test having a seemingly high accuracy, which demonstrates that looking at only one half of the equation can give a very incorrect understanding of the reality of the situation.
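To make the comparison concrete, here is a minimal Python sketch of the same calculation. The function name `posterior` and the variable names are my own, and I am assuming that "95% accurate" means the test is right 95% of the time both for people who have HIV and for people who don't, which is what the two products above implicitly assume.

```python
def posterior(prior, p_data_given_h, p_data_given_not_h):
    """Probability of hypothesis H given the data, via Bayes' rule."""
    joint_h = prior * p_data_given_h                # H is true and we see the data
    joint_not_h = (1 - prior) * p_data_given_not_h  # H is false and we see the data anyway
    return joint_h / (joint_h + joint_not_h)


prior_hiv = 0.005  # rough a priori probability: 33 million / 6.67 billion
accuracy = 0.95    # assumed to apply to both positive and negative cases

# P(positive | HIV) = accuracy, P(positive | no HIV) = 1 - accuracy
p_hiv = posterior(prior_hiv, accuracy, 1 - accuracy)
odds_against = (1 - p_hiv) / p_hiv

print(f"P(HIV | positive test) = {p_hiv:.3f}")
print(f"Odds against having HIV: {odds_against:.1f} to 1")
```

Under these assumptions the chance of actually having HIV after a positive test comes out at a little under 9%, which matches the "roughly ten times more likely" figure above.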
So how does all this relate to that article about the Iranian election? Well, the two PhD students who wrote the article claim:
The probability that a fair election would produce both too few non-adjacent digits and the suspicious deviations in last-digit frequencies described earlier is less than .005. In other words, a bet that the numbers are clean is a one in two-hundred long shot.
You should already see what’s wrong with this. The data they have show the probability of ending up with the election results they got assuming a fair election, but this is clearly not the same as the probability that the election was fair! To know the probability that the election was tampered with, we again have to compare the two alternative explanations:
- The election was tampered with
- The election was fair, and the results just happened to end up that way by random chance
In order to compare these two we need to know what the probability of election tampering is. Unfortunately, I have no idea, and neither do the authors of that article. It’s a tricky thing to estimate, especially since all major elections probably have tampering on at least some minor (insignificant) level, and to get better data we’d have to take regional variability into account and so on. So let’s just work it out with an example number. This is a maths point, not a political point, so the exact value doesn’t really matter. Let’s assume, for the sake of argument, that 99% of elections are fair (for some definition of “fair”), and that a tampered election would be certain to produce results like these. This means that the probabilities of the two alternatives are:
- 1-0.99 = 0.01
- 0.99 * 0.005 = 0.00495
In other words, given this made-up and completely bogus example data, the election results would be about twice as likely to have been caused by tampering as by random chance, which works out to about a 67% chance of election tampering. A far cry from 99.5%! Again, let me stress that this was just example data to make my point about the dangers of the prosecutor’s fallacy; I’m not trying to produce any actual valid data here!
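Since the answer depends so heavily on the assumed prior, it can help to see how the result moves as that number changes. The following is a rough Python sketch; the function name `p_tampering` is hypothetical, every prior in the loop is an invented example value, and the default `p_data_given_tampered=1.0` encodes the further assumption that a tampered election would be certain to produce results like these.

```python
def p_tampering(prior_fair, p_data_given_fair, p_data_given_tampered=1.0):
    """Probability of tampering given the observed results, via Bayes' rule.

    p_data_given_tampered defaults to 1.0, i.e. we assume a tampered
    election would certainly produce results like the ones observed.
    """
    joint_fair = prior_fair * p_data_given_fair
    joint_tampered = (1 - prior_fair) * p_data_given_tampered
    return joint_tampered / (joint_tampered + joint_fair)


p_results_if_fair = 0.005  # the figure quoted from the article

# Made-up priors for the share of elections that are fair
for prior_fair in (0.5, 0.9, 0.99, 0.999):
    p = p_tampering(prior_fair, p_results_if_fair)
    print(f"prior P(fair) = {prior_fair}: P(tampering | results) = {p:.3f}")
```

With a prior of 0.99 this gives about 0.67, the same figure as above. The other rows are there only to show that the conclusion swings widely with the prior, which is exactly the half of the equation that the 0.005 figure leaves out.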
I should also point out that I have no stake in the Iranian elections. I’m making a point about shoddy statistics, not about politics, and we can’t just accept sloppy reasoning because the conclusion suits our politics, now can we?
In closing, let me again point you to the TED talk I mentioned above. It’s well worth your time.
I came across this revised version of the article, where they point out in a footnote:
The last sentence should read: "In other words, a bet that a clean election would produce these
numbers is a one in two-hundred long shot." Thanks to the many readers who alerted us to this
There are two things to say about this. First, it doesn’t really impact the point of my blog post, which is that the data they presented say nothing about the probability of the Iranian elections being tampered with, and that the argument they (originally) presented was an example of the prosecutor’s fallacy. Second, their revised article appears to suggest that this was merely a poorly worded sentence and that they didn’t intend to argue that the probability of the Iranian elections being “clean” is 0.005. But this means they intentionally posted what is effectively half of an equation without mentioning that their results are essentially meaningless without more data to compare them with. I’m not sure this explanation puts the authors in a better light.