Some papers about p values


These papers have nothing much to do with single molecule kinetics. They were written by David Colquhoun after his retirement from the world of single ion channels, as a way to keep him off the streets. They are listed here as a convenient place to keep a record.

The papers concern the misinterpretation of tests of significance. Such tests were hardly ever used in our single ion channel work. They represent a return to DC's interest in statistical inference in the 1960s, which culminated in the publication of a textbook, Lectures on Biostatistics (OUP, 1971). The textbook has aged quite well, with the exception of the parts on the interpretation of p values. In the 1960s, I missed entirely the problems of null hypothesis significance testing. But better late than never.

The problem lies in the fact that most people still think that the p value is the probability that your results occurred by chance: see, for example, Gigerenzer et al. (2006) [download pdf]. It is nothing of the sort.

The false positive risk (FPR) is the probability that a result that has been labelled as “statistically significant” is in fact a false positive. It is always bigger than the p value, often much bigger.

My recommendations. In brief, I suggest that p values and confidence intervals should still be cited, but that they should be supplemented by a single number that gives an idea of the false positive risk (FPR). The simplest way to do this is to calculate the false positive risk that corresponds to a prior probability of 0.5 that a real effect exists. This would still be optimistic for implausible hypotheses, but it would be a great improvement on p values. The FPR, calculated in this way, is just a more comprehensible way of citing the likelihood ratio (see the 2018 paper).
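The arithmetic behind this recommendation can be sketched in a few lines. The sketch below assumes the standard Bayesian relationship (posterior odds = likelihood ratio × prior odds) and takes the FPR to be the posterior probability of the null; the LR value of 3 used in the example is an illustrative assumption, not a number from the papers above:

```python
def false_positive_risk(likelihood_ratio, prior=0.5):
    """Convert a likelihood ratio (evidence for H1 over H0) into a
    false positive risk, given a prior probability that a real
    effect exists.  Posterior odds = LR * prior odds, and the FPR
    is the posterior probability of the null hypothesis."""
    prior_odds = prior / (1.0 - prior)
    posterior_odds = likelihood_ratio * prior_odds
    return 1.0 / (1.0 + posterior_odds)

# With a prior of 0.5 the prior odds are 1, so FPR = 1 / (1 + LR).
# A likelihood ratio of about 3 (assumed here for illustration)
# gives an FPR of 0.25 -- far bigger than a p value of 0.05.
print(false_positive_risk(3.0))        # -> 0.25
print(false_positive_risk(3.0, 0.1))   # an implausible hypothesis: FPR rises
```

Note how lowering the prior (an implausible hypothesis) pushes the FPR up, which is why a prior of 0.5 is described above as still optimistic.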

Please note: the term “false discovery rate”, which was used in earlier papers, has now been replaced by “false positive risk”. The reasons for this change are explained in the introduction of the 2017 paper.

If you prefer a video to reading, try this on YouTube.

In 2019, a paper appeared in Royal Soc Open Science that criticised some aspects of my 2017 paper: “A more principled use of the p-value? Not so fast: a critique of Colquhoun’s argument”, by Ognjen Arandjelović.
My response to this paper has appeared in arXiv.

Original papers about the problem

Colquhoun, D. (2014). An investigation of the false discovery rate and the misinterpretation of p-values. Royal Society Open Science. This first paper looked at the risk of false positive results by simulation of Student’s t test. The advantage of simulation is that it makes the assumptions very clear without much mathematics. The disadvantage is that the results aren’t very general.
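The logic of such a simulation can be sketched as follows. This is an illustrative reconstruction, not the paper's actual code: the sample size of 16 per group, the 1 SD true effect, the prior of 0.5, and the critical value t = 2.042 (the two-sided 5% point for 30 degrees of freedom) are all assumptions chosen for the example:

```python
import random
import statistics

def two_sample_t(x, y):
    """Student's t statistic for two independent samples of equal size."""
    n = len(x)
    pooled_var = (statistics.variance(x) + statistics.variance(y)) / 2
    return (statistics.mean(x) - statistics.mean(y)) / ((2 * pooled_var / n) ** 0.5)

def simulate_fpr(n_experiments=2000, n=16, effect=1.0, prior=0.5,
                 t_crit=2.042, rng=None):
    """Simulate many t tests in which a fraction `prior` of experiments
    have a real effect.  Return the fraction of 'significant' results
    that in fact came from the null: an estimate of the FPR.
    t_crit = 2.042 is the two-sided 5% critical value for df = 30."""
    rng = rng or random.Random(42)
    false_pos = true_pos = 0
    for _ in range(n_experiments):
        real_effect = rng.random() < prior
        shift = effect if real_effect else 0.0
        a = [rng.gauss(shift, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if abs(two_sample_t(a, b)) > t_crit:
            if real_effect:
                true_pos += 1
            else:
                false_pos += 1
    return false_pos / (false_pos + true_pos)

print(simulate_fpr())
```

Even in this optimistic scenario (prior 0.5, a large 1 SD effect, so high power), a noticeable fraction of the "significant" results come from the null; with smaller effects or lower priors the fraction grows much larger.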

Colquhoun, D. (2017). The reproducibility of research and the misinterpretation of p-values. Royal Society Open Science. This paper gives, in the appendix, mathematically exact solutions for the false positive risk, calculated by the p-equals method. This allows the false positive risk to be calculated, as a function of the observed p value, for a range of sample sizes. A web calculator is provided that makes the calculations simple to do.

Colquhoun, D. and Longstaff, C. (2017). Web calculator for false positive risk

Colquhoun, D. (2018). The false positive risk: a proposal concerning what to do about p values. The American Statistician, 2019. This paper examines more closely than before the assumptions that are made in calculations of the FPR. It makes concrete proposals, with examples, about how to solve the problem posed by the inadequacy of p values.

The American Statistician published this paper in March 2019, but the arXiv version has better links to the references.

In the same online edition, The American Statistician published 43 papers that were designed to say what should be done about the problem of abuse of p values.

At the same time, Nature published a comment piece on the p value problem. The gist of this piece was a plea to abandon the term “statistically significant”, because it involves the obviously silly idea that observing p = 0.049 tells you something different from p = 0.051. It was co-signed by 840 people (including me). Nature also published an editorial which half-understood the problem and, sadly, said “Nature is not seeking to change how it considers statistical analysis in evaluation of papers at this time”. This sums up the problem: it is in the interests of both authors and journals to continue to publish too many false positive results. Until this problem is solved, the corruption will continue.

Popular accounts of the problem

Colquhoun, D. (2015) False discovery rates and P values: the movie. On YouTube. This slide show is now superseded by the 2018 version.

Colquhoun, D. (2015). The perils of p-values. In Chalkdust, a magazine run by students at UCL. This article deals with the principles of randomisation tests as a non-mathematical way to get p values, plus a bit about what’s wrong with p values.

Colquhoun, D. (2015). Randomisation tests: how to get a P value with no mathematics. A short (6 slides, 15 min) video on YouTube. Forget t tests. The randomisation test is at least as powerful, and it makes no assumption of normal distributions. Furthermore, it makes very clear that random allocation of treatments is an essential assumption for all tests of statistical significance. Of course the result is just a p value. It doesn’t tell you the probability that you are wrong: for that, see the other stuff on this page.
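A randomisation test really does need no mathematics beyond counting. The sketch below uses exact enumeration of every possible allocation (feasible only for small samples; larger samples would use random shuffles instead), and the two tiny groups of hypothetical data are invented purely for illustration:

```python
from itertools import combinations

def randomisation_test(group_a, group_b):
    """Exact two-sided randomisation test on the difference of means.
    Enumerate every way the pooled observations could have been
    allocated to the two groups, and count how often the absolute
    mean difference is at least as large as the one observed."""
    pooled = group_a + group_b
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    count = total = 0
    for idx in combinations(range(len(pooled)), n_a):
        a = [pooled[i] for i in idx]
        b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = abs(sum(a) / len(a) - sum(b) / len(b))
        if diff >= observed - 1e-12:   # tolerate float rounding
            count += 1
        total += 1
    return count / total

# Hypothetical data: only the two most extreme of the 20 possible
# allocations reach the observed difference, so p = 2/20 = 0.1.
print(randomisation_test([1, 2, 3], [10, 11, 12]))  # -> 0.1
```

The p value is simply the proportion of possible random allocations that give a difference at least as extreme as the one seen, which is why random allocation of treatments is the essential assumption.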

Colquhoun, D. (2016). The problem with p-values. Aeon magazine. (This attracted 147 comments.) This essay is about the logic of inductive inference. It is a non-mathematical introduction to the ideas raised in my 2014 paper.

Colquhoun, D. (2017). Five ways to fix statistics: state false positive risk, too. Nature, volume 551. A collection of short comments by five authors on what should be done about p values.

Colquhoun, D. (2018). The false positive risk: a proposal concerning what to do about p-values (version 2). This video is a slightly extended version of a talk that I gave at the Evidence Live meeting, June 2018, at the Centre for Evidence-Based Medicine, Oxford. It supersedes my earlier 2015 video on the same topic. It is an exposition of the ideas that are given in more detail in the 2017 paper and in the 2018 paper. In November 2018 a new version was posted: the content is the same, but the volume of the sound track is better.