Reporting a confidence interval in place of a p value

This is the first in a series of posts to support the decades-long push by statisticians to move away from the null hypothesis significance testing (NHST) paradigm (e.g., “The results are statistically significant (p<0.05).”) toward a more nuanced and comprehensive style of reporting statistical results. Here is a good starting point for background on this issue.

In this post I show how simply reporting a confidence interval in place of (or along with) a p value conveys more information to the reader.

Consider the following publication, where the authors follow five runners for the duration of an ultramarathon and record their individual behaviors (e.g., eating, drinking water). Note the following sentence in the abstract:

“Runners achieved a higher total carbohydrate consumption in the second half of the race (p=0.043), but no higher fluid intake (p=0.08).”

This sentence is typical of the NHST approach. I don’t have access to the data, so I can’t compute the corresponding confidence intervals for the means (assuming the inference is on the mean). But given the hypothetical confidence intervals below, consider the following changes to the previous sentence:

“Runners achieved a higher [mean] total carbohydrate consumption in the second half of the race (in g/15km, 95% CI: (0.62, 23.58)), but no higher [mean] fluid intake (in mL/15km, 95% CI: (-48.87, 562.87)).”

Statistical significance is still conveyed by noting whether zero is in the interval, but the size of the effect is now conveyed as well. For example, for carbohydrate consumption the result is “significant,” but the mean increase could be as little as 0.62 g/15km or as much as 23.58 g/15km.
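To make the mechanics concrete, here is a minimal Python sketch of how an interval like this could be computed. It assumes a paired design (each runner's second-half value minus their first-half value) and a t-based interval for the mean difference, which may not be what the authors actually did. The `diffs` values are invented for illustration and are not from the publication; `mean_ci` is a hypothetical helper, and 2.776 is the standard two-sided 95% critical value of Student's t with 4 degrees of freedom (five runners).

```python
import math
import statistics

# Hypothetical paired differences (second half minus first half), in g/15km,
# for five runners. These numbers are invented for illustration only.
diffs = [4.0, 15.5, 9.0, 22.0, 10.0]

# Two-sided 95% critical value of Student's t with n - 1 = 4 degrees of freedom.
T_CRIT_DF4 = 2.776


def mean_ci(values, t_crit):
    """Return (low, high) for a t-based confidence interval on the mean."""
    n = len(values)
    mean = statistics.mean(values)
    # Standard error of the mean, using the sample standard deviation.
    se = statistics.stdev(values) / math.sqrt(n)
    margin = t_crit * se
    return mean - margin, mean + margin


low, high = mean_ci(diffs, T_CRIT_DF4)
print(f"mean difference: {statistics.mean(diffs):.2f} g/15km")
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Note that the critical value depends on the sample size, so a different number of runners would need a different `t_crit`.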

Since confidence intervals estimate population parameters (in this case, the population mean), reporting them helps with inference. Suppose that the data resulted in narrower confidence intervals such as:

“Runners achieved a higher [mean] total carbohydrate consumption in the second half of the race (in g/15km, 95% CI: (0.5,4.3)), but no higher [mean] fluid intake (in mL/15km, 95% CI: (-1.9, 22.9)).”

As in the previous two versions of the sentence, the statistical significance of the two results is unchanged (i.e., p < 0.05 for the first interval and p > 0.05 for the second), but we can clearly see how the estimates differ. For carbohydrate consumption, the estimated mean increase is now between 0.5 and 4.3 g/15km, which may or may not be practically significant.
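One way to picture how a reader might use these intervals is the rough sketch below. It checks whether an interval excludes zero and compares it with a smallest increase that would matter in practice. The function `read_interval` and the 5 g/15km threshold are hypothetical, chosen only for illustration; what counts as practically important is a subject-matter judgment, not a statistical one. The sketch also only considers increases (positive differences).

```python
def read_interval(low, high, min_important):
    """Read a CI for a mean increase against zero and a practical threshold.

    Returns (significant, practical), where `significant` is True when the
    interval excludes zero and `practical` describes where the interval sits
    relative to `min_important`.
    """
    significant = low > 0 or high < 0
    if low >= min_important:
        practical = "at least the threshold"
    elif high < min_important:
        practical = "below the threshold"
    else:
        practical = "inconclusive"
    return significant, practical


# Hypothetical smallest carbohydrate increase of practical interest, in g/15km.
MIN_CARB_G = 5.0

# The two hypothetical carbohydrate intervals used in this post.
print(read_interval(0.62, 23.58, MIN_CARB_G))  # wide interval
print(read_interval(0.5, 4.3, MIN_CARB_G))     # narrow interval
```

Under this assumed threshold, both intervals are “significant,” yet the wide one leaves the practical question open while the narrow one sits entirely below the threshold. A p value alone cannot make that distinction.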

I hope that this simple example shows how reporting confidence intervals instead of p values allows for conveying both statistical significance and practical significance in a simple way.
