P values are not transferable

Of the many challenges I encounter in communicating statistical concepts to collaborators, a common one is that p-values are seen as “transferable.” Here is a simple example to illustrate what I mean.

Consider the following 2 \times 3 table, where the rows are sex (Male and Female) and the columns are smoking status (Current, Former, Never). We are interested in determining whether sex is independent of smoking status. The first table below gives the number of observations in each cell; the second gives the same cells as proportions of the total sample.

dat.synth1 <- matrix(c(55,45,70,50,50,70),nrow=2)
rownames(dat.synth1) <- c("M","F")
colnames(dat.synth1) <- c("Current","Former","Never")
dat.synth1
##   Current Former Never
## M      55     70    50
## F      45     50    70
dat.synth1/sum(dat.synth1)
##     Current    Former     Never
## M 0.1617647 0.2058824 0.1470588
## F 0.1323529 0.1470588 0.2058824

I perform a \chi^2 Test of Independence, with the p-value given below. At the 0.05 level, I would conclude that sex is not independent of smoking status.

chisq.test(dat.synth1)
## 
##  Pearson's Chi-squared test
## 
## data:  dat.synth1
## X-squared = 7.3789, df = 2, p-value = 0.02499
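For readers who want to see where this statistic comes from, here is a minimal sketch that recomputes it by hand from the expected counts under independence. The names `expected`, `x2`, `dof`, and `p_val` are my own and not part of the original analysis; the table is rebuilt so the block runs on its own.

```r
# Rebuild the table from the post so this block is self-contained
dat.synth1 <- matrix(c(55,45,70,50,50,70), nrow=2,
                     dimnames=list(c("M","F"), c("Current","Former","Never")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(dat.synth1), colSums(dat.synth1)) / sum(dat.synth1)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
x2 <- sum((dat.synth1 - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(dat.synth1) - 1) * (ncol(dat.synth1) - 1)

# Upper-tail probability of the chi-squared distribution
p_val <- pchisq(x2, df = dof, lower.tail = FALSE)

round(c(X2 = x2, df = dof, p = p_val), 4)
```

The point of the hand computation is that every cell of the table contributes to the statistic, so the resulting p-value speaks to the table as a whole and not to any single column.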

This test for association does not “transfer” to other hypotheses, such as comparisons of specific proportions within the table. For example, in the sample, 55 of the 175 males (about 31.4%) and 45 of the 165 females (about 27.3%) are current smokers. The following would be incorrect to report: “We find that smoking status is not independent of sex (p=0.025), thus the proportion of males that are current smokers is different from the proportion of females that are current smokers.”

We can’t use the p-value from the \chi^2 Test of Independence as a stand-in for a “post-hoc” test on individual proportions, as the second part of that sentence does. To assess the second claim, we test whether the proportion of males who are current smokers differs from the proportion of females who are current smokers.

prop.test(c(55,45),c(175,165))
## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(55, 45) out of c(175, 165)
## X-squared = 0.5205, df = 1, p-value = 0.4706
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.06101684  0.14413372
## sample estimates:
##    prop 1    prop 2 
## 0.3142857 0.2727273
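One way to see that this is a different hypothesis from the one tested earlier: the two-proportion test only looks at Current versus everything else. As a rough sketch, collapsing Former and Never into one column and running `chisq.test` on the resulting 2 \times 2 table should give the same statistic as `prop.test`, on 1 degree of freedom instead of 2. The names `collapsed`, `res_2x2`, and `res_prop` are mine, introduced only for illustration.

```r
# Rebuild the table from the post so this block is self-contained
dat.synth1 <- matrix(c(55,45,70,50,50,70), nrow=2,
                     dimnames=list(c("M","F"), c("Current","Former","Never")))

# Collapse Former and Never into a single "NotCurrent" column
collapsed <- cbind(Current    = dat.synth1[, "Current"],
                   NotCurrent = rowSums(dat.synth1[, c("Former", "Never")]))

# For a 2x2 table, chisq.test applies the Yates continuity correction by default,
# matching the continuity correction that prop.test uses by default
res_2x2  <- chisq.test(collapsed)
res_prop <- prop.test(c(55, 45), c(175, 165))

# Compare the two test statistics and their degrees of freedom
c(chisq = unname(res_2x2$statistic), prop = unname(res_prop$statistic))
c(chisq_df = unname(res_2x2$parameter), prop_df = unname(res_prop$parameter))
```

The 2 \times 3 test can be driven by the Former and Never columns while the Current column shows little difference, which is what happens in this sample.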

With p = 0.47, we can’t conclude that the two proportions differ in the population. The main issue is that hypotheses should be clearly established a priori, not generated from the data and then tested on that same data. Put another way, performing many tests after seeing the data is p-hacking.
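To make the cost of data-driven testing concrete, here is a small simulation sketch. It assumes both sexes share the same smoking distribution (so every null hypothesis is true), tests each smoking category separately, and keeps only the smallest p-value, mimicking someone who reports whichever comparison looks best. The seed, the number of simulations, and the use of the pooled column proportions from the table above are my own choices, not part of the original analysis.

```r
set.seed(1)                        # arbitrary seed, for reproducibility only
n_sim   <- 2000                    # number of simulated tables (arbitrary)
row_tot <- c(175, 165)             # male and female totals from the post
p_col   <- c(100, 120, 120) / 340  # pooled smoking distribution, shared by both sexes

smallest_p <- replicate(n_sim, {
  # Draw smoking-status counts for each sex from the SAME distribution
  m <- rmultinom(1, row_tot[1], p_col)[, 1]
  f <- rmultinom(1, row_tot[2], p_col)[, 1]
  # Test each category separately, then keep only the most "significant" result
  p <- vapply(1:3,
              function(j) prop.test(c(m[j], f[j]), row_tot)$p.value,
              numeric(1))
  min(p)
})

# Share of simulated data sets where at least one comparison has p < 0.05
reject_rate <- mean(smallest_p < 0.05)
reject_rate
```

If each hypothesis had been fixed in advance and tested alone, this rate would be at most about 5%. Picking the best of three after looking should push it above that, even though no real difference exists in the simulated population.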
