Of the many challenges I encounter in communicating statistical concepts to collaborators, a common one is that p-values are seen as “transferable.” Here is a simple example to illustrate what I mean.
Consider the following table, where the rows are sex (Male and Female) and the columns are smoking status (Current, Former, Never). We are interested in determining whether sex is independent of smoking status. The first output below gives the number of observations in each cell; the second gives the same cells as relative frequencies of the full sample (each count divided by the total of 340).
dat.synth1 <- matrix(c(55,45,70,50,50,70),nrow=2)
rownames(dat.synth1) <- c("M","F")
colnames(dat.synth1) <- c("Current","Former","Never")
dat.synth1
##   Current Former Never
## M      55     70    50
## F      45     50    70
dat.synth1/sum(dat.synth1)
##     Current    Former     Never
## M 0.1617647 0.2058824 0.1470588
## F 0.1323529 0.1470588 0.2058824
I perform a Test of Independence (Pearson's chi-squared test), with the p-value given below. At the 0.05 level, I would conclude that sex is not independent of smoking status.
chisq.test(dat.synth1)
##
##  Pearson's Chi-squared test
##
## data:  dat.synth1
## X-squared = 7.3789, df = 2, p-value = 0.02499
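To make explicit what this test evaluates, here is a minimal sketch that recomputes the statistic by hand. The names `expected`, `stat`, and `p.value` are my own for illustration; the point is that the statistic sums the discrepancy over all six cells at once, so the p-value speaks to the table as a whole rather than to any single cell or column.

```r
# Rebuild the table so this block runs on its own
dat.synth1 <- matrix(c(55, 45, 70, 50, 50, 70), nrow = 2,
                     dimnames = list(c("M", "F"),
                                     c("Current", "Former", "Never")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(dat.synth1), colSums(dat.synth1)) / sum(dat.synth1)

# Pearson statistic: squared discrepancy summed over every cell of the table
stat <- sum((dat.synth1 - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
df <- (nrow(dat.synth1) - 1) * (ncol(dat.synth1) - 1)
p.value <- pchisq(stat, df = df, lower.tail = FALSE)

round(c(statistic = stat, df = df, p.value = p.value), 4)
```

The result should agree with the `chisq.test` output above, since no continuity correction is applied to tables larger than 2 x 2.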
This test for association does not “transfer” to other hypotheses, e.g., hypotheses about marginal quantities. For example, in the sample 55 of the 175 males (31.4%) and 45 of the 165 females (27.3%) are current smokers. The following would be incorrect to report: “We find that smoking status is not independent of sex (p=0.025), thus the proportion of males that are current smokers is different from the proportion of females that are current smokers.”
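Note that 31.4% and 27.3% are proportions within each sex, not the relative frequencies of the full sample printed earlier. A small sketch of how they could be obtained from the table (the name `within.sex` is mine):

```r
# Rebuild the table so this block runs on its own
dat.synth1 <- matrix(c(55, 45, 70, 50, 50, 70), nrow = 2,
                     dimnames = list(c("M", "F"),
                                     c("Current", "Former", "Never")))

# margin = 1 divides each row by its own total, giving the
# distribution of smoking status within each sex
within.sex <- prop.table(dat.synth1, margin = 1)
round(within.sex, 3)
```

Each row of `within.sex` sums to 1, and the `Current` column holds the two proportions quoted above.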
We can’t use the p-value from the Test of Independence for any “post-hoc” type test on individual proportions, as is done in the second part of that sentence. To assess the second claim, we test whether the proportion of males that are current smokers is different from the proportion of females that are current smokers.
prop.test(c(55,45),c(175,165))
##
##  2-sample test for equality of proportions with continuity correction
##
## data:  c(55, 45) out of c(175, 165)
## X-squared = 0.5205, df = 1, p-value = 0.4706
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  -0.06101684  0.14413372
## sample estimates:
##    prop 1    prop 2
## 0.3142857 0.2727273
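One way to see that this is a different hypothesis from the Test of Independence: with two groups, `prop.test` amounts to a chi-squared test on the 2 x 2 table obtained by collapsing Former and Never into a single "not current" column. The sketch below, with illustrative names of my own (`collapsed`, `res.prop`, `res.chisq`), derives the inputs from the table instead of typing them in and checks that the two approaches agree.

```r
# Rebuild the table so this block runs on its own
dat.synth1 <- matrix(c(55, 45, 70, 50, 50, 70), nrow = 2,
                     dimnames = list(c("M", "F"),
                                     c("Current", "Former", "Never")))

current  <- dat.synth1[, "Current"]   # current smokers by sex
n.by.sex <- rowSums(dat.synth1)       # group sizes: 175 males, 165 females

# Two-sample test of proportions, as in the text
res.prop <- prop.test(current, n.by.sex)

# Same question asked of the collapsed 2 x 2 table
# (chisq.test applies the continuity correction to 2 x 2 tables by default)
collapsed <- cbind(Current = current, NotCurrent = n.by.sex - current)
res.chisq <- chisq.test(collapsed)

c(prop.test = res.prop$p.value, chisq.test = res.chisq$p.value)
```

The collapsed table ignores how the remaining observations split between Former and Never, and in these data most of the departure from independence sits in those two columns.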
So we can’t conclude that the two proportions differ in the population. The main issue is that hypotheses should be clearly established a priori, not generated and then tested using the same data. Or put another way, performing many tests after seeing the data is p-hacking.
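To illustrate why running many tests after seeing the data is a problem, here is a rough simulation sketch. Everything in it is an assumption of mine and not part of the analysis above: the number of simulations, the seed, the null distribution (the pooled column proportions from the table), and the choice to run one `prop.test` per smoking category. It compares how often a single pre-specified test rejects with how often at least one of three post-hoc tests rejects, when sex and smoking status are truly independent.

```r
set.seed(1)                      # arbitrary seed, for reproducibility only
n.sim    <- 2000                 # number of simulated tables (assumed)
n.by.sex <- c(M = 175, F = 165)  # group sizes from the example
# Assumed null: both sexes share the pooled smoking distribution
p.null   <- c(Current = 100, Former = 120, Never = 120) / 340

rejections <- replicate(n.sim, {
  # Draw one table under independence
  tab <- rbind(M = rmultinom(1, n.by.sex["M"], p.null)[, 1],
               F = rmultinom(1, n.by.sex["F"], p.null)[, 1])
  # One two-sample test of proportions per smoking category
  p.each <- apply(tab, 2, function(x) prop.test(x, n.by.sex)$p.value)
  c(single = p.each[["Current"]] < 0.05,  # one test chosen in advance
    any    = any(p.each < 0.05))          # best of three, chosen afterwards
})

# Estimated false positive rates for each strategy
rates <- rowMeans(rejections)
rates
```

By construction the `any` rate can never fall below the `single` rate; how far it exceeds the nominal 0.05 depends on the assumptions listed above.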