Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations
- Core Principle: This influential essay details the pervasive misinterpretation and misuse of fundamental statistical quantities in the scientific literature: P-values, confidence intervals (CIs), and statistical power.
- P-value Clarification: The authors correct 12 common misconceptions, asserting that the P-value is not the probability that the null hypothesis (\(H_0\)) is true, but rather the probability of obtaining data as extreme as, or more extreme than, those observed, assuming \(H_0\) and the other test assumptions hold.
- Misuse of Significance: The essay strongly condemns the practice of drawing dichotomous conclusions based on an arbitrary threshold (e.g., P<0.05), which falsely implies certainty and hinders scientific progress.
- Recommendation: Researchers are urged to shift focus from the P-value to the effect magnitude and its precision, which are better conveyed through Confidence Intervals, and to integrate statistical results with contextual, external evidence.
PubMed: 27209009 DOI: 10.1007/s10654-016-0149-3 Overview generated by: Gemini 2.5 Flash, 26/11/2025
Key Findings: Correcting the Misinterpretations of Statistical Inference
This essential essay, authored by a collective of prominent statisticians and epidemiologists, provides a detailed guide to the widespread misinterpretation and misuse of basic statistical concepts: P-values, confidence intervals (CIs), and statistical power. It serves as a strong complement to the American Statistical Association’s (ASA) 2016 statement on P-values, emphasizing that these misuses are rampant and lead to profoundly flawed scientific conclusions.
Misinterpretations of the P-value
The authors outline 12 common misconceptions about the P-value, which is correctly defined as the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the data, assuming that the null hypothesis (\(H_0\)) and every other assumption used in the computation are true.
The paper explicitly states that the P-value is NOT:
- The probability that the study hypothesis is true.
- The probability that the null hypothesis is true.
- The probability that a result is due to chance.
- A measure of the magnitude or importance of an effect.
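As a rough illustration of this definition (a sketch not taken from the paper; the data, group sizes, and the choice of a permutation test are all assumptions made for the example), the Python snippet below estimates a P-value by Monte Carlo: it simulates the null hypothesis of "no difference between groups" by shuffling the pooled data and counts how often a statistic at least as extreme as the observed one arises.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observed data: two groups of 30 measurements each.
a = rng.normal(0.3, 1.0, 30)
b = rng.normal(0.0, 1.0, 30)

def t_stat(x, y):
    # Difference in means scaled by its standard error (Welch-style statistic).
    return (x.mean() - y.mean()) / np.sqrt(x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

t_obs = abs(t_stat(a, b))

# Monte Carlo P-value: the probability, *under H0*, of a statistic at least as
# extreme as the observed one. H0 ("both groups come from the same
# distribution") is simulated by permuting the pooled data.
pooled = np.concatenate([a, b])
n_sims = 10_000
count = 0
for _ in range(n_sims):
    perm = rng.permutation(pooled)
    if abs(t_stat(perm[:30], perm[30:])) >= t_obs:
        count += 1

print(f"Monte Carlo P-value: {count / n_sims:.3f}")
```

Nothing in this calculation refers to the probability that \(H_0\) itself is true; the conditioning runs entirely the other way.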
They emphasize that a common and disastrous error is using a P-value threshold (e.g., P<0.05) to draw a dichotomous conclusion (‘significant’ vs. ‘non-significant’). This practice falsely suggests that the conclusion is certain and implies that two studies with nearly identical results but P-values on opposite sides of the threshold (e.g., P=0.04 and P=0.06) are fundamentally different.
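To make the dichotomization point concrete, here is a minimal sketch (the summary estimates and standard errors are invented for illustration): two hypothetical studies report the same effect estimate with nearly the same precision, yet their two-sided P-values fall on opposite sides of 0.05.

```python
from scipy import stats

# Hypothetical summary results: identical point estimates, slightly different
# standard errors (e.g., marginally different sample sizes).
studies = {"study A": (0.50, 0.2434), "study B": (0.50, 0.2658)}

for name, (estimate, se) in studies.items():
    z = estimate / se
    p = 2 * stats.norm.sf(abs(z))  # two-sided P-value, normal approximation
    print(f"{name}: estimate = {estimate}, two-sided P = {p:.3f}")

# Output is roughly P = 0.040 vs P = 0.060: labelling one study 'significant'
# and the other 'non-significant' manufactures a contrast the data do not support.
```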
Misinterpretations of Confidence Intervals (CIs)
The paper clarifies that a Confidence Interval (CI), most commonly reported at the 95% level, is defined by its long-run performance: if the study were repeated an unlimited number of times and all assumptions used to compute the interval were correct, 95% of the CIs constructed would contain the true value of the parameter.
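This long-run reading can be checked by simulation. The sketch below is illustrative only (the "true" mean, spread, sample size, and number of repetitions are arbitrary choices): it repeats a simple study many times and records how often the computed t-interval covers the true parameter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

true_mean = 2.0            # known here only because we are simulating
n, n_studies = 25, 10_000
covered = 0

for _ in range(n_studies):
    sample = rng.normal(true_mean, 3.0, n)
    se = sample.std(ddof=1) / np.sqrt(n)
    margin = stats.t.ppf(0.975, df=n - 1) * se
    lo, hi = sample.mean() - margin, sample.mean() + margin
    covered += (lo <= true_mean <= hi)

# Close to 0.95: coverage is a property of the *procedure* across repetitions,
# not a probability statement about any single computed interval.
print(f"Empirical coverage: {covered / n_studies:.3f}")
```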
The paper stresses that a CI is NOT a probability statement about the parameter in the specific study at hand. Misinterpretations include:
- Assuming there is a 95% probability that the true effect lies within the observed interval.
- Assuming that values outside the interval are refuted or implausible.
The main value of CIs is their ability to convey the precision of the estimate and the range of effect magnitudes that are compatible with the data, encouraging researchers to focus on effect size rather than just statistical significance.
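As a small illustration of the precision point (not taken from the paper; the point estimate and standard errors are invented), the sketch below computes normal-approximation 95% intervals for the same estimate under two different standard errors. The narrower interval from the more precise study pins down the range of compatible effect sizes far more tightly.

```python
from scipy import stats

def ci_95(estimate, se):
    # Normal-approximation 95% interval: estimate +/- 1.96 * SE.
    z = stats.norm.ppf(0.975)
    return estimate - z * se, estimate + z * se

# Same hypothetical risk-difference estimate, two hypothetical standard errors.
for label, se in [("small study", 0.08), ("large study", 0.02)]:
    lo, hi = ci_95(0.10, se)
    print(f"{label}: 0.10 (95% CI {lo:.2f} to {hi:.2f})")
```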
Misinterpretations of Statistical Power
Statistical power is the pre-study probability that a test will yield a statistically significant result (i.e., reject \(H_0\)), given an assumed true effect size and the study’s design.
The authors note the main misuses:
- Treating power as a general measure of study quality, when it is highly dependent on the hypothesized effect size (see the sketch after this list).
- Misinterpreting low power: a non-significant result from a low-power study does not imply that the true effect is small or non-existent, only that the study was incapable of detecting the hypothesized effect.
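For concreteness, here is a minimal sketch (assuming a two-sided, normal-approximation test of a difference in means; the sample size, standard deviation, and candidate effect sizes are all hypothetical) showing how strongly computed power depends on the effect size one chooses to assume.

```python
import numpy as np
from scipy import stats

def power_two_sample(delta, sd, n_per_group, alpha=0.05):
    # Approximate power of a two-sided two-sample test of means
    # (normal approximation, equal group sizes and standard deviations).
    se = sd * np.sqrt(2.0 / n_per_group)
    z_crit = stats.norm.ppf(1 - alpha / 2)
    # Probability of rejecting H0 when the true difference is `delta`.
    return stats.norm.sf(z_crit - abs(delta) / se) + stats.norm.cdf(-z_crit - abs(delta) / se)

# The same design is well powered for a large assumed effect and badly
# underpowered for a small one.
for delta in (0.2, 0.5, 0.8):
    print(f"assumed effect {delta}: power ~ {power_two_sample(delta, 1.0, 50):.2f}")
```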
Conclusion
The essay’s ultimate conclusion is that statistical inference tools, including P-values, are just one component of scientific reasoning. Their correct use requires attention to design, measurement quality, data integrity, and background knowledge. They recommend using CIs to emphasize effect magnitude and avoiding the common practice of dichotomizing results based on arbitrary P-value thresholds.