Moving to a World Beyond \(p < 0.05\)
- Core Argument: This editorial calls for the abandonment of the term “statistically significant” and the practice of dichotomizing results based on the arbitrary \(p < 0.05\) threshold.
- Central Problem: Relying on the binary threshold distorts evidence, fuels poor practices like p-hacking and publication bias, and leads to the false equating of statistical significance with scientific importance.
- Recommendation: Researchers must treat the p-value as a continuous measure of incompatibility between the data and the null hypothesis. They should focus their interpretation on the magnitude of the effect and its precision, primarily communicated via Confidence Intervals (CIs).
PubMed: Not indexed (The American Statistician) | DOI: 10.1080/00031305.2019.1583913 | Overview generated by: Gemini 2.5 Flash, 26/11/2025
Key Findings: The Case Against Dichotomous Statistical Significance
This highly influential editorial, which introduced a special issue of The American Statistician containing 43 papers on the topic, marks a critical step in the statistical community’s effort to reform scientific practice. It explicitly calls for an end to the culture of dichotomous thinking based solely on the threshold of \(p < 0.05\). The authors argue that declaring a result “statistically significant” based on an arbitrary cutoff is counterproductive and has fueled the reproducibility crisis.
The Problem with Dichotomization
The fundamental issue lies in the over-simplification of complex statistical results into a binary “significant/non-significant” outcome:
- Arbitrary Cutoff: The \(p < 0.05\) threshold is arbitrary. Treating a result with \(p = 0.049\) as fundamentally different from one with \(p = 0.051\) invites illogical conclusions and flawed decision-making, even though the two results convey nearly the same evidence (see the sketch after this list).
- Exaggerated Confidence: Declaring a result “statistically significant” implies a degree of certainty that the p-value alone does not justify, and it often leads researchers and the public to mistake statistical significance for scientific or clinical importance.
- Fueling Bias: The dichotomous framework incentivizes poor research practices such as p-hacking (data-dependent analysis choices made to cross the threshold) and publication bias (selective reporting of results that pass it). Together these practices inflate the rate of false-positive findings in the published literature.
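To make the arbitrariness concrete, here is a minimal sketch (the two “studies,” their estimates, and the shared standard error are invented for illustration, not taken from the editorial). Two estimates that differ only slightly land on opposite sides of the 0.05 line while their confidence intervals are nearly identical:

```python
from scipy import stats

# Hypothetical example: two studies estimate the same kind of effect
# with identical standard errors but slightly different point estimates.
se = 1.0
for label, estimate in [("Study A", 1.97), ("Study B", 1.95)]:
    z = estimate / se
    p = 2 * stats.norm.sf(abs(z))                      # two-sided p-value
    ci_low, ci_high = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label}: estimate={estimate:.2f}, p={p:.3f}, "
          f"95% CI=({ci_low:.2f}, {ci_high:.2f})")
```

Study A prints \(p \approx 0.049\) and Study B \(p \approx 0.051\); both report essentially the same effect and the same uncertainty, which is exactly why the editorial rejects treating the 0.05 boundary as a scientific dividing line.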
The Solution: Embracing a Continuous View of Evidence
The authors propose a simple, yet profound, shift in reporting and thinking:
- Abandon “Statistical Significance”: The term “statistically significant” and its binary cousin “non-significant” should be retired from scientific discourse.
- Focus on Compatibility: Researchers should instead report the p-value as a continuous measure of incompatibility between the data and the assumed statistical model (of which the null hypothesis is one component). Smaller p-values indicate greater incompatibility with that model, with no bright line separating “evidence” from “no evidence.”
- Emphasize Magnitude and Precision: The core focus of reporting should be the effect size and its uncertainty, best communicated via Confidence Intervals (CIs) or Credible Intervals (from Bayesian analysis). CIs show the range of effect sizes reasonably compatible with the data (a minimal sketch of this reporting style follows this list).
- Integrate Context: Conclusions must be based on the entirety of the evidence, including the context of the research, the quality of the study design, external knowledge, and the costs/benefits of potential actions, not just the p-value.
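As a minimal sketch of this reporting style (the data, group names, and simulated effect are invented for illustration, not taken from the editorial), the snippet below estimates a difference in means with a 95% confidence interval and reports the p-value as a continuous quantity alongside it:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical outcomes for a treatment and a control group.
treatment = rng.normal(loc=10.5, scale=3.0, size=40)
control = rng.normal(loc=9.0, scale=3.0, size=40)

diff = treatment.mean() - control.mean()
se = np.sqrt(treatment.var(ddof=1) / len(treatment)
             + control.var(ddof=1) / len(control))
df = len(treatment) + len(control) - 2     # simple df; Welch correction omitted for brevity
t_crit = stats.t.ppf(0.975, df)
ci_low, ci_high = diff - t_crit * se, diff + t_crit * se
p = 2 * stats.t.sf(abs(diff / se), df)

# Lead with magnitude and precision; give the p-value as a continuous quantity.
print(f"Estimated difference: {diff:.2f} "
      f"(95% CI {ci_low:.2f} to {ci_high:.2f}), p = {p:.3f}")
```

The report foregrounds how large the effect might plausibly be and how precisely it is estimated, rather than whether a single number crossed a threshold.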
The Role of Prediction Intervals and False Discovery Rates
The article encourages the use of various other tools that provide a more complete picture of the evidence, such as:
- Prediction Intervals: Show the range of values expected for a future observation.
- False Discovery Rates (FDR) and False Positive Risks (FPR): Help quantify the probability that a “significant” finding is actually a false positive, which is often much higher than the p-value suggests (see the worked example below).
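As a rough illustration of why a threshold-crossing finding can easily be wrong (this is a standard textbook-style calculation with invented inputs, not a formula quoted from the editorial), the false positive risk can be computed from the significance level, the statistical power, and the prior probability that the tested effect is real:

```python
def false_positive_risk(alpha: float, power: float, prior_true: float) -> float:
    """Probability that a result crossing the alpha threshold is a false positive,
    given the prior probability that the tested effect is real (illustrative)."""
    false_positives = alpha * (1 - prior_true)   # true nulls that cross the threshold
    true_positives = power * prior_true          # real effects that cross the threshold
    return false_positives / (false_positives + true_positives)

# Example: alpha = 0.05, 80% power, and only 1 in 10 tested hypotheses is true.
print(f"{false_positive_risk(0.05, 0.80, 0.10):.0%}")   # roughly 36%
```

Under these assumptions, roughly a third of “significant” results are false positives, far more than the 5% that the threshold is commonly taken to imply.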
The editorial concludes by asserting that the scientific community must move toward a “post \(p < 0.05\)” era in which thoughtful interpretation replaces rigid decision rules.