The Abuse of Power: The Pervasive Fallacy of Power Calculations for Data Analysis

hypothesis testing
null hypothesis
p-values
statistical inference
statistical power
  • Core Argument: This paper argues that retrospective power calculations (or observed power), performed after a study yields a non-significant result, are fundamentally flawed and should not be used to aid in interpretation.
  • The Flaw (Circularity): Retrospective power is a simple monotonic transformation of the p-value, so it provides no new information. A non-significant test will always have low retrospective power, because both quantities are computed from the same observed effect size and variance.
  • Recommended Alternative: To interpret a non-significant finding, researchers should focus on the confidence interval, which reveals the range of plausible true effect sizes and indicates whether biologically or clinically important effects have been reasonably ruled out.
Published

23 January 2026

PubMed: Not Indexed (The American Statistician)

DOI: 10.1080/00031305.2001.10473582

Overview generated by: Gemini 2.5 Flash, 26/11/2025

Key Finding: The Fallacy of Retrospective Power Analysis

This article by Hoenig and Heisey critically examines and rejects the common practice of performing post-experiment or retrospective power calculations (also called “observed power”) to interpret a statistically non-significant result (i.e., a failure to reject the null hypothesis, \(H_0\)).

The Flawed Logic of Retrospective Power

The authors demonstrate that using retrospective power as an aid to interpretation is fundamentally flawed, because retrospective power is a simple monotonic transformation of the p-value.

  • Definition: Retrospective power (or observed power) is typically calculated as the statistical power to detect the observed effect size using the observed sample size and the observed variance.
  • The Circularity Problem: Because the observed effect size is used as the hypothetical “true” effect, the retrospective power is completely determined by the p-value (for a given test and significance level):
    • A small p-value (significant result) will always lead to a high retrospective power.
    • A large p-value (non-significant result) will always lead to a low retrospective power.
  • No New Information: Retrospective power provides no additional information beyond what is already contained in the p-value and the confidence interval. Stating that a non-significant result had low power merely restates the finding that the confidence interval around the point estimate is wide enough to include the null value.
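
The circularity can be made concrete with a minimal sketch. It assumes a two-sided z-test with known variance, which is a simplification chosen for illustration and not necessarily the exact setting of every example in the paper. The function name `observed_power` is hypothetical. Note that the only data-dependent input is the p-value itself:

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def observed_power(p_value: float, alpha: float = 0.05) -> float:
    """Observed power of a two-sided z-test, computed from the p-value alone.

    Assumes 0 < p_value <= 1 and a normally distributed test statistic.
    """
    # Absolute z statistic implied by the two-sided p-value.
    z_obs = _STD_NORMAL.inv_cdf(1 - p_value / 2)
    # Critical value for rejecting H0 at level alpha.
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Treat z_obs as the true standardized effect and ask how often
    # a replicate statistic would land in either rejection region.
    upper = 1 - _STD_NORMAL.cdf(z_crit - z_obs)
    lower = _STD_NORMAL.cdf(-z_crit - z_obs)
    return upper + lower


if __name__ == "__main__":
    for p in (0.001, 0.01, 0.05, 0.20, 0.50, 0.90):
        print(f"p = {p:<5}  observed power = {observed_power(p):.3f}")
```

Under these assumptions the printed values fall steadily as p rises, and a result sitting exactly at p = 0.05 has an observed power of about 50%. Any non-significant result therefore has observed power below roughly one half, whatever the data looked like.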

Why the Flaw is Pervasive

The practice of retrospective power analysis stems from a misunderstanding of the dilemma of the nonrejected null hypothesis: when we fail to reject \(H_0\), we want to know if it’s because the true effect is small (or zero), or because the study lacked power to detect an important effect.

  • Misleading Interpretation: Advocates of retrospective power claim that a low observed power, combined with a non-significant test, suggests the result is “inconclusive” and that a “Type II error” (failing to reject a false \(H_0\)) is likely.
  • The Correct Interpretation: A non-significant result means only that the data are consistent with the null hypothesis; it does not show that \(H_0\) is true. The only way to address the dilemma is to look at the confidence interval and see whether it excludes effect sizes that are considered biologically or clinically important.
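
The confidence-interval reading described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: it uses a normal-approximation (Wald) interval, a null value of zero, and a symmetric threshold `important_effect` that the researcher must supply. The function names and all numbers are hypothetical.

```python
from statistics import NormalDist


def normal_ci(estimate: float, std_error: float, level: float = 0.95) -> tuple[float, float]:
    """Wald-type confidence interval: estimate +/- z * standard error."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate - z * std_error, estimate + z * std_error


def interpret(estimate: float, std_error: float,
              important_effect: float, level: float = 0.95) -> str:
    """Classify a result by where its interval sits relative to zero
    and to the smallest effect considered important (hypothetical threshold)."""
    low, high = normal_ci(estimate, std_error, level)
    if low > 0 or high < 0:
        return "significant: interval excludes zero"
    if -important_effect < low and high < important_effect:
        return "non-significant: important effects ruled out"
    return "non-significant: inconclusive, important effects remain plausible"


if __name__ == "__main__":
    # Two made-up studies with the same point estimate but different precision.
    for se in (1.5, 4.0):
        low, high = normal_ci(1.0, se)
        print(f"SE = {se}: CI = ({low:.2f}, {high:.2f}) -> {interpret(1.0, se, 5.0)}")
```

Both made-up studies are non-significant, yet only the more precise one (interval of roughly -1.94 to 3.94) excludes effects as large as 5. The wider interval (roughly -6.84 to 8.84) leaves such effects plausible. That distinction is what the interval supplies and observed power cannot.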

Recommendation

The authors recommend that statistical power should be used only for planning an experiment (prospective analysis). To interpret the results of a completed study, especially a non-significant finding, researchers should focus on:

  1. The p-value.
  2. The point estimate (observed effect size).
  3. The confidence interval (which indicates the range of true effects consistent with the data).

The confidence interval is the superior tool for interpreting non-significant results because it shows whether important effect sizes have been reasonably ruled out, which is the actual goal of most post-hoc power discussions.
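
By contrast, the prospective use of power that the authors endorse takes a hypothesized effect size fixed before any data are collected. A minimal sketch, assuming a two-sided, two-sample z-test with equal group sizes and a known common standard deviation (a textbook approximation, not a method taken from the paper), with hypothetical function names:

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def planned_power(n_per_group: int, delta: float, sigma: float, alpha: float = 0.05) -> float:
    """Approximate power to detect a true mean difference `delta`.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    std_error = sigma * math.sqrt(2 / n_per_group)  # SE of the difference in means
    return _STD_NORMAL.cdf(delta / std_error - z_crit)


def sample_size_per_group(delta: float, sigma: float,
                          power: float = 0.80, alpha: float = 0.05) -> int:
    """Smallest n per group reaching the target power under the same approximation."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_crit + z_power) * sigma / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical planning inputs: detect a difference of 0.5 SD with 80% power.
    n = sample_size_per_group(delta=0.5, sigma=1.0)
    print(f"n per group = {n}, power = {planned_power(n, 0.5, 1.0):.3f}")
```

With these illustrative inputs the approximation gives 63 participants per group. The key difference from observed power is that `delta` comes from subject-matter judgment about what effect matters, not from the data being interpreted.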