The cost of dichotomising continuous variables

biostatistics
continuous variables
dichotomization
regression analysis
statistical power
type i error
  • Avoidable Costs: Dichotomizing continuous variables for statistical analysis substantially reduces statistical power and, when the dichotomized variable is a confounder in a multivariable model, can inflate the Type I error rate (false positives).
  • Flawed Interpretation: Dichotomization uses arbitrary or unstable cut-points, provides misleading effect estimates, and ignores the true shape of the relationship across the continuous scale.
  • Recommendation: Continuous variables should be analyzed on their original scale (or using methods like fractional polynomials/splines) to preserve information and avoid these serious statistical drawbacks.
Published

23 January 2026

PubMed: 16675816
DOI: 10.1136/bmj.332.7549.1080
Overview generated by: Gemini 2.5 Flash, 26/11/2025

Key Finding: Dichotomization of Continuous Variables is Statistically Detrimental

This article critiques the common practice in clinical research of converting continuous variables (e.g., blood pressure, weight, cholesterol) into two categories, known as dichotomization (e.g., “hypertensive” or “not hypertensive”). The authors argue that while grouping can be useful for clinical decision-making and data presentation, it is unnecessary for statistical analysis and introduces several serious, avoidable drawbacks.

Study Design and Methods

This paper is a Statistics Note (a commentary/review) that uses theoretical arguments and a review of existing statistical literature to demonstrate the flaws of dichotomization.

The core argument is based on quantifying the statistical and inferential costs associated with replacing precise continuous data with a simple binary indicator (0 or 1). The analysis focuses on the detrimental impact on statistical power and Type I error control in subsequent analyses, particularly in regression modeling.

Results and Major Drawbacks

The authors identify four main statistical costs associated with dichotomization:

1. Substantial Loss of Statistical Power

The primary statistical consequence is reduced power to detect a true association or effect. Converting a continuous measure into a binary one discards information about how far each value lies from the cut-point and from other values. The loss is equivalent to running the study with a smaller sample: dichotomizing a normally distributed variable at the median costs about as much power as discarding a third of the data, which makes a genuine effect harder to detect.
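
The power loss can be illustrated with a small simulation. The sketch below is not taken from the article: the sample size, the true correlation of 0.25, the number of replicates, and all function names are assumptions chosen for illustration, and the test uses a normal critical value (1.96) as an approximation. It compares how often a real association is detected when the predictor is kept continuous versus split at the sample median.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def rejects(r, n, crit=1.96):
    """Approximate two-sided 5% test of zero correlation (normal critical value)."""
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return abs(t) > crit

def simulate_power(n=100, rho=0.25, n_sims=1000, seed=1):
    """Share of simulated studies that detect a true correlation rho."""
    rng = random.Random(seed)
    hits_cont = hits_dich = 0
    for _ in range(n_sims):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rho * xi + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) for xi in x]
        cut = sorted(x)[n // 2]                      # sample median split
        d = [1.0 if xi >= cut else 0.0 for xi in x]  # dichotomized predictor
        hits_cont += rejects(pearson(x, y), n)
        hits_dich += rejects(pearson(d, y), n)
    return hits_cont / n_sims, hits_dich / n_sims

power_cont, power_dich = simulate_power()
print(f"power, continuous predictor:   {power_cont:.2f}")
print(f"power, median-split predictor: {power_dich:.2f}")
```

For a normally distributed predictor, the squared correlation between the variable and its median-split indicator is 2/π (about 0.64), which is where the "discarding about a third of the data" figure comes from.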

2. Inflation of the Type I Error Rate

When a confounding variable is dichotomized in a multivariable model (e.g., logistic regression), it often fails to adequately control for the confounding effect across the full range of the variable. This residual confounding can lead to a substantial inflation of the Type I error rate for other variables in the model, increasing the risk of false-positive findings.
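
A minimal simulation sketch of this residual confounding follows. It assumes a hypothetical setup that is not from the article: the outcome `y` depends only on a confounder `z`, the exposure `x` is correlated with `z` but has no effect of its own, and the test of `x` adjusted for one covariate is carried out through the partial correlation (which is equivalent to the t-test of its coefficient in a two-predictor linear regression). The critical value 1.96 is a normal approximation.

```python
import math
import random

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

def partial_corr(y, x, c):
    """Correlation of y and x after linear adjustment for a single covariate c."""
    r_yx, r_yc, r_xc = pearson(y, x), pearson(y, c), pearson(x, c)
    return (r_yx - r_yc * r_xc) / math.sqrt((1 - r_yc ** 2) * (1 - r_xc ** 2))

def false_positive_rates(n=200, n_sims=500, seed=2, crit=1.96):
    """How often x looks 'significant' although it has no effect on y."""
    rng = random.Random(seed)
    fp_cont = fp_dich = 0
    for _ in range(n_sims):
        z = [rng.gauss(0, 1) for _ in range(n)]                      # confounder
        x = [0.7 * zi + math.sqrt(0.51) * rng.gauss(0, 1) for zi in z]
        y = [zi + rng.gauss(0, 1) for zi in z]                       # depends on z only
        cut = sorted(z)[n // 2]
        d = [1.0 if zi >= cut else 0.0 for zi in z]                  # dichotomized confounder
        for adjuster, dichotomized in ((z, False), (d, True)):
            r = partial_corr(y, x, adjuster)
            t = r * math.sqrt((n - 3) / (1 - r * r))
            if abs(t) > crit:
                if dichotomized:
                    fp_dich += 1
                else:
                    fp_cont += 1
    return fp_cont / n_sims, fp_dich / n_sims

fp_cont, fp_dich = false_positive_rates()
print(f"false positives, adjusting for continuous z:   {fp_cont:.2f}")
print(f"false positives, adjusting for dichotomized z: {fp_dich:.2f}")
```

Under these assumptions, adjusting for the continuous confounder keeps the false-positive rate near the nominal level, while adjusting for the binary version leaves much of the confounding in place.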

3. Arbitrary and Unstable Cut-points

The choice of cut-point is frequently arbitrary and lacks strong biological justification, so different studies often use different cut-points and their results are hard to compare. Searching the observed data for an “optimal” cut-point (typically the one giving the smallest P value) is worse: it amounts to multiple testing, which raises the risk of a spuriously significant result, overestimates the effect, and undermines validity when the results are generalized to new data.
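
The effect of a data-driven cut-point search can be sketched with simulated data in which the predictor and the outcome are unrelated, so every "significant" result is a false positive. The sample size, the search range (10th to 90th centile), and the function names below are illustrative assumptions, not details from the article, and 1.96 is used as an approximate critical value.

```python
import math
import random

def two_group_t(prefix_sum, total_sum, total_sq, k, n):
    """Pooled two-sample t statistic when the k smallest-x subjects form the low group."""
    mean_low = prefix_sum / k
    mean_high = (total_sum - prefix_sum) / (n - k)
    ss_within = total_sq - k * mean_low ** 2 - (n - k) * mean_high ** 2
    se = math.sqrt(ss_within / (n - 2) * (1 / k + 1 / (n - k)))
    return (mean_high - mean_low) / se

def false_positive_rates(n=100, n_sims=1000, seed=3, crit=1.96):
    rng = random.Random(seed)
    fp_fixed = fp_best = 0
    lo, hi = n // 10, n - n // 10          # candidate group sizes for the low group
    for _ in range(n_sims):
        # x and y are generated independently: there is no true association
        pairs = sorted((rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n))
        y = [p[1] for p in pairs]          # outcomes ordered by x
        total_sum = sum(y)
        total_sq = sum(v * v for v in y)
        prefix = 0.0
        abs_t = {}
        for k in range(1, hi + 1):
            prefix += y[k - 1]
            if k >= lo:
                abs_t[k] = abs(two_group_t(prefix, total_sum, total_sq, k, n))
        fp_fixed += abs_t[n // 2] > crit           # pre-specified median split
        fp_best += max(abs_t.values()) > crit      # cut-point chosen after looking
    return fp_fixed / n_sims, fp_best / n_sims

fp_fixed, fp_best = false_positive_rates()
print(f"false positives, pre-specified median split: {fp_fixed:.2f}")
print(f"false positives, best of all cut-points:     {fp_best:.2f}")
```

The pre-specified split stays close to the nominal 5% level, whereas picking the best-looking cut-point rejects the null far more often.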

4. Misleading Effect Estimates

Dichotomization assumes a step-function relationship between the variable and the outcome: risk is taken to be constant within each group and to jump at the cut-point. This ignores variation within each group, treats individuals just either side of the cut-point as very different rather than very similar, and can misrepresent the true biological relationship, especially when that relationship is non-linear across the continuous scale. The resulting effect estimate is only a contrast between the average outcomes of the two groups, which masks the true dose-response curve.
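
A deliberately extreme, noise-free example shows how a two-group contrast can hide a strong relationship. The exposure scale, the U-shaped outcome, and the cut-point of 50 are all hypothetical values chosen for illustration.

```python
from statistics import mean

x = list(range(0, 101))                  # hypothetical exposure, 0 to 100
y = [(xi - 50) ** 2 / 100 for xi in x]   # hypothetical U-shaped outcome

cut = 50                                 # hypothetical cut-point
low = [yi for xi, yi in zip(x, y) if xi <= cut]
high = [yi for xi, yi in zip(x, y) if xi > cut]

group_difference = mean(high) - mean(low)   # what the dichotomized analysis reports
outcome_range = max(y) - min(y)             # how much the outcome really varies with x

print(f"difference between group means: {group_difference:.2f}")
print(f"range of the outcome across x:  {outcome_range:.2f}")
```

The outcome varies strongly with the exposure, yet the "high versus low" difference is close to zero because each group averages over one arm of the curve.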

Conclusion and Recommendations

The authors conclude that dichotomization sacrifices statistical rigor for a simplicity that the analysis stage does not need.

They strongly recommend that continuous variables be analyzed on their original continuous scale in statistical models (e.g., as continuous predictors in regression models). If the relationship is suspected to be non-linear, methods that keep the variable continuous, such as fractional polynomials or splines, should be used instead of categorization.
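
As a rough sketch of the fractional polynomial idea, the code below fits a first-degree fractional polynomial: it tries each power from the conventional set (with 0 standing for the log transform), fits a straight line to the transformed predictor, and keeps the power with the smallest residual sum of squares. The data are simulated under an assumed logarithmic relationship, and the helper names are hypothetical. Dedicated statistical software adds steps omitted here, such as scaling, formal model comparison, and second-degree terms.

```python
import math
import random

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional set; 0 denotes log(x)

def fp_transform(x, p):
    """First-degree fractional polynomial transform of a positive value x."""
    return math.log(x) if p == 0 else x ** p

def fit_line(t, y):
    """Least-squares fit of y = intercept + slope * t; returns (intercept, slope, rss)."""
    n = len(t)
    mt, my = sum(t) / n, sum(y) / n
    stt = sum((ti - mt) ** 2 for ti in t)
    sty = sum((ti - mt) * (yi - my) for ti, yi in zip(t, y))
    slope = sty / stt
    intercept = my - slope * mt
    rss = sum((yi - intercept - slope * ti) ** 2 for ti, yi in zip(t, y))
    return intercept, slope, rss

# Simulated data under an assumed relationship y = 3 + 2*log(x) + noise
rng = random.Random(4)
x = [0.5 * k for k in range(2, 42)]           # positive values from 1.0 to 20.5
y = [3 + 2 * math.log(xi) + rng.gauss(0, 0.05) for xi in x]

fits = {p: fit_line([fp_transform(xi, p) for xi in x], y) for p in FP_POWERS}
best_power = min(fits, key=lambda p: fits[p][2])

print("selected power:", best_power)
print("intercept, slope:", fits[best_power][:2])
```

The predictor stays continuous throughout, and the shape of the relationship is estimated from the data rather than imposed by a cut-point.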