Adjusting for Heritable Covariates Can Bias Effect Estimates in Genome-Wide Association Studies
GWAS
statistical genetics
causal inference
collider bias
heritability
- Adjusting a Genome-Wide Association Study (GWAS) for a heritable covariate (a correlated, genetically influenced trait) can introduce collider bias when the SNP also affects the covariate, distorting SNP effect estimates and potentially producing false positive associations.
- The bias is approximately proportional to the product of the genetic effect on the covariate and the phenotypic correlation between the traits. Empirical support comes from a significant enrichment (\(p=0.005\)) of SNPs with opposite-direction effects on WHR and BMI in the GWAS of WHR adjusted for BMI.
- The authors strongly caution against interpreting adjusted results as true direct genetic effects, recommending unadjusted GWAS for total effect discovery and bivariate methods for power gains without inducing collider bias.
PubMed: 25640676
DOI: 10.1016/j.ajhg.2014.12.021
Overview generated by: Gemini 2.5 Flash, 26/11/2025
Key Findings
This methodological report examines the statistical consequences of adjusting Genome-Wide Association Studies (GWAS) for heritable covariates (correlated traits that are themselves genetically influenced) and shows that this common practice can introduce substantial collider bias.
Main Discoveries
- Collider Bias Introduction: Adjusting a standard GWAS regression for a heritable covariate (\(C\)) that is correlated with the outcome (\(Y\)) and influenced by the SNP (\(G\)) can introduce collider bias (also known as index event bias or selection bias). Because \(C\) is a common effect of \(G\) and of unobserved factors (\(U\)) that also affect \(Y\), conditioning on \(C\) induces an association between \(G\) and \(U\), opening the non-causal path \(G \rightarrow C \leftarrow U \rightarrow Y\). This can produce false positive associations with the primary outcome.
- Unbiased Estimation Condition: The adjusted effect, \(\beta_{G \rightarrow Y \mid C}\), is an unbiased estimate of the direct genetic effect on the outcome only when one of two restrictive conditions holds:
- The SNP has no effect on the covariate (\(\beta_{G \rightarrow C} = 0\)).
- The covariate (\(C\)) is a pure mediator, where the correlation between \(C\) and \(Y\) is entirely explained by a direct causal effect of \(C\) on \(Y\).
- Bias Formula: For scenarios in which the covariate and outcome share genetic or environmental risk factors, the bias (\(\text{Bias}\)) in the adjusted genetic effect estimate is well-approximated by: \[\text{Bias} \approx -\beta_{G \rightarrow C} \cdot \rho_{C Y} \cdot \sqrt{\frac{\text{Var}(Y)}{\text{Var}(C)}}\] where \(\beta_{G \rightarrow C}\) is the genetic effect on the covariate, \(\rho_{C Y}\) is the phenotypic correlation between the covariate and the outcome, and the variance ratio converts from units of \(C\) to units of \(Y\) (for standardized traits the bias reduces to \(-\beta_{G \rightarrow C}\,\rho_{C Y}\)). The bias vanishes if either factor is zero, and its sign is opposite to that of \(\beta_{G \rightarrow C}\,\rho_{C Y}\).
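A minimal stdlib-only Python simulation can make the collider mechanism and the bias approximation concrete. It assumes a hypothetical shared-cause model (allele frequency 0.3, one unobserved cause \(U\) with unit effects on both traits, no causal effect of \(C\) on \(Y\), and no direct effect of the SNP on \(Y\)); the function names and parameter values are illustrative, not taken from the paper. The unadjusted estimate should sit near zero, while the adjusted estimate should sit near the approximate bias.

```python
import math
import random

def simulate_shared_cause(n, beta_gc, beta_gy, maf=0.3, seed=1):
    """Hypothetical model: G -> C, G -> Y, and an unobserved U -> C, U -> Y.
    There is no C -> Y arrow, so beta_gy is both the direct and total effect."""
    rng = random.Random(seed)
    g, c, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < maf) + (rng.random() < maf)  # allele count 0/1/2
        u = rng.gauss(0.0, 1.0)                           # unobserved shared cause
        g.append(gi)
        c.append(beta_gc * gi + u + rng.gauss(0.0, 1.0))
        y.append(beta_gy * gi + u + rng.gauss(0.0, 1.0))
    return g, c, y

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / (len(a) - 1)

def adjusted_slope(y, g, c):
    """Coefficient on g in the OLS fit y ~ 1 + g + c (2x2 normal equations)."""
    sgg, scc, sgc = cov(g, g), cov(c, c), cov(g, c)
    sgy, scy = cov(g, y), cov(c, y)
    return (scc * sgy - sgc * scy) / (sgg * scc - sgc ** 2)

g, c, y = simulate_shared_cause(n=20000, beta_gc=0.2, beta_gy=0.0)
unadjusted = cov(g, y) / cov(g, g)   # marginal GWAS estimate; true value is 0
adjusted = adjusted_slope(y, g, c)   # covariate-adjusted estimate
rho = cov(c, y) / math.sqrt(cov(c, c) * cov(y, y))
approx_bias = -0.2 * rho * math.sqrt(cov(y, y) / cov(c, c))  # uses the true beta_gc
print(f"unadjusted={unadjusted:.3f} adjusted={adjusted:.3f} approx bias={approx_bias:.3f}")
```

Under these assumptions the exact population value of the adjusted coefficient is \(-0.1\), and the approximation gives roughly \(-0.099\), even though the SNP has no effect on \(Y\).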
Study Design
The study combined analytical derivations within a causal inference framework, numerical simulations, and an application to real GWAS summary statistics.
Theoretical Framework
- Modeling of the True Direct Effect: The study defined the target quantity, the direct genetic effect \(\beta_{G \rightarrow Y}\), as the effect of the SNP (\(G\)) on the outcome (\(Y\)) that is not mediated by the covariate (\(C\)). A regression adjusted for \(C\) identifies this quantity only if the other causes of \(C\) that also affect \(Y\) are adjusted for as well.
- Linear Regression Model: The estimation was performed using a standard linear regression: \(Y = \alpha + \beta_{G \rightarrow Y \mid C} G + \gamma C + \epsilon\). The authors derived the expected value of the adjusted estimate, \(\mathbb{E}[\hat{\beta}_{G \rightarrow Y \mid C}]\), showing that it equals the true direct effect plus a bias term under various causal scenarios.
- Bias Derivation: The bias was derived from the correlation structure induced when the covariate and the outcome share unobserved causes. Because \(C\) is a common effect of \(G\) and of those shared causes, conditioning on \(C\) makes \(G\) correlated with them, and this correlation propagates to \(Y\). The full mathematical expression for the bias was derived, and the simplified approximation follows from it when genetic effects are small.
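A compact version of the argument, under simplifying assumptions introduced here for illustration (linear effects, residuals \(e_C\) and \(e_Y\) uncorrelated with \(G\), and no causal effect of \(C\) on \(Y\)); the paper's own derivation may be parameterized differently:

\[
\begin{aligned}
C &= \beta_{G \rightarrow C}\, G + e_C, \qquad Y = \beta_{G \rightarrow Y}\, G + e_Y, \\
e_Y &= \kappa\, e_C + \eta, \qquad \kappa = \frac{\text{Cov}(e_C, e_Y)}{\text{Var}(e_C)}, \qquad \text{Cov}(\eta, e_C) = \text{Cov}(\eta, G) = 0, \\
Y &= \left(\beta_{G \rightarrow Y} - \kappa\, \beta_{G \rightarrow C}\right) G + \kappa\, C + \eta .
\end{aligned}
\]

Because \(\eta\) is uncorrelated with both regressors, the population coefficient on \(G\) in the adjusted regression is \(\beta_{G \rightarrow Y} - \kappa\,\beta_{G \rightarrow C}\), so the bias is exactly \(-\kappa\,\beta_{G \rightarrow C}\). When the SNP explains a negligible share of variance in either trait, \(\text{Var}(e_C) \approx \text{Var}(C)\) and \(\text{Cov}(e_C, e_Y) \approx \text{Cov}(C, Y)\), so \(\kappa \approx \rho_{C Y}\sqrt{\text{Var}(Y)/\text{Var}(C)}\). If instead the correlation between \(C\) and \(Y\) arises only from a causal effect of \(C\) on \(Y\) (uncorrelated residuals), the adjusted model is correctly specified and the coefficient on \(G\) is exactly \(\beta_{G \rightarrow Y}\), matching the pure-mediator condition.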
Simulation Methods
- Scenarios Tested: Simulations covered three causal scenarios: \(C\) mediates part of the effect of \(G\) on \(Y\); \(Y\) mediates the effect of \(G\) on \(C\) (reverse causation); and \(G\) has pleiotropic effects on both traits, whose correlation arises from shared unobserved genetic or environmental causes (\(U\)).
- Parameters: Simulations varied key parameters, including heritability of the traits (up to \(h^2 = 0.5\)), the phenotypic correlation (\(\rho_{C Y}\), up to \(0.5\)), and the genetic effect sizes (\(\beta_G\)).
- Evaluation Metrics: The primary metrics were the Type I error rate (false positive rate) and the statistical power of the adjusted association test. Type I error for the null hypothesis of no direct effect was inflated when the SNP affected the covariate and the two traits shared unobserved causes.
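The Type I error inflation can be reproduced in a small stdlib-only Python sketch. It assumes a hypothetical null scenario in which the SNP affects only \(C\) and both traits share an unobserved cause \(U\); the function names, sample size, replicate count, and effect size are illustrative choices, and p-values use a normal approximation to the t test.

```python
import math
import random

def fit(y, g, c=None):
    """OLS of y on 1 + g (+ c if given); returns (coefficient on g, standard error)."""
    n = len(y)
    mg, my = sum(g) / n, sum(y) / n
    gg = [v - mg for v in g]
    yy = [v - my for v in y]
    sgg = sum(v * v for v in gg)
    sgy = sum(a * b for a, b in zip(gg, yy))
    if c is None:
        b = sgy / sgg
        rss = sum((yv - b * gv) ** 2 for gv, yv in zip(gg, yy))
        return b, math.sqrt(rss / (n - 2) / sgg)
    mc = sum(c) / n
    cc = [v - mc for v in c]
    scc = sum(v * v for v in cc)
    sgc = sum(a * b for a, b in zip(gg, cc))
    scy = sum(a * b for a, b in zip(cc, yy))
    det = sgg * scc - sgc ** 2
    b_g = (scc * sgy - sgc * scy) / det
    b_c = (sgg * scy - sgc * sgy) / det
    rss = sum((yv - b_g * gv - b_c * cv) ** 2 for gv, cv, yv in zip(gg, cc, yy))
    return b_g, math.sqrt(rss / (n - 3) * scc / det)

def two_sided_p(z):
    return math.erfc(abs(z) / math.sqrt(2.0))  # normal approximation

def type1_error(reps=200, n=1500, beta_gc=0.3, maf=0.3, alpha=0.05, seed=7):
    """Rejection rates of H0 (no effect of G on Y) for unadjusted and adjusted tests."""
    rng = random.Random(seed)
    hits_unadj = hits_adj = 0
    for _ in range(reps):
        g, c, y = [], [], []
        for _ in range(n):
            gi = (rng.random() < maf) + (rng.random() < maf)
            u = rng.gauss(0.0, 1.0)                     # unobserved shared cause
            g.append(gi)
            c.append(beta_gc * gi + u + rng.gauss(0.0, 1.0))
            y.append(u + rng.gauss(0.0, 1.0))           # G has no effect on Y
        b, se = fit(y, g)
        hits_unadj += two_sided_p(b / se) < alpha
        b, se = fit(y, g, c)
        hits_adj += two_sided_p(b / se) < alpha
    return hits_unadj / reps, hits_adj / reps

unadj_rate, adj_rate = type1_error()
print(f"unadjusted: {unadj_rate:.3f}  adjusted for C: {adj_rate:.3f}")
```

With these settings the unadjusted test should reject close to the nominal 5%, while the adjusted test rejects far more often because its expected coefficient is about \(-0.15\) rather than zero.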
Real-World Data Application
- Data Source: GWAS summary statistics from the GIANT consortium meta-analysis of anthropometric traits were used, specifically:
- GWAS of Waist-to-Hip Ratio (WHR) adjusted for BMI (\(Y \mid C\)).
- GWAS of BMI (\(C\)).
- Empirical Test: The authors tested for an enrichment of SNPs with marginal effects in opposite directions on WHR and BMI. A significant enrichment (\(p=0.005\)) was found, providing empirical evidence that the bias was present and inflated the number of significant loci.
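An enrichment test of this kind can be sketched as a one-sided exact binomial (sign) test. This is an assumed formulation, not necessarily the test used in the paper: the function name, the null proportion of 0.5, and the toy effect sizes below are hypothetical, and a non-zero genetic correlation between WHR and BMI could shift the appropriate null.

```python
from math import comb

def opposite_sign_enrichment(effects_y, effects_c, p_null=0.5):
    """Hypothetical sign test: count SNPs whose marginal effects on Y and C have
    opposite signs; return (k, n, one-sided exact binomial p for share > p_null)."""
    pairs = [(a, b) for a, b in zip(effects_y, effects_c) if a != 0 and b != 0]
    n = len(pairs)
    k = sum((a > 0) != (b > 0) for a, b in pairs)
    p_value = sum(comb(n, j) * p_null ** j * (1 - p_null) ** (n - j)
                  for j in range(k, n + 1))
    return k, n, p_value

# Toy effect sizes for illustration only (not GIANT results).
whr_effects = [0.02, -0.01, 0.03, 0.015]
bmi_effects = [-0.01, 0.02, -0.02, 0.01]
print(opposite_sign_enrichment(whr_effects, bmi_effects))  # (3, 4, 0.3125)
```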
Major Results
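- Power Paradox: Adjustment for a heritable covariate can increase statistical power when the bias term has the same sign as the true direct effect, so that the two add up; for positively correlated traits, this occurs when the SNP's effects on \(C\) and \(Y\) are in opposite directions. Conversely, same-direction effects are attenuated and can lose power. This mechanism is consistent with the enrichment of opposite-direction loci detected in the WHR adjusted for BMI GWAS.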
- Interpretation Challenge: Since the adjusted estimates reflect a combination of the true direct effect and the bias term, they are neither the total genetic effect nor the direct genetic effect in most scenarios involving shared genetic or environmental risk factors.
- Relevance to Ratio Traits: The findings are directly relevant to analyses of ratio traits (e.g., WHR, fasting glucose/insulin ratios), because analyzing a ratio implicitly adjusts for its denominator: regressing \(\log(Y/C)\) on \(G\) is equivalent to regressing \(\log Y\) on \(G\) and \(\log C\) with the coefficient on \(\log C\) fixed at 1.
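The power argument can be checked with population-level formulas under a hypothetical shared-cause model. The function names, \(\text{Var}(G) = 0.42\) (allele frequency 0.3), residual variances of 2 with covariance 1, the sample size, and the effect sizes are all illustrative assumptions, and power uses a normal approximation.

```python
import math
from statistics import NormalDist

def adjusted_vs_unadjusted(beta_gy, beta_gc, n, var_g=0.42,
                           var_ec=2.0, var_ey=2.0, cov_e=1.0):
    """Population coefficient and SE for G under a hypothetical model:
    C = beta_gc*G + e_C, Y = beta_gy*G + e_Y, Cov(e_C, e_Y) = cov_e."""
    kappa = cov_e / var_ec
    adj_beta = beta_gy - kappa * beta_gc                # direct effect + bias
    resid_var = var_ey - cov_e ** 2 / var_ec            # Var(Y | G, C)
    var_g_given_c = var_g * var_ec / (beta_gc ** 2 * var_g + var_ec)
    adj_se = math.sqrt(resid_var / (n * var_g_given_c))
    unadj_se = math.sqrt(var_ey / (n * var_g))          # Var(Y | G) = var_ey
    return (beta_gy, unadj_se), (adj_beta, adj_se)

def power(beta, se, alpha=0.05):
    """Two-sided power of a Wald test under a normal approximation."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = abs(beta) / se
    return nd.cdf(shift - z) + nd.cdf(-shift - z)

# Same direct effect on Y; effect on C in the same vs the opposite direction.
(b_u, se_u), (b_same, se_same) = adjusted_vs_unadjusted(0.05, 0.1, n=10_000)
_, (b_opp, se_opp) = adjusted_vs_unadjusted(0.05, -0.1, n=10_000)
print(f"unadjusted {power(b_u, se_u):.2f}; adjusted, same-sign {power(b_same, se_same):.2f}; "
      f"adjusted, opposite-sign {power(b_opp, se_opp):.2f}")
```

In this setup the same-sign SNP's adjusted coefficient is cancelled to zero, while the opposite-sign SNP's coefficient doubles. Adjustment also lowers the residual variance of \(Y\), which contributes to the power change alongside the bias.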
Practical Implications
Recommendations for Future GWAS
- Prioritize Unadjusted Analysis: For general genetic discovery and estimation of the total genetic effect (which includes effects mediated through other traits), the unadjusted GWAS of the primary outcome should remain the standard, as it is not subject to collider bias.
- Use Bivariate Methods: To gain statistical power and accurately account for correlated traits without introducing collider bias, researchers should prefer multivariate or bivariate methods (e.g., those simultaneously modeling both \(C\) and \(Y\)) over simple regression adjustment.
- Causal Inference: If the research goal is specifically to estimate the direct causal effect, more advanced methods like Multivariable Mendelian Randomization (MVMR) should be considered to isolate the effect while mitigating the bias.
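The difference between total and direct effects can be illustrated with a stdlib-only Python simulation of a hypothetical pure-mediation model (\(G \rightarrow C \rightarrow Y\) plus a direct \(G \rightarrow Y\) path, with no unobserved common cause of \(C\) and \(Y\)). The function names and parameter values are illustrative assumptions. The unadjusted regression should recover the total effect, and, because this is the pure-mediator case, the adjusted regression should recover the direct effect.

```python
import random

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (z - mb) for x, z in zip(a, b)) / (len(a) - 1)

def simulate_mediation(n, beta_gy, beta_gc, theta, maf=0.3, seed=3):
    """Hypothetical model: G -> C -> Y (C effect theta) plus a direct G -> Y path;
    the residuals of C and Y are independent (no shared cause U)."""
    rng = random.Random(seed)
    g, c, y = [], [], []
    for _ in range(n):
        gi = (rng.random() < maf) + (rng.random() < maf)
        ci = beta_gc * gi + rng.gauss(0.0, 1.0)
        g.append(gi)
        c.append(ci)
        y.append(beta_gy * gi + theta * ci + rng.gauss(0.0, 1.0))
    return g, c, y

g, c, y = simulate_mediation(n=20000, beta_gy=0.1, beta_gc=0.2, theta=0.5)
total_hat = cov(g, y) / cov(g, g)   # unadjusted: targets 0.1 + 0.5 * 0.2 = 0.2
sgg, scc, sgc, sgy, scy = cov(g, g), cov(c, c), cov(g, c), cov(g, y), cov(c, y)
direct_hat = (scc * sgy - sgc * scy) / (sgg * scc - sgc ** 2)  # adjusted: targets 0.1
print(f"unadjusted (total) = {total_hat:.3f}, adjusted (direct) = {direct_hat:.3f}")
```

Adding a shared cause of \(C\) and \(Y\) to this model would reintroduce the bias in `direct_hat` while leaving `total_hat` unaffected.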