Synthesizing Subject-matter Expertise for Variable Selection in Causal Effect Estimation: A Case Study
- Objective: To empirically assess different strategies for creating Directed Acyclic Graphs (DAGs), and the minimal adjustment sets derived from them, for estimating a known causal effect: the null effect of placebo adherence on mortality in the Coronary Drug Project (CDP) trial.
- Key Finding (Efficiency): Including nonconfounding prognostic factors (outcome predictors) reduced the variance of the causal effect estimate (i.e., increased its efficiency), consistent with theoretical statistical advice.
- Recommendation: Researchers should make the exhaustive identification of potential outcome prognostic factors the first and most important step when constructing DAGs for causal effect estimation.
PubMed: 38860706; DOI: 10.1097/EDE.0000000000001758; Overview generated by: Gemini 2.5 Flash, 26/11/2025
Background and Purpose
Directed Acyclic Graphs (DAGs) are a key theoretical tool for covariate selection in causal effect estimation because they let researchers identify minimal adjustment sets that control for confounding. However, there is limited empirical research on how these graphs are created in practice. This paper assesses different approaches to DAG construction using data from the Coronary Drug Project (CDP) trial. It focuses on the effect of placebo adherence on mortality, for which the true causal effect is assumed to be zero (a placebo cannot cause mortality). Any departure of an estimate from the null therefore reflects residual bias, which makes this a useful benchmark for comparing methods.
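To make the idea of a minimal adjustment set concrete, the Python sketch below applies the backdoor criterion to a small, hypothetical DAG. The node names (A for adherence, Y for mortality, C for a confounder, P for a prognostic factor that affects only Y, E for a predictor that affects only A) and all function names are illustrative assumptions, not the CDP DAGs from the paper. D-separation is checked with the standard moralized-ancestral-graph method, and every covariate subset is enumerated, which is only practical for small graphs.

```python
from itertools import combinations

# Hypothetical toy DAG (not a CDP DAG): A = adherence, Y = mortality,
# C = confounder, P = prognostic factor (affects Y only), E = exposure-only predictor.
EDGES = {("C", "A"), ("C", "Y"), ("P", "Y"), ("E", "A"), ("A", "Y")}


def parents(edges, node):
    return {u for (u, v) in edges if v == node}


def children(edges, node):
    return {v for (u, v) in edges if u == node}


def ancestors(edges, nodes):
    """The given nodes together with all of their ancestors."""
    result, stack = set(nodes), list(nodes)
    while stack:
        for p in parents(edges, stack.pop()):
            if p not in result:
                result.add(p)
                stack.append(p)
    return result


def descendants(edges, node):
    result, stack = set(), [node]
    while stack:
        for c in children(edges, stack.pop()):
            if c not in result:
                result.add(c)
                stack.append(c)
    return result


def d_separated(edges, x, y, z):
    """True if x and y are d-separated given set z (moral ancestral graph test)."""
    keep = ancestors(edges, {x, y} | set(z))
    sub = {(u, v) for (u, v) in edges if u in keep and v in keep}
    nbrs = {n: set() for n in keep}
    for u, v in sub:                      # drop edge directions
        nbrs[u].add(v)
        nbrs[v].add(u)
    for n in keep:                        # "marry" parents of a common child
        for p, q in combinations(sorted(parents(sub, n)), 2):
            nbrs[p].add(q)
            nbrs[q].add(p)
    seen, stack = {x}, [x]                # look for a path from x to y avoiding z
    while stack:
        for nb in nbrs[stack.pop()]:
            if nb == y:
                return False
            if nb not in seen and nb not in z:
                seen.add(nb)
                stack.append(nb)
    return True


def is_backdoor_set(edges, a, y, z):
    """Backdoor criterion: no descendants of a in z, and z blocks all backdoor paths."""
    z = set(z)
    if z & descendants(edges, a):
        return False
    no_out = {(u, v) for (u, v) in edges if u != a}   # delete edges leaving a
    return d_separated(no_out, a, y, z)


def minimal_adjustment_sets(edges, a, y):
    nodes = {n for e in edges for n in e} - {a, y}
    valid = [set(s) for k in range(len(nodes) + 1)
             for s in combinations(sorted(nodes), k)
             if is_backdoor_set(edges, a, y, s)]
    return [s for s in valid if not any(t < s for t in valid)]  # no valid proper subset


print(minimal_adjustment_sets(EDGES, "A", "Y"))        # [{'C'}]
print(is_backdoor_set(EDGES, "A", "Y", {"C", "P"}))   # True
```

In this toy graph the only minimal set is {C}, yet {C, P} also satisfies the backdoor criterion, which is why a prognostic factor can be added for efficiency without reintroducing confounding. Tools such as DAGitty perform this search on realistic graphs.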
Study Methods and Design
The authors created multiple DAGs using different strategies for identifying variables and specifying the links between them. For each DAG, they derived the corresponding minimal adjustment sets for confounding control and then applied these sets to the CDP data under two modeling strategies:
- Baseline-only Adjustment: Estimating the cumulative effect of adherence on mortality by adjusting only for baseline covariate values in a standard regression.
- Time-Varying Adjustment: Estimating the effect by adjusting for time-varying covariates of adherence using Inverse Probability Weighting (IPW).
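IPW for a time-varying exposure is commonly implemented with stabilized weights: for each participant and interval, the weight is the running product, over all intervals so far, of the probability of the observed adherence value given past adherence (numerator) divided by the same probability given past adherence and the time-varying covariates L_k (denominator). The summary does not state the paper's exact weight specification, so the Python sketch below shows only this generic form for a single participant. It assumes the per-interval probabilities were already predicted by separately fitted numerator and denominator models (for example, pooled logistic regressions); the function and argument names are hypothetical.

```python
from typing import List, Sequence


def stabilized_weights(adhered: Sequence[int],
                       p_num: Sequence[float],
                       p_den: Sequence[float]) -> List[float]:
    """Cumulative stabilized inverse probability weights for one participant.

    adhered[k]: observed adherence in interval k (1 = adherent, 0 = not)
    p_num[k]:   P(A_k = 1 | past adherence), from a numerator model
    p_den[k]:   P(A_k = 1 | past adherence, covariates L_k), from a denominator model
    """
    if not (len(adhered) == len(p_num) == len(p_den)):
        raise ValueError("inputs must have one entry per interval")
    weights: List[float] = []
    w = 1.0
    for a, pn, pd in zip(adhered, p_num, p_den):
        # probability of the adherence value actually observed in this interval
        num = pn if a == 1 else 1.0 - pn
        den = pd if a == 1 else 1.0 - pd
        w *= num / den        # weights accumulate multiplicatively over follow-up
        weights.append(w)
    return weights


# Hypothetical participant: adherent in intervals 0 and 1, non-adherent in interval 2.
w = stabilized_weights([1, 1, 0], p_num=[0.8, 0.8, 0.7], p_den=[0.9, 0.6, 0.5])
print([round(x, 3) for x in w])  # [0.889, 1.185, 0.711]
```

Each participant-interval would then enter a weighted outcome model with its weight. Because the denominator model conditions on L_k, adding strong predictors of adherence to it tends to push denominator probabilities toward 0 or 1, which makes the weights more variable.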
Empirical Results
Effect of Nonconfounding Prognostic Factors
When the cumulative effect was estimated using only baseline covariates, the specific choice of covariates had minimal effect on the point estimates, which were expected to be biased because baseline-only adjustment cannot account for time-varying confounding of adherence. However, including nonconfounding prognostic factors (variables that predict the outcome but not the exposure) yielded smaller variance estimates. This provides empirical support for the theoretical advice that adding prognostic factors increases the efficiency of the causal effect estimate without introducing bias.
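The efficiency argument can be checked with a small simulation. The Python sketch below is an illustrative assumption, not the CDP analysis: it generates an exposure A that is independent of a prognostic factor P (so P is not a confounder), a continuous outcome that depends on P but not on A, and compares ordinary least squares standard errors for A with and without adjustment for P. A continuous outcome and a linear model are used only for simplicity; the paper's outcome is mortality.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    X = np.column_stack([np.ones(len(y)), X])   # add an intercept column
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


rng = np.random.default_rng(2024)
n = 5_000
p = rng.normal(size=n)                        # prognostic factor: affects Y only
a = rng.binomial(1, 0.5, size=n)              # exposure: independent of P
y = 0.0 * a + 1.5 * p + rng.normal(size=n)    # true effect of A on Y is zero

beta_without, se_without = ols(a, y)                      # no adjustment
beta_with, se_with = ols(np.column_stack([a, p]), y)      # adjust for P

print(f"without P: estimate {beta_without[1]:.3f}, SE {se_without[1]:.4f}")
print(f"with P:    estimate {beta_with[1]:.3f}, SE {se_with[1]:.4f}")
```

Both estimates center on the true null, but the standard error shrinks when P is included because P explains outcome variance that would otherwise be residual noise; under these settings the ratio of standard errors approaches roughly sqrt(1 / 3.25), about 0.55, in large samples.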
Effect of Exposure Predictors
Conversely, when IPW was used to adjust for time-varying covariates, adjustment sets that included exposure predictors that were not also prognostic factors controlled bias less effectively.
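One well-known mechanism by which adjusting for an exposure-only predictor can worsen bias control is bias amplification (sometimes called Z-bias): when some confounding remains unmeasured, conditioning on a variable that predicts the exposure but not the outcome removes "clean" exposure variation and magnifies the remaining bias. The Python simulation below illustrates this in a linear-regression setting with an unmeasured confounder U. It is a swapped-in illustration, not the paper's IPW analysis, and whether this mechanism explains the CDP results is not established here.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
z = rng.normal(size=n)                    # predicts the exposure only; no path to Y
u = rng.normal(size=n)                    # unmeasured confounder of A and Y
a = 2.0 * z + u + rng.normal(size=n)      # exposure
y = 0.0 * a + u + rng.normal(size=n)      # true effect of A on Y is zero


def coef_on_a(columns, y):
    """OLS coefficient on the first supplied column (A), with an intercept."""
    X = np.column_stack([np.ones(len(y))] + columns)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Because the true effect is zero, each estimate equals its bias.
bias_without_z = coef_on_a([a], y)        # large-sample limit: 1/6, about 0.167
bias_with_z = coef_on_a([a, z], y)        # large-sample limit: 1/2
print(f"bias without Z: {bias_without_z:.3f}")
print(f"bias with Z:    {bias_with_z:.3f}")
```

Adjusting for Z roughly triples the bias here because, once Z is held fixed, U accounts for half of the remaining variation in A instead of one sixth.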
Performance of DAG Creation Strategies
Overall, the DAGs that were explicitly created by focusing subject-matter expertise on the identification of potential outcome prognostic factors performed best, particularly in the more complex time-varying covariate scenario using IPW.
Conclusions and Recommendations
Confirmation of Theory
The study empirically confirms key theoretical advice regarding causal variable selection:
- Include Prognostic Factors: Identifying and including covariates that are strong predictors of the outcome (prognostic factors) but not predictors of the exposure is highly recommended to reduce variance and increase statistical power.
- Caution with Exposure Predictors: Covariates that are strong predictors of the exposure but not the outcome may interfere with bias control and should be considered with caution.
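The covariate roles named in these two recommendations can be read off a DAG. The Python sketch below uses a simplified, hypothetical rule: a covariate predicts the outcome if it is an ancestor of Y along paths that avoid the exposure A, and predicts the exposure if it is an ancestor of A; ancestors of both are treated as common causes. The toy graph and function names are assumptions for illustration, and the rule ignores associations that arise without a directed path (for example, through shared unmeasured causes).

```python
# Hypothetical toy DAG: A = exposure, Y = outcome, C, P, E = candidate covariates.
EDGES = {("C", "A"), ("C", "Y"), ("P", "Y"), ("E", "A"), ("A", "Y")}


def ancestors_of(edges, target):
    """All nodes with a directed path into target."""
    result, stack = set(), [target]
    while stack:
        node = stack.pop()
        for u, v in edges:
            if v == node and u not in result:
                result.add(u)
                stack.append(u)
    return result


def covariate_roles(edges, exposure, outcome):
    nodes = {n for e in edges for n in e} - {exposure, outcome}
    predicts_a = ancestors_of(edges, exposure)
    # ancestors of the outcome along directed paths that avoid the exposure node
    without_a = {(u, v) for (u, v) in edges if exposure not in (u, v)}
    predicts_y = ancestors_of(without_a, outcome)
    roles = {}
    for n in sorted(nodes):
        if n in predicts_a and n in predicts_y:
            roles[n] = "common cause"
        elif n in predicts_y:
            roles[n] = "prognostic factor"
        elif n in predicts_a:
            roles[n] = "exposure-only predictor"
        else:
            roles[n] = "other"
    return roles


print(covariate_roles(EDGES, "A", "Y"))
# {'C': 'common cause', 'E': 'exposure-only predictor', 'P': 'prognostic factor'}
```

Under the paper's recommendations, P would be kept to improve efficiency, while E would be treated with caution.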
Practical Recommendation
The paper recommends that researchers and subject-matter experts begin the manual construction of DAGs with a systematic effort to identify and include all potential outcome prognostic factors, as this strategy proved most effective for constructing a robust adjustment set for causal effect estimation.