Can algorithms replace expert knowledge for causal inference? A case study on novice use of causal discovery
- Objective: To test whether causal discovery algorithms used by novices could replace expert knowledge in selecting covariates for estimating the known null effect of placebo adherence on mortality in the Coronary Drug Project (CDP) trial.
- Key Finding: Adjustment sets derived from causal discovery algorithms left more residual bias in the complex time-varying analyses than the expert-selected sets did.
- Conclusion: Due to high subjectivity in selecting algorithms/parameters and resolving complex graph outputs, the authors do not recommend novice use of causal discovery without an expert’s guidance.
PubMed: 39218433 | DOI: 10.1093/aje/kwae338 | Overview generated by: Gemini 2.5 Flash, 26/11/2025
Background and Purpose
This paper addresses the increasing discussion within epidemiology regarding the use of causal discovery algorithms (a form of machine learning) to automate the construction of Causal Directed Acyclic Graphs (DAGs) for covariate selection. The study’s objective was to assess the performance of these data-driven methods, when applied by a novice user, against a known confounded effect: the relationship between placebo adherence and mortality in the Coronary Drug Project (CDP) trial. This relationship is widely accepted to have a true causal effect of null, providing a strong benchmark for evaluating the ability of the algorithms to correctly control for confounding.
Study Design and Methods
Causal Discovery Implementation
The authors tested 4 common causal discovery algorithms: Peter-Clark (PC), Fast Causal Inference (FCI), Fast Greedy Equivalence Search (FGES), and Greedy Relaxed Sparsest Permutation (GRaSP). These algorithms were run on the CDP placebo-arm data using 39 baseline covariates together with the adherence exposure and the mortality outcome.
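As a rough illustration of what this step involves, the sketch below runs the same four families of algorithms with the open-source causal-learn Python package. The file name, column layout, and test/score choices are illustrative assumptions rather than the paper's own implementation, and causal-learn provides GES rather than the fast FGES variant, so GES stands in for it here.

```python
# Minimal sketch, not the authors' code: running four discovery algorithms with
# causal-learn on a matrix whose columns are 39 baseline covariates followed by
# adherence and death. File name and column order are hypothetical.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.search.ConstraintBased.FCI import fci
from causallearn.search.ScoreBased.GES import ges
from causallearn.search.PermutationBased.GRaSP import grasp

X = np.loadtxt("cdp_placebo_arm.csv", delimiter=",", skiprows=1)  # hypothetical file

cg_pc = pc(X, alpha=0.05, indep_test="chisq")                            # PC, chi-square tests
g_fci, fci_edges = fci(X, independence_test_method="chisq", alpha=0.05)  # FCI (allows latent confounders)
ges_record = ges(X)                                                      # GES, standing in for FGES
g_grasp = grasp(X)                                                       # GRaSP, permutation-based

print(cg_pc.G)            # CPDAG from PC
print(g_fci)              # PAG from FCI
print(ges_record["G"])    # CPDAG from GES
print(g_grasp)            # CPDAG from GRaSP
```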
Parameter Variation
To simulate novice use and assess robustness, the authors varied several inputs:
1. Statistical Thresholds: For the test-based algorithms (PC, FCI, GRaSP), the \(\chi^2\) alpha level (\(\alpha\)) was varied from 0.001 to 0.20.
2. Prior Knowledge: Models were run with no prior knowledge, with a 3-tier time-ordering (covariates \(\rightarrow\) adherence \(\rightarrow\) death), and with a 4-tier time-ordering (age/race \(\rightarrow\) other covariates \(\rightarrow\) adherence \(\rightarrow\) death).
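The sketch below shows one way a user might encode these variations, again assuming the causal-learn package: a grid of alpha values for PC, plus a 3-tier time ordering expressed as background knowledge that forbids edges from later tiers into earlier ones. The alpha grid, data file, and column order are assumptions for illustration.

```python
# Minimal sketch, assuming causal-learn: vary alpha for PC and impose a 3-tier
# time ordering (baseline covariates -> adherence -> death) by forbidding edges
# that point from a later tier into an earlier one.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc
from causallearn.utils.PCUtils.BackgroundKnowledge import BackgroundKnowledge

X = np.loadtxt("cdp_placebo_arm.csv", delimiter=",", skiprows=1)  # hypothetical file
p = X.shape[1]
tier = [0] * (p - 2) + [1, 2]          # assume the last two columns are adherence, death

# Run once without restrictions just to obtain the graph's node objects ("X1", ..., "Xp").
nodes = pc(X, alpha=0.05, indep_test="chisq").G.get_nodes()

bk = BackgroundKnowledge()
for i, ni in enumerate(nodes):
    for j, nj in enumerate(nodes):
        if tier[i] > tier[j]:          # a later-tier variable cannot cause an earlier-tier one
            bk.add_forbidden_by_node(ni, nj)

for alpha in (0.001, 0.01, 0.05, 0.10, 0.20):   # illustrative grid over the 0.001-0.20 range
    cg = pc(X, alpha=alpha, indep_test="chisq", background_knowledge=bk)
    print(alpha, cg.G)
```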
Adjustment Set Selection
From 17 model parameterizations (each run as an ensemble of 100 bootstrap samples), 15 adjustment sets were identified. Because the bootstrapped ensembles often produced cyclic graphs that could not be resolved into minimally sufficient adjustment sets, the authors adopted a simplification strategy: selecting every covariate identified as a potential cause of either mortality or adherence in at least one bootstrap sample.
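A minimal sketch of this simplification strategy follows, using PC within causal-learn as the example algorithm. The edge-mark checks follow causal-learn's documented adjacency-matrix encoding; the data file, column positions, and alpha value are assumptions, and the authors' actual ensemble procedure is not reproduced here.

```python
# Minimal sketch, assuming causal-learn: union of "potential causes" of adherence
# or death across 100 bootstrap re-runs of PC, mirroring the simplification
# strategy described above. Column positions and alpha are illustrative.
import numpy as np
from causallearn.search.ConstraintBased.PC import pc

X = np.loadtxt("cdp_placebo_arm.csv", delimiter=",", skiprows=1)  # hypothetical file
adherence, death = X.shape[1] - 2, X.shape[1] - 1                 # assumed column indices

def potential_causes(G, t):
    """Indices i with i -> t or i - t, using causal-learn's encoding:
    A[t, i] == 1 and A[i, t] == -1  means i -> t;  A[t, i] == A[i, t] == -1  means i - t."""
    A = G.graph
    out = set()
    for i in range(A.shape[0]):
        if i == t:
            continue
        if (A[t, i] == 1 and A[i, t] == -1) or (A[t, i] == -1 and A[i, t] == -1):
            out.add(i)
    return out

rng = np.random.default_rng(0)
selected = set()
for _ in range(100):                                   # 100 bootstrap samples per ensemble
    Xb = X[rng.integers(0, len(X), size=len(X))]       # resample rows with replacement
    cg = pc(Xb, alpha=0.05, indep_test="chisq")
    selected |= potential_causes(cg.G, adherence) | potential_causes(cg.G, death)

selected -= {adherence, death}
print(sorted(selected))                                # candidate adjustment set (column indices)
```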
Key Findings: Residual Bias and Subjectivity
Performance
- Baseline-only Adjustment: When used to adjust for baseline covariates only, the algorithm-derived adjustment sets performed similarly to prior published results, removing roughly 36% of the bias in the unadjusted estimate.
- Time-Varying Adjustment (IPW): When inverse probability weighting was used to adjust for time-varying confounding, the adjustment sets from the causal discovery algorithms left more residual bias than the adjustment sets selected by the original CDP expert team (a sketch of the weighting step follows this list).
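For readers unfamiliar with the weighting step, the sketch below computes stabilized inverse probability of adherence weights for a long-format person-interval data set. All column names, the adjustment set, and the modeling choices are hypothetical stand-ins; the CDP reanalysis used its own models and covariates.

```python
# Minimal sketch, hypothetical columns: stabilized inverse probability weights
# for time-varying adherence. The denominator model conditions on the chosen
# adjustment set plus adherence history; per-interval weights are multiplied
# cumulatively within each person and would feed a weighted outcome model.
# Assumes rows are sorted by id and follow-up interval.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("cdp_long.csv")              # hypothetical long-format (person-interval) data
adjustment_set = ["age", "smoking", "chf"]    # stand-in for a discovered or expert-chosen set

num = LogisticRegression(max_iter=1000).fit(df[["prior_adherence"]], df["adherent"])
den = LogisticRegression(max_iter=1000).fit(df[["prior_adherence"] + adjustment_set], df["adherent"])

p_num = num.predict_proba(df[["prior_adherence"]])[:, 1]
p_den = den.predict_proba(df[["prior_adherence"] + adjustment_set])[:, 1]

a = df["adherent"].to_numpy()                 # observed adherence level in each interval
sw = (a * p_num + (1 - a) * (1 - p_num)) / (a * p_den + (1 - a) * (1 - p_den))

df["sw_cum"] = pd.Series(sw, index=df.index).groupby(df["id"]).cumprod()
# df["sw_cum"] would then weight a pooled outcome model for mortality.
```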
Challenges and Subjectivity
- Inconsistent Results: Varying the input parameters and algorithm type produced a wide range of unique adjustment sets, and only about half of the resulting analyses were compatible with the known null effect.
- Expert Knowledge Value: Supplying time-ordering as prior knowledge was essential for improving the interpretability of the resulting graphs, particularly for orienting edges involving variables with a fixed temporal position, such as age and race.
- Methodological Difficulty: The bootstrap ensembles frequently produced cyclic graphs, forcing ad hoc decisions about how to simplify them and select adjustment sets and thereby introducing substantial subjectivity into the process.
Conclusions and Recommendations
The study concludes that while causal discovery algorithms can partially replicate expert findings, their use is not straightforward and introduces a significant degree of subjectivity through the choice of algorithms and input parameters and the interpretation of graph outputs that are not acyclic.
The authors strongly recommend that researchers without detailed knowledge of causal discovery algorithms not attempt to use these tools without the aid of an expert in the field. Absent this expert support, the use of traditional subject matter experts to generate causal graphs provides greater transparency about the assumptions made and, in this case study, yielded the best estimate of the true causal effect.