Causal modelling of gene effects from regulators to programs to traits

statistical genetics
causal inference
perturbation assays
functional genomics
transcriptomics
GWAS
  • Framework: This paper proposes a novel statistical approach to infer causal mechanistic pathways that link genes to traits by combining gene-trait effect sizes (from GWAS LoF burden tests) with gene-regulatory relationships (from Perturb-seq experiments).
  • Causal Hierarchy: The model establishes a three-step causal graph: Gene \(\longrightarrow\) Regulatory Programs \(\longrightarrow\) Trait, allowing researchers to explain gene-trait associations via intermediate functional steps.
  • Proof of Concept: Applied to three blood traits (RDW, MCH, IRF) using human HSPC Perturb-seq data, the model identified intermediate regulatory programs (e.g., a ribosomal gene program) and predicted the direction in which gene perturbations affect the traits.
Published: 23 January 2026

PubMed: 41372418
DOI: 10.1038/s41586-025-09866-3
Overview generated by: Gemini 2.5 Flash, 10/12/2025

Research Goal and Causal Framework

This paper addresses a critical gap in human genetics: the lack of genome-scale approaches to infer causal mechanistic pathways linking genes to cellular functions and ultimately to complex human traits. The authors propose a novel framework that bridges this gap by combining quantitative genetic association data with regulatory information from cellular perturbation experiments.

The Proposed Causal Model

The core of the study is the construction of a causal graph that models the directional associations of genes with a trait. The model posits a three-step causal hierarchy:

\[ \text{Gene} \longrightarrow \text{Regulatory Programs} \longrightarrow \text{Trait} \]

The approach uses two main data inputs:

  1. Gene-Trait Relationships (\(\gamma\)): Quantitative estimates of gene effects on traits, derived from Loss-of-Function (LoF) burden tests in large-scale genetic studies (GWAS).
  2. Gene-Regulatory Relationships (\(\beta\)): Quantitative estimates of a gene’s regulatory effects on cellular programs, inferred from high-throughput cellular perturbation assays, such as Perturb-seq experiments in relevant cell types.

The model combines these two forms of data using a statistical framework to infer the causal path, allowing gene-trait associations to be explained by regulatory effects on intermediate biological programs or by direct effects on the trait.
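One simple way to read this combination, offered as an illustrative assumption and not as the authors' statistical model, is a linear decomposition \(\gamma_g \approx \sum_p \beta_{gp}\,\alpha_p + \delta_g\). Here \(\alpha_p\) (the effect of program \(p\) on the trait) and \(\delta_g\) (the part of a gene's effect not explained by any program) are hypothetical symbols introduced for this sketch. The code below simulates toy data and recovers \(\alpha\) by ordinary least squares; all values are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_programs = 200, 3

# beta[g, p]: simulated effect of perturbing gene g on program p
# (stands in for Perturb-seq estimates; not real data)
beta = rng.normal(size=(n_genes, n_programs))

# alpha_true[p]: hypothetical effect of program p on the trait,
# used only to generate the toy data
alpha_true = np.array([0.5, -0.3, 0.0])

# gamma[g]: simulated gene-level trait effect
# (stands in for LoF burden estimates) = mediated part + noise
gamma = beta @ alpha_true + rng.normal(scale=0.05, size=n_genes)

# Estimate program-to-trait effects by ordinary least squares
alpha_hat, *_ = np.linalg.lstsq(beta, gamma, rcond=None)

# Split each gene's effect into a program-mediated part and a residual,
# which this sketch treats as the "direct" effect on the trait
mediated = beta @ alpha_hat
direct = gamma - mediated

print("estimated program effects:", np.round(alpha_hat, 2))
```

A real analysis would also need to account for estimation error in both \(\gamma\) and \(\beta\), which plain least squares ignores.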

Proof of Concept: Blood Trait Analysis

The authors applied this causal framework as a proof of concept to jointly model three partially co-regulated blood traits: Red Blood Cell Distribution Width (RDW), Mean Corpuscular Hemoglobin (MCH), and Immature Reticulocyte Fraction (IRF).

Methods and Data Integration

  1. Perturb-seq Data: Used pooled Perturb-seq experiments (CRISPR-based perturbation combined with single-cell RNA sequencing) in human hematopoietic stem and progenitor cells (HSPCs), the trait-relevant cell type, to generate a comprehensive map of gene-regulatory connections (\(\beta\)).
  2. GWAS Data: Used publicly available GWAS data to estimate gene-level LoF burden effect sizes (\(\gamma\)) for the three blood traits.
  3. Causal Graph Construction: The combined data were used to construct a causal graph of the gene-regulatory hierarchy underlying the traits.
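The third step can be pictured as assembling signed edges into a two-layer graph. The sketch below is a minimal illustration under stated assumptions: the function `build_causal_graph`, the threshold of 0.1, and the names `geneA`, `program1`, and `traitX` are all hypothetical, and the numbers are made up. It does not reproduce the paper's procedure for selecting edges.

```python
def build_causal_graph(beta, alpha, threshold=0.1):
    """Collect signed edges gene -> program and program -> trait.

    beta:  {gene: {program: effect}}   (hypothetical regulatory effects)
    alpha: {program: {trait: effect}}  (hypothetical program-to-trait effects)
    Edges whose absolute effect is below `threshold` are dropped.
    """
    edges = []
    for gene, effects in beta.items():
        for program, effect in effects.items():
            if abs(effect) >= threshold:
                edges.append((gene, program, effect))
    for program, effects in alpha.items():
        for trait, effect in effects.items():
            if abs(effect) >= threshold:
                edges.append((program, trait, effect))
    return edges


# Toy inputs with placeholder names and made-up effect sizes
beta = {
    "geneA": {"program1": -0.8, "program2": 0.05},
    "geneB": {"program2": 0.6},
}
alpha = {
    "program1": {"traitX": 0.4},
    "program2": {"traitX": -0.5, "traitY": 0.02},
}

edges = build_causal_graph(beta, alpha)
for source, target, effect in edges:
    print(f"{source} -> {target} ({effect:+.2f})")
```

A hard threshold is used here only for brevity; a statistical criterion such as a significance test would be the more natural choice with real estimates.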

Key Results

  • Identifying Regulatory Programs: The model identified regulatory programs (groups of co-regulated genes) that serve as intermediate causal steps. For instance, it identified a program involving ribosomal genes that causally links gene perturbations to RDW and MCH.
  • Directional Causality: The framework resolved directional relationships, showing how the perturbation of a regulatory gene affects a specific program, which in turn affects the trait.
  • Novel Gene-Trait Links: The model provided functional validation for known associations and suggested new mechanisms. For example, the regulatory effect of \(GATA1\) perturbation on multiple programs was linked to observed effects on all three blood traits.
  • Cross-Trait Comparisons: The model was used to predict cross-trait relationships of gene effects, showing how a gene’s regulatory effect can lead to correlated or inversely correlated outcomes across different traits.
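The directional and cross-trait results above can be illustrated with a toy calculation. Assuming, purely for illustration, that a gene's predicted effect on a trait is the sum over programs of (gene effect on program) times (program effect on trait), a gene acting through a program that pushes two traits in opposite directions will show inversely related effects across them. The matrices below are made-up numbers, not values from the paper.

```python
import numpy as np

# beta[g, p]: hypothetical effect of perturbing gene g on program p
beta = np.array([
    [1.0, 0.0],   # gene 0 acts only through program 0
    [0.0, 1.0],   # gene 1 acts only through program 1
    [0.5, 0.5],   # gene 2 acts through both programs
])

# alpha[p, t]: hypothetical effect of program p on trait t
alpha = np.array([
    [0.4, -0.4],  # program 0 pushes the two traits in opposite directions
    [0.3,  0.3],  # program 1 pushes both traits the same way
])

# Predicted gene-by-trait effects mediated by the programs
predicted = beta @ alpha

# True where a gene's predicted effects on the two traits share a sign
same_direction = np.sign(predicted[:, 0]) == np.sign(predicted[:, 1])

print(predicted)
print(same_direction)
```

In this toy setup only gene 1, which acts solely through the program with concordant trait effects, is predicted to move both traits in the same direction.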

Conclusions and Future Directions

The study demonstrates a strategy for combining cellular perturbation screens with human genetic data to move beyond association and systematically infer the mechanistic chain of causality from genes to biological programs to complex traits.

The approach is scalable. It suggests that Perturb-seq experiments in additional trait-relevant cell types, coupled with robust gene-level effect size estimation from GWAS, are a critical future direction for explaining the biological mechanisms behind the results of large-scale genetic association studies.