Causal integration of multi-omics data with prior knowledge to generate mechanistic hypotheses

Multi-omics
Systems Biology
Causal Inference
Signaling Networks
Phosphoproteomics
Transcriptomics
Metabolomics
Bioinformatics
Published

23 January 2026

PubMed: 33502086
DOI: 10.15252/msb.20209703
Overview generated by: Gemini 2.5 Flash, 27/11/2025

Background and Objective

Multi-omics technologies produce large datasets that capture different layers of molecular information (e.g., phosphorylation, transcription, metabolism). However, few methods integrate these diverse data types to systematically extract mechanistic hypotheses, that is, hypotheses about how a change in one molecular layer causally affects another.

This paper introduces COSMOS (Causal Oriented Search of Multi-Omics Space), a novel computational method designed to:

  1. Integrate quantitative data from three omics layers: phosphoproteomics, transcriptomics, and metabolomics.
  2. Combine this data with extensive prior knowledge contained within signaling, metabolic, and gene regulatory networks.
  3. Infer the causal relationships between molecular activities across these layers to generate mechanistic hypotheses.

Methods: The COSMOS Framework

Causal Integration Principle

COSMOS’s core strength lies in its use of a large, curated prior knowledge network (PKN) that encompasses relationships between various molecular entities, including transcription factors, kinases, genes, and metabolites. The method works in two key steps:

  1. Activity Inference: COSMOS first calculates the estimated activity of key molecular regulators from the measured omics data:
    • Kinase Activity (from phosphoproteomics)
    • Transcription Factor (TF) Activity (from transcriptomics)
    • Enzyme/Pathway Activity (from metabolomics)
  2. Causal Scoring and Hypothesis Generation: The inferred activity changes are then mapped onto the PKN. For every potential regulatory link between a regulator (e.g., a kinase) and its target (e.g., a TF or a metabolite), COSMOS calculates a causal score. The score reflects whether the change in the regulator’s activity is consistent with the change in its target’s activity, given the known network topology.
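The two steps above can be illustrated with a minimal Python sketch. It is not the COSMOS implementation: the sign-weighted mean stands in for whatever activity estimator the authors use, the boolean consistency check stands in for the causal score, and every name and number (`REGULONS`, `PKN`, `KinaseA`, `TF_B`, `MetaboliteC`, the measurement values) is a hypothetical placeholder.

```python
from statistics import mean

# Hypothetical regulator -> {target: sign} sets (+1 activates, -1 inhibits).
REGULONS = {
    "KinaseA": {"site1": 1, "site2": 1, "site3": -1},
    "TF_B": {"gene1": 1, "gene2": -1},
}

# Hypothetical signed prior knowledge network: (source, target) -> edge sign.
PKN = {
    ("KinaseA", "TF_B"): 1,
    ("TF_B", "MetaboliteC"): -1,
}


def infer_activity(regulon, measurements):
    """Step 1 (simplified): mean of sign-weighted changes of measured targets."""
    scores = [s * measurements[t] for t, s in regulon.items() if t in measurements]
    return mean(scores) if scores else 0.0


def sign(x):
    """Return -1, 0 or +1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def edge_is_consistent(source, target, activities, pkn):
    """Step 2 (simplified): does source sign times edge sign match target sign?"""
    expected = sign(activities[source]) * pkn[(source, target)]
    return expected != 0 and expected == sign(activities[target])


# Made-up log fold changes for phosphosites and transcripts.
measurements = {"site1": 2.0, "site2": 1.0, "site3": -3.0,
                "gene1": 1.5, "gene2": -0.5}

activities = {name: infer_activity(reg, measurements)
              for name, reg in REGULONS.items()}
activities["MetaboliteC"] = -0.8  # metabolite change taken as measured directly

for (src, tgt) in PKN:
    print(src, "->", tgt, edge_is_consistent(src, tgt, activities, PKN))
```

With these made-up values both edges come out as consistent: an active kinase activating an active TF, and an active TF repressing a depleted metabolite. A real causal score would be graded and would account for the wider network topology, which this per-edge check ignores.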

Application: Renal Cell Carcinoma (RCC)

COSMOS was applied to multi-omics data from a drug screen on Clear Cell Renal Cell Carcinoma (ccRCC) cells treated with various anti-cancer compounds. This allowed the authors to investigate how drug perturbations causally rewire the signaling, gene regulation, and metabolic networks of cancer cells.

Key Results and Mechanistic Hypotheses

Network Rewiring in ccRCC

COSMOS successfully inferred activity changes in known regulatory pathways, demonstrating the effect of drug perturbations on the signaling, transcriptional, and metabolic machinery of the cancer cells.

Causal Hypotheses Generated

The method generated specific, testable mechanistic hypotheses that connect the omics layers:

  1. Signaling to Transcription (Kinase → TF): COSMOS identified the TSSK4 kinase as a causal regulator of the ZHX2 transcription factor following treatment with a CDK inhibitor. This suggested a specific kinase-TF cascade that could be important for therapeutic response.
  2. Transcription to Metabolism (TF → Metabolite): The method also revealed a causal link from the ZHX2 TF to the regulation of GAPDH enzyme activity (a key player in glycolysis), which in turn regulated specific metabolites like pyruvate. This closed the loop, linking signaling through transcription to metabolic output.
  3. Discovery of Novel Kinases: Through its integration, COSMOS implicated several previously uncharacterized kinases (e.g., CDK11) as potential drivers of the transcriptional response to drug treatment, suggesting novel targets for further investigation.
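A chain such as kinase → TF → enzyme → metabolite can be read as a path through a signed network in which every edge agrees with the observed direction of change at both ends. The sketch below shows that idea with a depth-first search. The node names are taken from the hypotheses above, but the edge signs and node states are invented placeholders, not results from the paper, and the search is a simplification rather than the method COSMOS itself uses.

```python
# Hypothetical signed edges: source -> [(target, sign)], +1 activates, -1 inhibits.
# The signs are illustrative assumptions only.
EDGES = {
    "TSSK4": [("ZHX2", 1)],
    "ZHX2": [("GAPDH", 1)],
    "GAPDH": [("pyruvate", 1)],
}

# Hypothetical direction of change per node (+1 up, -1 down); invented values.
STATE = {"TSSK4": -1, "ZHX2": -1, "GAPDH": -1, "pyruvate": -1}


def consistent_paths(start, goal, edges, state, path=None):
    """Yield every acyclic path from start to goal whose edges all agree
    with the node states (source state times edge sign equals target state)."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for target, edge_sign in edges.get(start, []):
        if target in path:
            continue  # skip nodes already visited to avoid cycles
        source_state = state.get(start, 0)
        target_state = state.get(target, 0)
        if target_state != 0 and source_state * edge_sign == target_state:
            yield from consistent_paths(target, goal, edges, state, path)


for p in consistent_paths("TSSK4", "pyruvate", EDGES, STATE):
    print(" -> ".join(p))
```

Under these assumed signs the single path TSSK4 → ZHX2 → GAPDH → pyruvate is returned. Flipping the state of any intermediate node breaks the chain and yields no path, which is the sense in which such a hypothesis is testable.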

Conclusions and Significance

COSMOS represents a significant advance in systems biology by offering a robust, transparent, and biologically constrained approach to causal integration of multi-omics data.

By leveraging extensive prior knowledge, it moves beyond simple association to generate specific, directional, and testable mechanistic hypotheses that bridge the gap between different omics layers. This capability is critical for understanding complex diseases like cancer and for accelerating the discovery of novel therapeutic targets and biomarkers.