Multi-Omic Graph Diagnosis (MOGDx): a data integration tool to perform classification tasks for heterogeneous diseases
PubMed: 39177104 DOI: 10.1093/bioinformatics/btae523 Overview generated by: Gemini 2.5 Flash, 27/11/2025
Background and Objective
The complexity and heterogeneity of human diseases (e.g., cancer or psychiatric disorders) make precise diagnosis and treatment challenging. Multi-omics data offers the opportunity to redefine these diseases at a more granular, molecular level. However, existing integrative machine learning methods often scale poorly, oversimplify biological relationships, and handle missing data ineffectively.
This paper introduces Multi-Omic Graph Diagnosis (MOGDx), a flexible data integration tool that leverages Graph Neural Networks (GNNs) to perform robust classification tasks for heterogeneous diseases by capturing complex, non-linear relationships across multiple omics layers.
Methods: The MOGDx Framework
MOGDx is a supervised learning framework that integrates diverse omics data types (such as gene expression and DNA methylation) by modeling the study cohort as a graph.
Graph Neural Network Integration
- Nodes and Features: Each patient or sample is represented as a node in the graph. The various omics data (e.g., expression levels) are used as the initial feature vectors for each patient node.
- Edges (Similarity): The edges (connections) between patient nodes are determined by calculating a measure of molecular similarity across all omics data types (similar to Similarity Network Fusion).
- Graph Convolutional Network (GCN): The core of MOGDx is a GCN. This GNN propagates information across the patient-similarity graph, enabling the model to learn complex, non-linear dependencies both within and between the omics layers. In this way it captures subtle patterns of heterogeneity shared across modalities, which are then used for the classification task (e.g., predicting disease subtype).
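The three components above (patient nodes with omics features, similarity edges, and GCN propagation) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the MOGDx implementation: the k-nearest-neighbour graph on Pearson correlation, the two-layer architecture, the random weights, and all function names are assumptions chosen for brevity. The propagation rule is the standard symmetric-normalised form, H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W).

```python
import numpy as np

def knn_similarity_graph(X, k=2):
    """Symmetric kNN adjacency from Pearson correlation between patients (rows of X)."""
    sim = np.corrcoef(X)                    # patient-by-patient correlation
    np.fill_diagonal(sim, -np.inf)          # never pick a patient as its own neighbour
    n = X.shape[0]
    A = np.zeros((n, n))
    neighbours = np.argsort(-sim, axis=1)[:, :k]   # k most similar patients per row
    for i in range(n):
        A[i, neighbours[i]] = 1.0
    return np.maximum(A, A.T)               # make the graph undirected

def normalise_adjacency(A):
    """Return D^{-1/2} (A + I) D^{-1/2}, the usual GCN propagation matrix."""
    A_hat = A + np.eye(A.shape[0])          # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_forward(A_norm, X, W1, W2):
    """Two-layer GCN forward pass returning class probabilities per patient."""
    H = np.maximum(A_norm @ X @ W1, 0.0)    # layer 1: neighbour averaging + ReLU
    logits = A_norm @ H @ W2                # layer 2: project to class scores
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True) # row-wise softmax

# Hypothetical toy cohort: 6 patients, 5 omics features, 3 disease classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 5))
A = knn_similarity_graph(X, k=2)
A_norm = normalise_adjacency(A)
W1 = rng.normal(size=(5, 4))                # untrained weights, for shape illustration only
W2 = rng.normal(size=(4, 3))
probs = gcn_forward(A_norm, X, W1, W2)
```

In a real setting the weights would be trained on labelled patients, and the graph would be built from fused multi-omics similarities rather than from a single feature matrix.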
Data Robustness
The GNN architecture makes MOGDx robust to missing data, a critical property for real-world multi-omics cohorts in which not every measurement is available for every patient.
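One way to see why a graph representation tolerates missing measurements: a patient only needs edges from the modalities in which they were actually profiled. The sketch below merges per-modality edge lists into one weighted patient graph. It is a deliberately simple stand-in, not Similarity Network Fusion and not the MOGDx code; the function name, patient IDs, and edge lists are all hypothetical.

```python
def union_patient_graph(modality_edges):
    """Merge per-modality edge sets into one undirected, weighted patient graph.

    modality_edges maps a modality name to a set of (patient_a, patient_b) pairs.
    The weight of an edge is the number of modalities that support it.
    """
    weights = {}
    for edges in modality_edges.values():
        for a, b in edges:
            key = tuple(sorted((a, b)))     # treat (a, b) and (b, a) as the same edge
            weights[key] = weights.get(key, 0) + 1
    nodes = sorted({patient for edge in weights for patient in edge})
    return nodes, weights

# Hypothetical cohort: P1 lacks expression data, P4 lacks methylation data.
modality_edges = {
    "expression": {("P2", "P3"), ("P3", "P4"), ("P2", "P4")},
    "methylation": {("P1", "P2"), ("P2", "P3")},
}
nodes, weights = union_patient_graph(modality_edges)
```

Both partially profiled patients (P1 and P4) remain in the graph, so no patient has to be dropped for lacking a modality.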
Key Results and Findings
MOGDx was applied to several public cancer cohorts (including TCGA data for Glioblastoma, Lung Adenocarcinoma, and Lung Squamous Cell Carcinoma).
- Superior Classification: MOGDx demonstrated superior classification accuracy compared to leading non-GNN multi-omics integration methods and single-omics models, validating the strength of the graph-based approach in learning complex patient relationships.
- Biomarker Identification: The framework facilitates the identification of the specific molecular features (biomarkers) that are most critical in distinguishing the disease classes, by analyzing the feature weights learned in the GNN layers.
- Missing Data Performance: The model maintained high classification performance even when substantial portions of the omics data were missing, confirming its robustness for real-world applications.
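As a rough illustration of the weight-based biomarker idea above, input features can be ranked by the magnitude of their learned first-layer weights. This is a generic heuristic sketched under assumptions, not the procedure reported for MOGDx: the weight values, the feature names, and the choice of the L2 norm are all hypothetical, and the ranking is only meaningful when input features are on comparable scales.

```python
import math

def rank_features(weight_rows, feature_names, top_n=3):
    """Rank input features by the L2 norm of their first-layer weight row."""
    scores = {
        name: math.sqrt(sum(w * w for w in row))
        for name, row in zip(feature_names, weight_rows)
    }
    # Largest norm first: these features influence the hidden layer most strongly.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical learned weights: 4 input features feeding 2 hidden units.
W1 = [
    [0.1, -0.2],
    [1.5, 0.7],
    [-0.9, 1.1],
    [0.0, 0.05],
]
names = ["gene_A", "gene_B", "gene_C", "gene_D"]
top = rank_features(W1, names, top_n=2)
```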
Conclusions and Significance
MOGDx offers a significant methodological advancement for multi-omics data integration using a flexible and robust Graph Neural Network approach. By effectively modeling patient relationships and learning complex cross-omics patterns, MOGDx provides a powerful tool for improving the classification accuracy of heterogeneous diseases and accelerating the discovery of precision biomarkers.