DIABLO: an integrative approach for identifying key molecular drivers from multi-omics assays

bioinformatics

cancer

data integration

dimension reduction

multi-omics

predictive modeling

systems biology

Method: DIABLO (Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics datasets) is a supervised multi-block PLS/GCCA method for joint analysis of heterogeneous omics data.
Feature Selection: It uses a sparse penalty (L1) to select a minimal set of key molecular drivers that are highly correlated across omics layers and maximally associated with a specific clinical outcome (e.g., disease status).
Impact: In the authors' benchmarks, DIABLO selected features with stronger biological enrichment than other integrative methods while achieving classification performance comparable to state-of-the-art multi-step schemes; applications include identifying the molecular drivers that distinguish breast cancer subtypes.

Published

23 January 2026

PubMed: 30657866

DOI: 10.1093/bioinformatics/bty1054

Overview generated by: Gemini 2.5 Flash, 27/11/2025

Background and Objective

A central challenge in multi-omics analysis is that each dataset is high-dimensional, and integrating diverse data types (e.g., transcriptomics, proteomics, metabolomics) requires accounting for the biological signal they share. Many existing approaches analyze each dataset separately and combine the results afterwards, missing the opportunity to jointly identify the features that drive biological differences.

This paper introduces DIABLO (Data Integration Analysis for Biomarker discovery using Latent variable approaches for Omics datasets), a novel computational method designed to:
1. Perform supervised integration of multiple heterogeneous omics datasets.
2. Simultaneously identify a minimal set of key molecular features (biomarkers) that are highly correlated across omics types and maximally associated with a specific outcome variable (e.g., disease status, clinical subtype).

Methods: The DIABLO Framework

Core Algorithm

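DIABLO builds on Partial Least Squares (PLS), a latent-variable method that reduces dimension by projecting the data onto a small number of components. PLS on its own does not select features; DIABLO extends a sparse, multi-block generalization of it, sparse Generalised Canonical Correlation Analysis (sGCCA, closely related to multi-block PLS, MB-PLS), to a supervised setting, and the sparsity is what adds feature selection.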
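For reference, the sparse GCCA criterion that this family of methods optimizes can be written as below. The notation is introduced here for illustration and is not taken from the overview: \(X^{(1)},\dots,X^{(Q)}\) are the centered omics blocks measured on the same samples, \(a^{(q)}\) is the loading vector for block \(q\), \(c_{ij} \ge 0\) are entries of a user-chosen design matrix stating which blocks are connected, and \(\lambda^{(q)}\) controls the sparsity of block \(q\).

\[
\max_{a^{(1)},\dots,a^{(Q)}} \; \sum_{\substack{i,j=1 \\ i \neq j}}^{Q} c_{ij}\, \operatorname{cov}\!\left(X^{(i)} a^{(i)},\; X^{(j)} a^{(j)}\right)
\quad \text{s.t.} \quad \lVert a^{(q)} \rVert_2 = 1, \;\; \lVert a^{(q)} \rVert_1 \le \lambda^{(q)}, \quad q = 1,\dots,Q .
\]

In the supervised case, the outcome enters as one more block: a dummy (one-hot) indicator matrix of class membership.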

Supervised Integration

Unlike unsupervised integration methods, DIABLO is supervised: it incorporates an outcome variable (e.g., healthy vs. disease) into the model. The method identifies latent components (analogous to principal components) that maximize the covariance between:
1. Pairs of omics datasets.
2. Each omics dataset and the outcome variable.
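The covariance-maximizing idea can be illustrated for a single omics block and a class outcome. The sketch below is not the DIABLO implementation: it is a minimal single-block PLS-style component in plain Python, and the function names (`center`, `one_hot`, `unit`, `first_component`) and the toy data are all hypothetical. It dummy-codes the outcome and uses power iteration to find unit-norm weights whose scores covary most strongly with the outcome.

```python
import math

def center(rows):
    """Subtract the column mean from every column of a matrix (list of rows)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]

def one_hot(labels):
    """Dummy-code class labels; columns follow the sorted class names."""
    classes = sorted(set(labels))
    return [[1.0 if lab == c else 0.0 for c in classes] for lab in labels]

def unit(v):
    """Rescale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def first_component(X, labels, n_iter=50):
    """Return (a, scores): unit weights a maximising cov(Xa, Yb), and scores Xa.

    Assumes the first outcome column covaries with at least one feature,
    otherwise the starting vector maps to zero and cannot be normalised.
    """
    Xc, Yc = center(X), center(one_hot(labels))
    n, p, q = len(Xc), len(Xc[0]), len(Yc[0])
    # Cross-product matrix M = Xc^T Yc (p x q), proportional to the covariance.
    M = [[sum(Xc[i][j] * Yc[i][k] for i in range(n)) for k in range(q)]
         for j in range(p)]
    b = [1.0] + [0.0] * (q - 1)   # start from the first outcome column
    for _ in range(n_iter):       # power iteration towards the leading singular vectors
        a = unit([sum(M[j][k] * b[k] for k in range(q)) for j in range(p)])
        b = unit([sum(M[j][k] * a[j] for j in range(p)) for k in range(q)])
    scores = [sum(Xc[i][j] * a[j] for j in range(p)) for i in range(n)]
    return a, scores

# Hypothetical toy block: 6 samples x 3 features; only feature 0 tracks the outcome.
X = [[1.0, 0.20, 5.0], [1.2, 0.10, 4.0], [0.9, 0.30, 6.0],
     [3.0, 0.20, 5.1], [3.2, 0.15, 4.2], [2.9, 0.25, 5.9]]
labels = ["ctrl"] * 3 + ["case"] * 3
a, scores = first_component(X, labels)
```

On this toy block the weight on feature 0 dominates `a`, and the sign of each sample's score separates the two groups. A multi-block method adds the between-block covariance terms on top of this block-to-outcome term.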

Feature Selection and Biomarker Discovery

DIABLO employs a sparse penalty (an \(L_1\) penalty, similar to LASSO regression) within its iterative algorithm. The penalty shrinks most feature weights to exactly zero, so each latent component is a linear combination of only a small number of features. This feature selection step is critical for:
* Dimension Reduction: Discarding features that do not contribute to the components.
* Biomarker Identification: Pinpointing the most important “key molecular drivers” that explain the variation in the biological outcome across the different omics layers.
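The effect of an \(L_1\) penalty on a loading vector can be sketched with the soft-thresholding operator, the standard proximal step for \(L_1\) penalties. This is an illustrative fragment under assumptions, not the method's actual update: the helper names, the threshold `lam`, and the toy loading values are hypothetical.

```python
import math

def soft_threshold(v, lam):
    """Shrink each entry towards zero by lam; entries with |x| <= lam become 0."""
    return [math.copysign(max(abs(x) - lam, 0.0), x) for x in v]

def sparse_loading(v, lam):
    """Soft-threshold, then rescale to unit L2 norm (all zeros if nothing survives)."""
    s = soft_threshold(v, lam)
    norm = math.sqrt(sum(x * x for x in s))
    return [x / norm for x in s] if norm > 0 else s

# Hypothetical dense loadings for five features of one omics block.
loadings = [0.9, -0.05, 0.4, 0.02, -0.6]
sparse = sparse_loading(loadings, lam=0.1)

# Features with a non-zero weight are the ones "selected" for this component.
selected = [i for i, w in enumerate(sparse) if w != 0.0]
```

Here the two features with small weights (indices 1 and 3) are dropped, and the component is built from the remaining three. A larger `lam` selects fewer features.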

Key Results and Application

Performance

The authors benchmarked DIABLO against other multi-omics integration and classification methods (e.g., iClusterPlus, multi-kernel learning) and reported:
* Classification Accuracy: Predictive performance comparable to state-of-the-art multi-step classification schemes for distinguishing between sample groups.
* Biological Relevance: Smaller, more biologically coherent subsets of features, with stronger biological enrichment, that were correlated across the different omics blocks.
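When sample groups are unbalanced, as clinical subtypes often are, one common way to quantify predictive performance is the balanced error rate (BER), the average of the per-class error rates. The overview does not say which metric was used, so the choice of BER here is an assumption, and the function name and labels below are hypothetical.

```python
def balanced_error_rate(y_true, y_pred):
    """Average of per-class error rates, so small classes count as much as large ones."""
    classes = sorted(set(y_true))
    per_class = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        wrong = sum(1 for i in idx if y_pred[i] != c)
        per_class.append(wrong / len(idx))
    return sum(per_class) / len(per_class)

# Hypothetical example: a classifier that always predicts the majority subtype.
y_true = ["LumA"] * 8 + ["Basal"] * 2
y_pred = ["LumA"] * 10

overall_error = sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)
ber = balanced_error_rate(y_true, y_pred)
```

The overall error looks modest (0.2) because the majority class is always right, while the BER of 0.5 exposes that the minority class is never predicted correctly.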

Case Study: Breast Cancer

DIABLO was applied to a multi-omics breast cancer dataset (mRNA, miRNA, CpG methylation, and protein data) to distinguish between clinical subtypes.
* Integrated Signatures: DIABLO identified a signature of features that were highly correlated across the omics layers, including specific genes (mRNA), microRNAs, CpG sites, and proteins.
* Key Drivers: The method pinpointed known and novel molecular drivers (e.g., genes and pathways related to cell cycle and proliferation) whose concerted changes across the different omics layers were associated with the differences between cancer subtypes.
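What "highly correlated across the omics layers" means for a selected signature can be checked by correlating the selected features of one block with those of another and keeping the strong pairs. The sketch below does this with Pearson correlation in plain Python. The feature names, values, and the 0.7 cutoff are hypothetical and are not results from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length lists of measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def cross_block_edges(block_a, block_b, cutoff=0.7):
    """Feature pairs from two blocks whose absolute correlation reaches the cutoff."""
    edges = []
    for name_a, values_a in block_a.items():
        for name_b, values_b in block_b.items():
            r = pearson(values_a, values_b)
            if abs(r) >= cutoff:
                edges.append((name_a, name_b, round(r, 3)))
    return edges

# Hypothetical selected features measured on the same five samples.
mrna = {"gene_A": [1.0, 2.0, 3.0, 4.0, 5.0],
        "gene_B": [2.0, 1.0, 2.0, 1.0, 2.0]}
mirna = {"miR_X": [5.1, 3.9, 3.2, 2.1, 0.8]}

edges = cross_block_edges(mrna, mirna)
```

In this toy example only the `gene_A` and `miR_X` pair passes the cutoff, with a strong negative correlation, which is the kind of relationship one might expect between a microRNA and a transcript it represses.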

Conclusions and Significance

DIABLO is a flexible tool for supervised integration of multi-omics data. Because it performs dimension reduction, feature selection, and association with a clinical outcome in a single model, it is particularly well-suited for biomarker discovery.

By identifying small, highly relevant, and correlated sets of features across different molecular layers, DIABLO provides valuable insights into the key molecular drivers underlying complex biological states or disease phenotypes, aiding in the transition toward personalized medicine.