Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer

benchmarking
bioinformatics
cancer
dimensionality reduction
machine learning
multi-omics
  • Topic: A systematic benchmarking of nine joint Dimensionality Reduction (jDR) methods for integrating multi-omics data, using simulated data, TCGA cancer cohorts, and single-cell data.
  • Key Findings: intNMF performed best in unsupervised clustering tasks, while MCIA (Multiple Co-Inertia Analysis) behaved effectively and consistently across the widest range of benchmark contexts, making it the most robust all-round choice.
  • Resource: The study created a reproducible code platform called momix to aid researchers in selecting and applying jDR methods, offering practical guidelines for multi-omics integration.
Published

23 January 2026

PubMed: 33402734
DOI: 10.1038/s41467-020-20430-7
Overview generated by: Gemini 2.5 Flash, 27/11/2025

Background and Objective

High-dimensional multi-omics data is now standard for understanding complex biological systems like cancer. Joint Dimensionality Reduction (jDR) methods are crucial for the effective integration of these heterogeneous datasets. However, the large number of available jDR methods necessitates a systematic evaluation to provide researchers with reliable guidance on which method to choose for their specific research question.
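
As a rough illustration of what "joint" means here, many factorization-based jDR methods can be written in the shared-factor form below. The notation is an assumption made for this overview and is not taken from the text above: P omics matrices X^(i) with n_i features and m common samples are approximated by omics-specific weight matrices A^(i) and a single factor matrix F with k rows, where k is much smaller than any n_i. Some methods relax this and allow omics-specific factors, so this is a sketch of the simplest case only.

```latex
X^{(i)} \approx A^{(i)} F, \qquad
X^{(i)} \in \mathbb{R}^{n_i \times m},\quad
A^{(i)} \in \mathbb{R}^{n_i \times k},\quad
F \in \mathbb{R}^{k \times m},\quad
i = 1, \dots, P
```

Under this reading, the columns of F give each sample a k-dimensional representation shared by all omics layers, which is what downstream clustering or survival analysis would operate on.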

This paper presents a comprehensive benchmark of nine representative joint multi-omics dimensionality reduction approaches to offer practical guidelines for their application, particularly in the study of cancer.

Methods: Systematic Evaluation

The study systematically evaluated nine representative jDR methods (including MCIA, iCluster, and intNMF) across three complementary benchmark scenarios:

  1. Ground-Truth Clustering: Assessing the methods’ ability to retrieve known sample clustering patterns from simulated multi-omics datasets.
  2. Clinical Relevance (TCGA): Using cancer data from The Cancer Genome Atlas (TCGA) to evaluate how well the reduced dimensions produced by each method predict patient survival, associate with clinical annotations, and are enriched for known pathways and biological processes.
  3. Single-Cell Classification: Assessing performance in the classification of multi-omics single-cell data.
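
To make the first scenario concrete, the sketch below scores a recovered clustering against a known ground truth. It uses a pair-counting Jaccard index, which is an assumption chosen for illustration: the summary above does not name the agreement metric, and the function names here are hypothetical and not part of momix.

```python
from itertools import combinations


def co_clustered_pairs(labels):
    """Return the set of sample-index pairs that share a cluster label."""
    return {
        (i, j)
        for i, j in combinations(range(len(labels)), 2)
        if labels[i] == labels[j]
    }


def pairwise_jaccard(truth, predicted):
    """Pair-counting Jaccard index between two clusterings of the same samples.

    Illustrative metric only (an assumption, not necessarily the one used
    in the benchmark). It is invariant to how cluster labels are named.
    """
    if len(truth) != len(predicted):
        raise ValueError("both clusterings must label the same samples")
    a = co_clustered_pairs(truth)
    b = co_clustered_pairs(predicted)
    union = a | b
    if not union:
        # Every sample is its own cluster in both labelings: identical.
        return 1.0
    return len(a & b) / len(union)


# Toy example: six samples in two simulated ground-truth clusters.
truth = [0, 0, 0, 1, 1, 1]
perfect = [1, 1, 1, 0, 0, 0]   # same partition, labels swapped
noisy = [0, 0, 1, 1, 1, 1]     # one sample assigned to the wrong cluster

score_perfect = pairwise_jaccard(truth, perfect)
score_noisy = pairwise_jaccard(truth, noisy)
print(score_perfect, score_noisy)
```

A score of 1 means the partition was recovered exactly, and lower values mean more sample pairs were grouped differently from the simulated truth. In a benchmark setting, `predicted` would come from clustering the samples in the factor space produced by each jDR method.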

The authors also created a reproducible code platform named momix (multi-omics mix), which packages the code developed for this benchmark so that users can rerun the analyses and future studies can compare new methods against it.

Key Results and Conclusions

The in-depth comparisons provided clear performance distinctions among the nine methods:

  • Best Clustering Performer: The intNMF (integrative Non-negative Matrix Factorization) method demonstrated the best performance in retrieving ground-truth clustering from simulated data.
  • Best All-Rounder: MCIA (Multiple Co-Inertia Analysis) offered effective and consistent behavior across many different contexts and benchmark criteria, suggesting it is a robust general-purpose tool for integration.
  • Significance: The benchmark gives the multi-omics community data-driven recommendations for selecting a jDR tool according to the research question (e.g., whether the goal is clustering or prediction).