Exploring public cancer gene expression signatures across bulk, single-cell and spatial transcriptomics data with signifinder Bioconductor package
- Problem: Gene expression signatures (GESs) derived from bulk RNA-Seq often show inconsistent performance when applied to single-cell (scRNA-Seq) or spatial transcriptomics (ST) data due to differences in data sparsity and resolution.
- Solution: The study introduces the signifinder Bioconductor package, a workflow that standardizes the testing and scoring of GESs across bulk, scRNA-Seq (using methods like AUCell), and ST platforms.
- Findings: Application to breast cancer signatures revealed that only a small subset were highly robust across all three modalities. The package successfully mapped expression signatures to spatial tissue regions, enabling a more precise, high-resolution analysis of the tumor microenvironment (TME).
PubMed: 39363890
DOI: 10.1093/nargab/lqae138
Overview generated by: Gemini 2.5 Flash, 28/11/2025
Core Problem and Study Goal
The vast and growing collection of published gene expression signatures (GESs), lists of genes whose expression collectively characterizes a specific biological state such as a cancer subtype or prognosis, is often derived from bulk RNA-Seq data. Modern transcriptomic studies, however, increasingly rely on high-resolution methods like single-cell RNA-Seq (scRNA-Seq) and Spatial Transcriptomics (ST). The challenge is that bulk-derived signatures often fail to perform consistently when applied directly to this sparser, noisier, higher-resolution data.
This study introduces the signifinder Bioconductor package, a comprehensive computational workflow designed to allow researchers to effectively investigate the behavior and robustness of known GESs across bulk, single-cell, and spatial transcriptomics data types.
Methods: The signifinder Bioconductor Package
The signifinder package provides a streamlined workflow implemented in R for the re-analysis and comparison of gene expression signatures.
1. Signature Collection and Input
- Input: The package accepts user-defined GESs or draws on a large compendium of over 100 publicly available, manually curated breast cancer (BC) signatures, so that every signature is tested in the same standardized way.
- Data Handling: It handles transcriptomic data from various sources (e.g., TCGA, GEO) and formats, including count and normalized data for bulk, scRNA-Seq, and ST.
2. Signature Scoring and Analysis
The package integrates multiple popular scoring methods, making it flexible for cross-platform comparison:
- Bulk Data: Uses standard methods like Z-scores and GSVA (Gene Set Variation Analysis).
- Single-Cell/Spatial Data: Integrates specialized methods designed for sparse data, such as AUCell and singscore.
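As a rough illustration of the two scoring families named above, here is a minimal sketch in plain Python. It is not signifinder's R interface, and the function names (`mean_zscore`, `recovery_auc`), the `top_fraction` parameter, and the toy data are all assumptions made for illustration. The rank-based function is a simplified recovery-curve score in the spirit of AUCell, not its actual implementation.

```python
from statistics import mean, pstdev


def mean_zscore(expression, signature):
    """Per-sample mean z-score over the signature genes (bulk-style score).

    expression: dict mapping gene -> list of values, one per sample.
    Genes missing from the data or with zero variance are skipped.
    """
    genes = [g for g in signature if g in expression]
    if not genes:
        raise ValueError("no signature genes found in the expression data")
    n_samples = len(expression[genes[0]])
    totals = [0.0] * n_samples
    used = 0
    for gene in genes:
        values = expression[gene]
        sd = pstdev(values)
        if sd == 0:
            continue  # a constant gene carries no information
        mu = mean(values)
        used += 1
        for i, value in enumerate(values):
            totals[i] += (value - mu) / sd
    if used == 0:
        raise ValueError("all signature genes are constant")
    return [total / used for total in totals]


def recovery_auc(cell_expression, signature, top_fraction=0.05):
    """Simplified rank-based score for one cell (hypothetical, AUCell-like).

    Genes are ranked by expression within the cell; the score is the area
    under the signature recovery curve across the top-ranked genes,
    normalized by the best achievable area, so it lies in [0, 1].
    """
    ranked = sorted(cell_expression, key=cell_expression.get, reverse=True)
    top_n = max(1, int(len(ranked) * top_fraction))
    sig = set(signature) & set(ranked)
    if not sig:
        return 0.0
    hits = 0
    area = 0
    for gene in ranked[:top_n]:
        if gene in sig:
            hits += 1
        area += hits
    k = min(len(sig), top_n)
    max_area = k * (k + 1) / 2 + k * (top_n - k)
    return area / max_area


# Toy data: three bulk samples, and one cell with ten genes.
bulk = {"A": [1, 2, 3], "B": [2, 4, 6], "C": [5, 5, 5]}
bulk_scores = mean_zscore(bulk, ["A", "B", "C", "D"])

cell = {f"g{i}": 10 - i for i in range(10)}
cell_score = recovery_auc(cell, ["g0", "g1"], top_fraction=0.5)
```

The rank-based score depends only on the ordering of genes within each cell, which is one reason rank-based approaches are often preferred for sparse single-cell data: they do not require a stable per-gene mean and variance across samples.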
3. Visualization and Interpretation
The workflow provides structured output and visualization tools to interpret the results, including:
- Robustness Metrics: Quantification of how consistently a signature identifies specific sample groups across different platforms.
- Feature Visualization: Maps of signature scores onto spatial transcriptomics images to visualize the physical location of the cell states predicted by the signature.
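The overview does not say how robustness is quantified. One hypothetical way to express "how consistently a signature identifies sample groups across platforms" is to measure, per platform, how well the signature score separates two groups, and then take the weakest platform as the summary. The sketch below assumes this approach. The function names, the choice of a rank-based AUC, and the toy scores are illustrative assumptions, not the package's method.

```python
def group_separation_auc(scores_a, scores_b):
    """Probability that a random group-A score exceeds a random group-B score.

    Ties count as half a win. 1.0 means perfect separation; 0.5 means the
    signature score does not distinguish the groups at all.
    """
    if not scores_a or not scores_b:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for a in scores_a:
        for b in scores_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(scores_a) * len(scores_b))


def cross_platform_consistency(per_platform):
    """Return per-platform AUCs and the worst one as a conservative summary.

    per_platform: dict mapping platform name -> (scores_a, scores_b).
    """
    aucs = {
        name: group_separation_auc(a, b)
        for name, (a, b) in per_platform.items()
    }
    return aucs, min(aucs.values())


# Toy signature scores for two sample groups on two platforms (made up).
toy = {
    "bulk": ([3, 4, 5], [0, 1, 2]),
    "scRNA-Seq": ([2, 3], [1, 3]),
}
aucs, worst = cross_platform_consistency(toy)
```

Taking the minimum is a deliberately strict choice: a signature only counts as robust if it separates the groups on every platform, not just on average.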
Key Findings: Application to Breast Cancer Signatures
The authors applied signifinder to investigate the robustness of 106 published breast cancer GESs across three different data modalities:
1. Robustness Across Modalities
- Consistent Signatures: Only a small subset of the tested signatures showed high robustness across all three data types (bulk, scRNA-Seq, ST).
- High-Resolution Data Value: The analysis revealed that some signatures that performed well in bulk data lost their predictive power in scRNA-Seq or ST, highlighting the need to re-validate bulk-derived signatures in high-resolution contexts.
- Example: Signatures based on Proliferation and Basal/Claudin-low subtypes were among the most robust, consistently classifying cell populations across technologies.
2. Deconvoluting the Tumor Microenvironment (TME)
Using signifinder on spatial transcriptomics data (e.g., from Visium), the authors demonstrated two capabilities of the package:
- Locating Signatures: It maps the spatial enrichment of signatures associated with different cell populations (e.g., immune cells, fibroblasts) and cancer features (e.g., proliferation) within the breast tumor tissue.
- TME Characterization: It lets researchers refine their understanding of how the tumor microenvironment influences gene expression patterns, an effect that is often masked in bulk sequencing.
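To make the idea of mapping scores onto tissue concrete, the following minimal Python sketch places per-spot signature scores on a grid and marks the spots above a threshold. The `(row, col)` spot coordinates, the `render_score_map` function, the threshold, and the scores are hypothetical; real spatial data and the package's own plotting functions are considerably richer.

```python
def render_score_map(spot_scores, threshold):
    """Render per-spot scores as text rows.

    spot_scores: dict mapping (row, col) -> signature score for that spot.
    '#' marks spots at or above the threshold, '.' marks spots below it,
    and ' ' marks grid positions with no spot (e.g., outside the tissue).
    """
    if not spot_scores:
        return []
    n_rows = max(row for row, _ in spot_scores) + 1
    n_cols = max(col for _, col in spot_scores) + 1
    lines = []
    for row in range(n_rows):
        chars = []
        for col in range(n_cols):
            score = spot_scores.get((row, col))
            if score is None:
                chars.append(" ")
            elif score >= threshold:
                chars.append("#")
            else:
                chars.append(".")
        lines.append("".join(chars))
    return lines


# Made-up proliferation scores for three spots on a 2 x 2 grid.
toy_spots = {(0, 0): 0.9, (0, 1): 0.1, (1, 1): 0.8}
score_map = render_score_map(toy_spots, threshold=0.5)
print("\n".join(score_map))
```

Keeping the coordinates attached to each score is the essential point: the same per-spot score that would be averaged away in bulk data can be inspected region by region.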
Conclusions and Recommendations
The signifinder Bioconductor package addresses a critical need in computational cancer biology by providing a standardized, reproducible, and flexible tool for assessing the performance of gene expression signatures across emerging high-resolution transcriptomics platforms. The authors recommend that researchers use such methods to validate the relevance and consistency of their signatures when moving between bulk and single-cell/spatial data, ensuring reliable translation of findings.