Mapping the proteo-genomic convergence of human diseases
- Objective: To construct a proteo-genomic map by systematically linking genetic risk for hundreds of human diseases and traits to variations in the levels of ~3,000 plasma proteins using data from >54,000 individuals.
- Key Result: Identified 2,228 instances of genetic colocalization across 1,440 proteins and 498 diseases/traits, consistent with a substantial portion of genetic disease risk acting through altered protein levels.
- Causal Inference: Using Mendelian Randomization (MR), the study predicted 44 proteins to be causally linked to 37 diseases/traits (e.g., C-reactive protein (CRP) and coronary artery disease (CAD)), providing high-priority candidate drug targets.
PubMed: 34648354 DOI: 10.1126/science.abj1541 Overview generated by: Gemini 2.5 Flash, 28/11/2025
Key Findings: Linking Genetic Risk to Disease via Plasma Proteins
This study provides a comprehensive “proteo-genomic map” that links genetic risk for hundreds of human diseases and traits to changes in the levels of thousands of circulating plasma proteins. The core innovation is the systematic use of genetic colocalization to identify instances where the same genetic variant influences both a plasma protein level (as a protein quantitative trait locus, or pQTL) and a clinical disease/trait (as a genome-wide association study, or GWAS, hit).
The map provides two major insights:
- Proteins Mediate Genetic Risk: The study identified 2,228 instances of genetic colocalization across 1,440 proteins and 498 diseases/traits. This finding suggests that a significant fraction of genetic disease risk acts by altering the level of a specific circulating protein.
- Causal Inference and Drug Targets: By integrating these colocalization events with Mendelian Randomization (MR), the map pinpoints proteins that are likely to be causally related to a disease, making them high-priority candidates for therapeutic drug targeting.
Colocalization and Mendelian Randomization
- Colocalization: The analysis used a Bayesian method to assess whether the genetic signal for a plasma protein (pQTL) and the genetic signal for a disease (GWAS) at a given locus are driven by the same causal variant, rather than by distinct variants that happen to lie close together. This identified 864 proteins that share a genetic signal with at least one clinical trait.
- Causal Relationships: Integrating this information with MR, the study identified 44 proteins that were robustly predicted to be causally linked to 37 diseases/traits, often validating established biological pathways. For example, C-reactive protein (CRP) was found to be causally associated with increased risk for coronary artery disease and other inflammatory conditions.
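The colocalization step can be illustrated with a minimal sketch, assuming a single causal variant per trait at the locus and Wakefield approximate Bayes factors computed from GWAS effect sizes and standard errors. The source does not name the specific Bayesian method or its settings, so the prior probabilities (p1, p2, p12), the prior effect-size standard deviation (0.15), the function names, and the toy summary statistics below are all illustrative assumptions rather than the study's actual configuration.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) over a non-empty list."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_abf(beta, se, prior_sd=0.15):
    """Wakefield log approximate Bayes factor for one variant.

    prior_sd is an assumed prior standard deviation of the true effect.
    """
    z = beta / se
    r = prior_sd ** 2 / (prior_sd ** 2 + se ** 2)
    return 0.5 * (math.log(1.0 - r) + r * z * z)


def coloc_posteriors(trait1, trait2, p1=1e-4, p2=1e-4, p12=1e-5):
    """Posterior probabilities [H0, H1, H2, H3, H4] for one locus.

    trait1, trait2: lists of (beta, se) for the same variants, same order.
    H3 = both traits associated, different causal variants.
    H4 = both traits associated, one shared causal variant.
    """
    if len(trait1) != len(trait2) or len(trait1) < 2:
        raise ValueError("need the same >= 2 variants for both traits")
    l1 = [log_abf(b, s) for b, s in trait1]
    l2 = [log_abf(b, s) for b, s in trait2]
    # H4 sums over variants that are causal for both traits.
    same = log_sum_exp([a + b for a, b in zip(l1, l2)])
    # H3 sums over ordered pairs of different variants (O(n^2), fine for a toy).
    diff = log_sum_exp([a + b
                        for i, a in enumerate(l1)
                        for j, b in enumerate(l2) if i != j])
    log_h = [
        0.0,                                  # H0: no association
        math.log(p1) + log_sum_exp(l1),       # H1: trait 1 only
        math.log(p2) + log_sum_exp(l2),       # H2: trait 2 only
        math.log(p1) + math.log(p2) + diff,   # H3: distinct variants
        math.log(p12) + same,                 # H4: shared variant
    ]
    total = log_sum_exp(log_h)
    return [math.exp(h - total) for h in log_h]


# Toy locus with five variants: (beta, se) per variant (made-up numbers).
pqtl = [(0.02, 0.02), (0.03, 0.02), (0.30, 0.02), (0.01, 0.02), (-0.02, 0.02)]
gwas_shared = [(0.01, 0.03), (0.02, 0.03), (0.25, 0.03), (0.00, 0.03), (0.01, 0.03)]
gwas_distinct = [(0.25, 0.03), (0.02, 0.03), (0.01, 0.03), (0.00, 0.03), (0.01, 0.03)]

pp_shared = coloc_posteriors(pqtl, gwas_shared)
pp_distinct = coloc_posteriors(pqtl, gwas_distinct)
```

In this toy locus the third variant drives both signals in the first comparison, so most posterior mass falls on H4 (shared causal variant). Moving the disease peak to a different variant, as in the second comparison, shifts the mass to H3 (distinct causal variants).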
Methods and Design
Data Sources
- Proteomics: Measured levels of ~3,000 plasma proteins in over 54,000 individuals from the UK Biobank Pharma Proteomics Project (UKB-PPP) and other cohorts.
- Genetics (pQTLs): Genome-wide association study (GWAS) summary statistics for protein quantitative trait loci (pQTLs).
- Disease/Trait Genetics (GWAS): GWAS summary statistics for 498 diseases and complex traits.
Analytical Framework
- GWAS for Proteins and Traits: Conducted GWAS for all proteins and aggregated existing GWAS results for traits.
- Genetic Colocalization: Performed systematic colocalization analysis between all protein pQTLs and all trait GWAS loci to find shared genetic drivers.
- Causal Inference (MR): Applied Mendelian Randomization to the colocalized pairs, using genetic variants as instruments to assess the likely causal direction (i.e., whether the protein level causes the disease or vice versa).
- Knowledge Graph: Constructed a “proteo-genomic knowledge graph” to visualize and connect the colocalized and causal protein-disease relationships.
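The Mendelian Randomization step in the framework above can be sketched with two standard estimators: the Wald ratio for a single genetic instrument and the fixed-effect inverse-variance weighted (IVW) combination of several independent instruments. The source does not state which MR estimator the study used, so this is an illustration under stated assumptions: the function names and the toy effect sizes are hypothetical, the instruments are assumed independent and valid, and the standard error uses a first-order approximation that ignores uncertainty in the variant-protein effect.

```python
import math


def wald_ratio(beta_protein, beta_disease, se_disease):
    """Causal estimate from one instrument: disease effect / protein effect."""
    estimate = beta_disease / beta_protein
    se = se_disease / abs(beta_protein)  # first-order approximation
    return estimate, se


def ivw(instruments):
    """Fixed-effect IVW estimate from independent instruments.

    instruments: list of (beta_protein, beta_disease, se_disease).
    """
    num = sum(bx * by / sy ** 2 for bx, by, sy in instruments)
    den = sum(bx ** 2 / sy ** 2 for bx, by, sy in instruments)
    return num / den, math.sqrt(1.0 / den)


# Hypothetical instruments: variant effect on the protein, variant effect on
# the disease (log odds), and the standard error of the disease effect.
toy_instruments = [
    (0.50, 0.10, 0.020),
    (0.30, 0.06, 0.030),
    (0.40, 0.08, 0.025),
]

single_estimate, single_se = wald_ratio(*toy_instruments[0])
ivw_estimate, ivw_se = ivw(toy_instruments)
```

With one instrument the IVW estimate reduces to the Wald ratio. Testing the reverse direction would repeat the calculation with disease-associated variants as instruments and the protein level as the outcome.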
Implications for Biology and Drug Discovery
Convergence of Diseases
The proteo-genomic map revealed convergence points where the genetic signals for multiple diseases colocalized with the same protein. Genetic variation affecting a single protein can therefore predispose an individual to several different, often seemingly unrelated, conditions. One example of a protein-disease link in the map is genetic variation at the SULT2A1 locus, which is linked to increased SULT2A1 protein activity and a higher risk of gallstones.
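A minimal sketch of how such convergence points could be read off a table of colocalized protein-trait pairs, assuming the pairs are available as simple tuples. The protein and trait names, the helper functions, and the threshold of two traits are hypothetical placeholders, not entries or definitions from the study's knowledge graph.

```python
from collections import defaultdict


def build_protein_graph(pairs):
    """Map each protein to the set of traits it colocalizes with."""
    graph = defaultdict(set)
    for protein, trait in pairs:
        graph[protein].add(trait)
    return dict(graph)


def convergence_points(graph, min_traits=2):
    """Proteins whose genetic signal is shared with at least min_traits traits."""
    return {
        protein: sorted(traits)
        for protein, traits in graph.items()
        if len(traits) >= min_traits
    }


# Hypothetical colocalized (protein, trait) pairs; duplicates are tolerated.
toy_pairs = [
    ("PROTEIN_A", "trait_1"),
    ("PROTEIN_A", "trait_2"),
    ("PROTEIN_A", "trait_2"),
    ("PROTEIN_B", "trait_3"),
    ("PROTEIN_C", "trait_1"),
    ("PROTEIN_C", "trait_4"),
    ("PROTEIN_C", "trait_5"),
]

toy_graph = build_protein_graph(toy_pairs)
toy_convergence = convergence_points(toy_graph)
```

Storing traits in a set means a pair reported more than once is counted a single time, so the convergence count reflects distinct traits per protein.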
Prioritization of Drug Targets
The study’s causal protein-disease links provide a strong basis for prioritizing drug development. If a protein is causally linked to a disease, a drug that modulates that protein’s level is more likely to be therapeutically effective. The study validated known drug targets and also highlighted new potential targets based on the strength of the genetic evidence.
Conclusions
The proteo-genomic map serves as a fundamental resource for understanding the molecular mechanisms underlying genetic disease risk. By demonstrating that genetic variation for many diseases converges on a shared set of plasma proteins, the study validates plasma proteomics as a key layer for translational medicine, facilitating drug target discovery and providing a causal foundation for biomarker development.