PubMed: 41068475 DOI: 10.1038/s44320-025-00158-6 Overview generated by: Gemini 2.5 Flash, 28/11/2025
Key Findings: Deconvoluting the Sources of Plasma Protein Variation
This study used a machine learning (ML) approach to systematically identify and quantify the factors that determine variation in the levels of thousands of plasma proteins. The aim was to address the limited understanding of protein origins, a gap that hampers the translation of biomarkers.
Primary Determinants of Protein Levels
The ML model assessed over 1,800 participant and sample characteristics. Across approximately 3,000 protein targets, a median of 20 factors per protein (range: 1 to 37) jointly explained an average of 19.4% (up to 100.0%) of the variance.
Crucially, modifiable characteristics (e.g., health metrics, disease status, lifestyle) explained significantly more variance (median: 10.0%) than genetic variation (median: 3.9%). This suggests that dynamic, non-genetic factors are the primary drivers of differences in plasma protein levels between individuals.
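As a rough illustration of how figures like these can be summarised, the sketch below computes a coefficient of determination and then takes the median of per-protein percentages for each factor class. The protein names and percentages are made-up toy values, and `r_squared` and `median_by_class` are hypothetical helpers rather than functions from the study's R package. The study's actual variance-partitioning procedure is not described in this overview.

```python
from statistics import mean, median


def r_squared(observed, predicted):
    """Fraction of the variance in `observed` that `predicted` accounts for."""
    centre = mean(observed)
    ss_total = sum((y - centre) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def median_by_class(results):
    """Median percent variance explained per factor class, across proteins."""
    classes = sorted({cls for row in results.values() for cls in row})
    return {
        cls: median(row.get(cls, 0.0) for row in results.values())
        for cls in classes
    }


# Toy, invented values (percent variance explained); NOT data from the study.
toy_results = {
    "protein_A": {"modifiable": 12.0, "genetic": 2.0},
    "protein_B": {"modifiable": 8.0, "genetic": 5.0},
    "protein_C": {"modifiable": 10.0, "genetic": 4.0},
}

summary = median_by_class(toy_results)
print(summary)
```

The point of the sketch is only that the reported medians are taken across proteins after the explained variance has been computed for each protein separately.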
Segregation and Clustering
Proteins were found to segregate into distinct clusters based on their shared explanatory factors. These clusters revealed proteins primarily driven by:
- Human Health and Disease: Indicators of health status and disease.
- Pre-analytical Variation: Technical and sample-handling measures, such as accidental activation of platelets.
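Grouping proteins by shared explanatory factors can be sketched as follows. This is a minimal illustration that assumes each protein comes with a set of selected factors and links proteins whose sets overlap strongly (Jaccard similarity). The clustering method the study used is not stated in this overview, and the protein and factor names, the `threshold` value, and both helper functions are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two collections of factor names."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_shared_factors(factors_by_protein, threshold=0.5):
    """Single-linkage grouping: proteins join a cluster if they are
    similar enough to at least one of its current members."""
    clusters = []
    for protein in sorted(factors_by_protein):
        linked = [
            c for c in clusters
            if any(
                jaccard(factors_by_protein[protein], factors_by_protein[m]) >= threshold
                for m in c
            )
        ]
        # Merge the new protein with every cluster it links to.
        merged = {protein}.union(*linked)
        clusters = [c for c in clusters if c not in linked] + [merged]
    return sorted(sorted(c) for c in clusters)


# Invented factor sets for illustration only.
toy_factors = {
    "protein_1": {"platelet_count", "time_to_freeze"},
    "protein_2": {"platelet_count", "time_to_freeze", "sample_age"},
    "protein_3": {"bmi", "diabetes"},
    "protein_4": {"bmi", "diabetes", "smoking"},
}

print(cluster_by_shared_factors(toy_factors))
```

In this toy input, the first two proteins share sample-handling factors and the last two share health-related factors, so they fall into two separate groups.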
Ancestry, Sex, and Robustness
The overall explanatory factors were largely consistent across different sexes and ancestral groups. However, the analysis identified specific proteins where the underlying explanatory factors differed by:
- Sex: 1,374 proteins.
- Ancestry: 74 proteins.
Resource and Application
The study establishes a valuable resource to guide biomarker and drug target discovery, including:
1. Knowledge Graph: An integrated knowledge graph linking the identified explanatory factors with genetic studies and drug characteristics, intended to guide the identification of drug target engagement markers.
2. Biomarker Identification: Demonstrated utility by identifying disease-specific biomarkers, such as matrix metalloproteinase 12 (MMP12) for abdominal aortic aneurysm.
3. Framework: Developed a widely applicable R package and an interactive web portal for researchers to explore all results and integrate the findings into ongoing studies.
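To make the knowledge-graph idea concrete, here is a minimal sketch of a triple store that links factors, proteins, diseases, and drugs. It assumes nothing about the study's actual graph schema. The `KnowledgeGraph` class, the relation names, `hypothetical_drug_X`, and the `smoking_status` edge are all invented for illustration; only the MMP12 and abdominal aortic aneurysm pairing comes from the text above.

```python
from collections import defaultdict


class KnowledgeGraph:
    """Tiny in-memory store of (subject, relation, object) triples."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add(self, subject, relation, obj):
        self._edges[(subject, relation)].add(obj)

    def objects(self, subject, relation):
        """All objects reachable from `subject` via `relation`."""
        return sorted(self._edges.get((subject, relation), set()))

    def subjects(self, relation, obj):
        """All subjects that point at `obj` via `relation`."""
        return sorted(
            s for (s, r), objs in self._edges.items()
            if r == relation and obj in objs
        )


kg = KnowledgeGraph()
# Pairing reported in the overview above.
kg.add("MMP12", "biomarker_for", "abdominal aortic aneurysm")
# Hypothetical edges, present only to show how a query would work.
kg.add("smoking_status", "explains", "MMP12")
kg.add("hypothetical_drug_X", "targets", "MMP12")

print(kg.objects("MMP12", "biomarker_for"))
print(kg.subjects("targets", "MMP12"))
print(kg.subjects("explains", "MMP12"))
```

A query such as `kg.subjects("explains", "MMP12")` shows how a candidate target engagement marker could be checked against the non-drug factors that also influence its level.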
Methods
- Cohort: 43,240 participants from the UK Biobank.
- Data: Approximately 3,000 plasma proteins were measured, alongside >1,800 participant and sample characteristics.
- Analysis: Machine learning was used to identify the factors associated with each protein and to quantify the variance they explain. The resulting models were largely consistent across sexes and ancestral groups.