Proteomic prediction of disease largely reflects environmental risk exposure
- Core Finding: The high disease predictive value of the plasma proteome primarily reflects its sensitivity as a quantitative readout of environmental risk factors (like smoking and alcohol intake), rather than identifying proteins that are causal drivers of disease.
- Causality Assessment: Using Mendelian Randomization (MR) on thousands of protein-disease associations in the UK Biobank, the study found that only 8% showed suggestive evidence of a causal relationship, indicating that causal drivers are rare and tend to be disease-specific.
- Environmental Biomarkers: The vast majority of non-causal proteins, particularly those broadly associated with multiple diseases, were found to be exposure-associated. The developed Proteomic Score for Smoking (SmokingPS) achieved an AUC of 0.96, validating the proteome’s role as an objective, quantitative index of lifestyle behaviors.
PubMed: 40909825
DOI: 10.1101/2025.08.27.25334571
Overview generated by: Gemini 2.5 Flash, 28/11/2025
Introduction and Study Goal
Plasma proteomic signatures are highly effective at predicting disease risk, but the reason for their predictive value is largely unknown: the proteins involved may be causal drivers of disease, or they may be non-causal predictors that reflect upstream influences. This study characterizes thousands of protein-disease associations to separate potential therapeutic targets (drivers) from objective biomarkers of environmental risk factors.
Study Design and Methods
The study utilized blood proteomic data from a subset of the UK Biobank Pharma Proteomics Project (UKB-PPP) (N=45,438) to investigate associations between 2,923 unique plasma proteins and 23 age-related incident disease outcomes.
Partitioning Proteomic Biomarkers
The researchers employed a two-pronged approach to categorize protein-disease associations:
- Causal Assessment (Drivers): They applied two-sample Mendelian Randomization (MR) using cis-pQTLs as genetic instruments to identify which associations represented a potential causal effect of the protein on the disease.
- Environmental Assessment (Predictors): They tested non-causal proteins for associations with major modifiable environmental risk factors, focusing specifically on smoking and alcohol intake due to their large effects on common diseases.
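The causal-assessment step can be illustrated with a minimal sketch. The overview does not say which MR estimator the authors used, so the Wald ratio below is an assumption: it is a common choice when a single cis-pQTL serves as the instrument. The function names and the summary statistics in the usage note are hypothetical.

```python
import math


def wald_ratio(beta_pqtl, beta_disease, se_disease):
    """Single-instrument MR estimate (Wald ratio).

    beta_pqtl:    effect of the cis-pQTL on the protein level (exposure GWAS)
    beta_disease: effect of the same variant on the disease (outcome GWAS)
    se_disease:   standard error of beta_disease

    Returns the estimated effect of the protein on the disease and a
    first-order standard error that ignores uncertainty in beta_pqtl.
    """
    if beta_pqtl == 0:
        raise ValueError("instrument has no effect on the protein")
    estimate = beta_disease / beta_pqtl
    std_error = se_disease / abs(beta_pqtl)
    return estimate, std_error


def two_sided_p(estimate, std_error):
    """Two-sided p-value from a normal approximation to estimate / std_error."""
    z = estimate / std_error
    return math.erfc(abs(z) / math.sqrt(2))
```

For example, with made-up inputs, `wald_ratio(0.5, 0.1, 0.02)` gives an estimate of about 0.2 log-odds of disease per unit of protein, with a standard error of about 0.04. A real analysis would also need instrument-strength checks and multiple-testing control, which this sketch omits.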
Exposure Quantification Scores
To quantify the sensitivity of the plasma proteome to lifestyle factors, the team developed a Proteomic Score for Smoking (SmokingPS) and an AlcoholPS using LASSO regression models trained on the UKB-PPP cohort. These scores were then tested for their ability to predict disease incidence.
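As a rough illustration of how a LASSO-based proteomic score works, the sketch below fits an L1-penalized linear model by coordinate descent and then scores a new sample as a weighted sum of protein levels. This is an assumption-laden toy: the overview does not give the loss function, the penalty strength, or the preprocessing, and the authors may well have used a penalized logistic model for a binary trait such as smoking status. All names here are hypothetical.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; exactly zero inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def fit_lasso(X, y, alpha, n_iter=200):
    """Minimize (1 / 2n) * ||y - b - Xw||^2 + alpha * ||w||_1.

    X is a list of rows (one row of protein levels per participant),
    y is the exposure measure. Returns (intercept, weights).
    """
    n, p = len(X), len(X[0])
    weights = [0.0] * p
    intercept = sum(y) / n
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # covariance of feature j with the partial residual
            z = 0.0    # mean square of feature j
            for i in range(n):
                pred_without_j = intercept + sum(
                    weights[k] * X[i][k] for k in range(p) if k != j
                )
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            weights[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
        # Re-center the intercept on the current residuals.
        intercept = sum(
            y[i] - sum(w * x for w, x in zip(weights, X[i])) for i in range(n)
        ) / n
    return intercept, weights


def proteomic_score(intercept, weights, protein_levels):
    """Score one participant: intercept plus weighted sum of protein levels."""
    return intercept + sum(w * x for w, x in zip(weights, protein_levels))
```

The L1 penalty sets the weights of uninformative proteins exactly to zero, which is why LASSO yields a compact score built from a subset of the measured proteins. The triple loop is written for readability, not speed; at the scale of thousands of proteins a vectorized library implementation would be the practical choice.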
Key Findings: The Dominance of Environmental Signals
1. Causal Drivers are Rare and Disease-Specific
Initial analysis using Cox proportional hazards models identified 9,308 significant protein-disease pairs involving 2,122 proteins and 22 diseases. However, MR analysis found that only a small subset (8%) of the protein-disease associations tested showed suggestive evidence of a causal relationship.
- Specificity: Putatively causal proteins were more likely to be associated with only a single disease, suggesting that they represent disease-specific biological signals.
- Therapeutic Implication: These few MR-nominated proteins are priorities for mechanistic characterization because they are candidate therapeutic targets.
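The Cox screening step mentioned under this finding can be sketched for a single protein as follows. This is a simplified illustration, not the authors' pipeline: it fits one covariate with no adjustment variables, handles risk sets in the Breslow style, and finds the coefficient by bisection on the score function. The overview does not specify covariates, tie handling, or the significance threshold, and all names below are hypothetical.

```python
import math


def cox_score(beta, times, events, x):
    """Derivative of the Cox log partial likelihood for one covariate.

    times:  follow-up time per participant
    events: 1 if the disease occurred at that time, 0 if censored
    x:      protein level per participant
    """
    score = 0.0
    for i in range(len(times)):
        if not events[i]:
            continue
        s0 = 0.0  # sum of exp(beta * x) over the risk set
        s1 = 0.0  # sum of x * exp(beta * x) over the risk set
        for j in range(len(times)):
            if times[j] >= times[i]:
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
        score += x[i] - s1 / s0
    return score


def fit_cox_single(times, events, x, lo=-10.0, hi=10.0, tol=1e-10):
    """Solve score(beta) = 0 by bisection.

    The log partial likelihood is concave, so the score is decreasing
    in beta and a sign change brackets the maximum.
    """
    if cox_score(lo, times, events, x) <= 0 or cox_score(hi, times, events, x) >= 0:
        raise ValueError("estimate lies outside the search bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cox_score(mid, times, events, x) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

The hazard ratio per unit of protein level is `math.exp(beta)`. Repeating such a fit for every protein-disease pair and correcting for multiple testing gives the kind of association table that the MR step then filters.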
2. Predictive Value Reflects Environmental Exposure
The vast majority of protein-disease associations were classified as non-causal predictors. The study found that these proteins often associate broadly with incident disease because they are strongly perturbed by environmental risk factors, suggesting that their predictive value comes from acting as an “environmental sensor.”
- Smoking as a Major Driver: More than 90% of the proteins associated with diseases such as lung cancer and COPD were also significantly associated with smoking.
- Quantitative Exposure Readouts: The newly developed proteomic scores quantified environmental exposures with high accuracy, achieving an Area Under the Curve (AUC) of 0.96 for smoking and 0.98 for alcohol intake, which supports the plasma proteome’s sensitivity as an objective index of exposure behavior.
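For readers unfamiliar with the AUC values quoted above, the metric can be computed directly from its definition: the probability that a randomly chosen exposed participant receives a higher score than a randomly chosen unexposed one, with ties counted as half. The sketch below assumes binary labels (for example, smoker versus non-smoker), which is an assumption about how the exposure was coded. The function name is hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: 1 for exposed (e.g. smoker), 0 for unexposed
    scores: proteomic score per participant
    """
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("need at least one participant in each class")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

With the toy inputs `labels = [0, 0, 1, 1]` and `scores = [0.1, 0.4, 0.35, 0.8]`, three of the four exposed-unexposed pairs are ranked correctly, so the function returns 0.75. An AUC of 0.96 therefore means the score ranks about 96% of such pairs correctly.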
Conclusions and Recommendations
The study concludes that the plasma proteome’s ability to predict disease is largely driven by its capacity to serve as a quantitative readout of upstream environmental risk factors rather than reflecting only disease-specific or causal processes. This work has significant implications for precision medicine:
- Interpretation: Researchers must clarify the roles of plasma protein measurements. Putatively causal proteins are candidates for therapeutic targets, while non-causal, exposure-associated proteins serve as valuable biomarkers for monitoring the impacts of lifestyle and environment.
- Disease Prevention: Proteomic assays offer a path toward measuring the effects of the environment on human health using objective, quantitative, and reproducible methods, which can help guide interventions for disease prevention.