Disease prediction with multi-omics and biomarkers empowers case-control genetic discoveries in the UK Biobank
- Method: MILTON (Machine Learning with Phenotype Associations), an ensemble machine-learning framework, was developed to integrate multi-omics (including plasma proteomics) and biomarker data from the UK Biobank to predict disease risk.
- Objective: To demonstrate how these biomarker-based predictions can augment genetic association analyses in a phenome-wide context.
- Impact: MILTON outperformed polygenic risk scores (PRSs) in predicting incident disease. Applied in a PheWAS, it improved signals for 88 known genetic associations and yielded 14 novel ones, showing its utility in empowering genetic discovery for complex diseases through improved disease classification.
PubMed: 39261665
DOI: 10.1038/s41588-024-01898-1
Overview generated by: Gemini 2.5 Flash, 27/11/2025
Background and Objective
Biobank-level datasets, such as the UK Biobank, offer unprecedented opportunities to discover novel biomarkers and develop powerful predictive algorithms for human disease. The challenge lies in effectively integrating diverse, multi-level data (genomics, proteomics, clinical records) to simultaneously improve disease prediction and the statistical power of genetic discovery.
This study introduces MILTON (Machine Learning with Phenotype Associations), an ensemble machine-learning framework designed to predict a wide range of diseases using multi-omics and clinical biomarkers. The main objective is to demonstrate how these accurate, biomarker-based predictions can augment case-control genetic association studies.
Methods: The MILTON Framework
MILTON is an ensemble machine-learning framework that integrates diverse data types to predict disease status.
- Data Integration: The framework was developed using the UK Biobank, integrating matched plasma proteomics data (from 46,327 samples) and other biomarkers with genetic data (from 484,230 genome-sequenced samples).
- Prediction Task: MILTON was trained to predict 3,213 diseases. By leveraging the UK Biobank's longitudinal health record data, it was evaluated on incident cases: participants who were undiagnosed at the time of recruitment but received the diagnosis later.
- Augmenting Genetics: MILTON's disease predictions were then used to refine the case and control definitions in a phenome-wide association study (PheWAS). Undiagnosed cases hidden among controls dilute the contrast between the two groups; by reclassifying participants with a high predicted disease probability, biomarker-based prediction improves disease classification and thereby increases the statistical power of the genetic association analyses.
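The case-augmentation idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not MILTON's implementation: the ensemble is assumed to combine its models by soft voting (averaging predicted probabilities), the 0.5 threshold is hypothetical, and a two-sided Fisher's exact test on carrier counts stands in for whatever case-control test the study actually ran. The helper names (`ensemble_score`, `augment_cases`, `carrier_table`, `fisher_two_sided`) are hypothetical, and the toy cohort is invented.

```python
from math import comb


def ensemble_score(model_probs):
    """Soft voting (an assumption): average each participant's probability across models.

    model_probs holds one list of predicted probabilities per model.
    """
    return [sum(ps) / len(ps) for ps in zip(*model_probs)]


def augment_cases(diagnosed, scores, threshold):
    """Keep diagnosed cases; relabel undiagnosed participants scoring >= threshold as cases."""
    return [d or s >= threshold for d, s in zip(diagnosed, scores)]


def carrier_table(is_case, is_carrier):
    """2x2 counts: (case carriers, case non-carriers, control carriers, control non-carriers)."""
    a = b = c = d = 0
    for case, carrier in zip(is_case, is_carrier):
        if case and carrier:
            a += 1
        elif case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return a, b, c, d


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    n_cases, n_carriers = a + b, a + c
    denom = comb(n, n_carriers)

    def prob(x):
        # Hypergeometric probability of x carriers among the cases, with all margins fixed.
        return comb(n_cases, x) * comb(n - n_cases, n_carriers - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, n_carriers - (n - n_cases)), min(n_cases, n_carriers)
    # Sum every table no more probable than the observed one; the small relative
    # tolerance treats floating-point near-ties as equal.
    return min(1.0, sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9)))


# Toy cohort of 12 participants (invented for illustration, not study data).
diagnosed = [True] * 3 + [False] * 9
carrier = [True, True, False, True, True] + [False] * 7
model_a = [0.90, 0.80, 0.70, 0.85, 0.75] + [0.10] * 7
model_b = [0.80, 0.90, 0.60, 0.70, 0.80] + [0.20] * 7

scores = ensemble_score([model_a, model_b])
before = fisher_two_sided(*carrier_table(diagnosed, carrier))
after = fisher_two_sided(*carrier_table(augment_cases(diagnosed, scores, 0.5), carrier))
print(f"p before: {before:.3f}, p after: {after:.3f}")  # prints "p before: 0.236, p after: 0.010"
```

In the toy cohort, two undiagnosed carriers receive high ensemble scores; moving them into the case group sharpens the carrier contrast and lowers the p-value. With real data, the same move can also add false-positive cases, so the choice of threshold matters.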
Key Results and Significance
MILTON demonstrated substantial efficacy in both prediction and genetic discovery:
- Superior Prediction: MILTON significantly outperformed available polygenic risk scores (PRSs) in predicting incident disease cases, especially for diseases with strong molecular links.
- Empowered Genetic Discovery: When applied to the PheWAS, the framework successfully augmented genetic association analyses. This resulted in improved signals for 88 known genetic associations and led to the discovery of 14 novel genetic associations.
- Targeted Improvement: The framework showed the largest improvement in genetic discovery for diseases characterized by lower PRS prediction accuracy but higher biomarker prediction accuracy.
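The prediction comparison, and the "lower PRS accuracy, higher biomarker accuracy" pattern, can be made concrete by scoring both predictors on the same incident cases. The sketch below assumes ROC AUC as the accuracy metric and uses invented toy scores; `roc_auc` and `biomarker_advantage` are hypothetical helpers, not functions from the study.

```python
def roc_auc(scores, labels):
    """ROC AUC as the Mann-Whitney probability that a random case outscores a random control.

    Ties between a case and a control count one half.
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum(1.0 if c > k else 0.5 if c == k else 0.0 for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def biomarker_advantage(labels, biomarker_scores, prs_scores):
    """Hypothetical summary: biomarker-model AUC minus PRS AUC on the same participants.

    A large positive gap marks the pattern where augmentation is expected to help most.
    """
    return roc_auc(biomarker_scores, labels) - roc_auc(prs_scores, labels)


# Toy incident-case labels and scores (invented for illustration, not study data).
incident = [1, 1, 1, 0, 0, 0]
biomarker = [0.9, 0.8, 0.4, 0.3, 0.2, 0.5]
prs = [0.2, 0.6, 0.5, 0.4, 0.7, 0.1]
print(round(roc_auc(biomarker, incident), 3), round(roc_auc(prs, incident), 3))  # prints "0.889 0.556"
```

The pairwise loop is O(cases × controls), which is fine for a sketch; at biobank scale, a rank-based computation or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.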
Conclusion
The MILTON framework provides a practical approach to leveraging deep molecular phenotyping, including multi-omics data, for disease prediction. Crucially, it demonstrates a successful strategy for empowering case-control genetic discoveries by refining disease classification, which could help accelerate the understanding of the mechanisms underlying human diseases.