Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets
- Goal: To introduce LDpred-funct, a polygenic prediction method that uses trait-specific functional priors to enhance prediction accuracy.
- Method: LDpred-funct incorporates information from the baseline-LD model (including coding, regulatory, and conserved annotations) to estimate posterior mean causal effect sizes, followed by a cross-validation regularization step to account for genetic sparsity.
- Finding: The method achieved a +4.6% relative improvement in average prediction \(R^2\) across 21 UK Biobank traits compared to the best non-functional method. This demonstrates that leveraging the functional architecture of the genome leads to more accurate Polygenic Risk Scores (PRS).
PubMed: 34663819 | DOI: 10.1038/s41467-021-25171-9 | Overview generated by: Gemini 2.5 Flash, 28/11/2025
Key Findings: LDpred-funct for PRS
The study introduces and validates LDpred-funct, a new method for polygenic prediction that leverages trait-specific functional priors to increase prediction accuracy.
- Prediction Accuracy Improvement: Applied to 21 highly heritable traits in the UK Biobank, LDpred-funct attained an average prediction \(R^2\) of 0.144, a +4.6% relative improvement over SBayesR, the best-performing method that does not incorporate functional information.
- Highest Accuracy: Among the 21 UK Biobank traits, height had the highest LDpred-funct prediction \(R^2\), at 0.413. Meta-analyzing height training data from the UK Biobank and 23andMe cohorts (\(N=1107K\)) increased the prediction \(R^2\) further, to 0.431.
- Comparison: LDpred-funct was found to have substantially higher prediction accuracy than other comparable functional and non-functional methods, including P+T-funct-LASSO and AnnoPred, in most settings.
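
The improvements quoted above are relative, not absolute, changes in \(R^2\). As a minimal sketch of that arithmetic, the snippet below computes the relative gain for the two height figures reported in this overview (0.413 and 0.431). The function name `relative_improvement` is hypothetical, and the overview does not state whether the paper's +4.6% figure is a ratio of trait averages or an average of per-trait ratios, so the snippet does not try to reproduce that number.

```python
def relative_improvement(r2_new: float, r2_ref: float) -> float:
    """Relative change in prediction R^2, expressed as a fraction of the reference."""
    if r2_ref <= 0:
        raise ValueError("reference R^2 must be positive")
    return (r2_new - r2_ref) / r2_ref


# Height figures quoted above: UK Biobank training data alone (0.413)
# versus the UK Biobank + 23andMe meta-analysis (0.431).
gain = relative_improvement(0.431, 0.413)
print(f"{gain:+.1%}")  # prints +4.4%
```

The absolute change here is 0.018 in \(R^2\); dividing by the reference value of 0.413 is what turns it into a relative gain of about 4.4%.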
Methods: Leveraging Functional Priors
LDpred-funct builds on the observation that genetic variants in functional regions of the genome are enriched for complex-trait heritability, and it uses this enrichment to improve polygenic risk scores (PRS).
- Priors Model: The method fits functional priors using the established baseline-LD model. This model includes various annotations such as coding, conserved, regulatory, and LD-related annotations.
- Calculation: It first analytically estimates posterior mean causal effect sizes while accounting for the functional priors and linkage disequilibrium (LD) between variants.
- Regularization: It then uses cross-validation to regularize these causal effect size estimates within bins of variants grouped by effect size magnitude. This step is designed to improve prediction accuracy for traits with sparse genetic architectures (few causal variants).
- Data Sets: The method was evaluated on 21 highly heritable traits in the UK Biobank (average training sample size \(N=373K\)) and on height using meta-analyzed training data from the UK Biobank and 23andMe cohorts.
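
The two computational steps above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes a simple Gaussian model in which each variant's causal effect has prior variance `prior_var[i]` (a stand-in for the per-variant heritability that a fitted baseline-LD model would supply) and the marginal estimates satisfy `beta_hat | beta ~ N(ld @ beta, ld / n)`. Under that assumption the posterior mean is `(ld + diag(1 / (n * prior_var)))^{-1} @ beta_hat`. All function names, the equal-count binning rule, and the simulated data are hypothetical choices made for illustration; the paper's own binning rule is not reproduced here.

```python
import numpy as np


def posterior_mean_effects(beta_hat, ld, prior_var, n):
    """Posterior mean of causal effects, assuming beta ~ N(0, diag(prior_var))
    and beta_hat | beta ~ N(ld @ beta, ld / n) (an assumed Gaussian model)."""
    shrink = np.diag(1.0 / (n * prior_var))  # small prior variance -> strong shrinkage
    return np.linalg.solve(ld + shrink, beta_hat)


def bin_scores(genotypes, beta_post, n_bins):
    """Partition variants by |beta_post| and return one partial score per bin."""
    order = np.argsort(-np.abs(beta_post))  # largest estimated effects first
    bins = np.array_split(order, n_bins)    # equal-count bins (a simplification)
    return np.column_stack([genotypes[:, idx] @ beta_post[idx] for idx in bins])


def cross_validated_r2(bin_prs, phenotype, n_folds=10, seed=0):
    """Fit one weight per bin on the training folds, predict the held-out fold,
    and return the squared correlation of out-of-fold predictions."""
    rng = np.random.default_rng(seed)
    n = len(phenotype)
    pred = np.empty(n)
    for held_out in np.array_split(rng.permutation(n), n_folds):
        train = np.setdiff1d(np.arange(n), held_out)
        w, *_ = np.linalg.lstsq(bin_prs[train], phenotype[train], rcond=None)
        pred[held_out] = bin_prs[held_out] @ w
    return np.corrcoef(pred, phenotype)[0, 1] ** 2


# Simulated toy data (not real genotypes): 5 causal variants out of 40.
rng = np.random.default_rng(0)
n_train, n_valid, m = 500, 200, 40
beta_true = np.zeros(m)
beta_true[:5] = 0.3
g_train = rng.standard_normal((n_train, m))
g_valid = rng.standard_normal((n_valid, m))
y_train = g_train @ beta_true + rng.standard_normal(n_train)
y_valid = g_valid @ beta_true + rng.standard_normal(n_valid)

beta_hat = g_train.T @ y_train / n_train  # marginal effect estimates
ld = g_train.T @ g_train / n_train        # in-sample LD matrix
# Hypothetical "functional" prior: more prior variance on the causal variants.
prior_var = np.where(np.arange(m) < 5, 0.05, 0.005)

beta_post = posterior_mean_effects(beta_hat, ld, prior_var, n_train)
scores = bin_scores(g_valid, beta_post, n_bins=4)
r2 = cross_validated_r2(scores, y_valid)
print(f"cross-validated R^2 on toy data: {r2:.3f}")
```

In this sketch, variants with a larger prior variance are shrunk less, which is how a functional prior shifts weight toward annotated regions. Fitting a separate weight for each bin lets the validation data down-weight bins of small, likely noisy effects, which is the intuition behind the regularization step for sparse architectures.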
Conclusions and Significance
The study concludes that incorporating functional priors into polygenic prediction improves accuracy (a +4.6% relative gain in average \(R^2\) over SBayesR across 21 UK Biobank traits), underscoring the importance of the functional architecture of complex traits. LDpred-funct offers a scalable way to integrate this functional information into PRS.