Genome-wide risk prediction of common diseases across ancestries in one million people

GWAS
ancestry
genetic prediction
health disparities
polygenic risk scores
population genetics
transferability
  • This large-scale study evaluated the cross-ancestry transferability of Polygenic Risk Scores (PRSs) for four common diseases (CAD, T2D, breast, and prostate cancer) using data from six biobanks and over one million individuals of diverse global ancestries.
  • The analysis found that PRS transferability was high and robust across European-ancestry populations and their substructures, but lower in individuals of African ancestry and, to a lesser extent, South Asian and East Asian ancestry.
  • The poor transferability, most pronounced in individuals of African ancestry, highlights ancestral bias in genomic research and the potential for clinical use of current PRSs to exacerbate health disparities until more diverse training data are available.
Published

23 January 2026

PubMed: 35591975
DOI: 10.1016/j.xgen.2022.100118
Overview generated by: Gemini 2.5 Flash, 26/11/2025

Key Findings

This study performed a large-scale, cross-ancestry evaluation of polygenic risk scores (PRSs) for four common diseases: coronary artery disease (CAD), type 2 diabetes (T2D), breast cancer, and prostate cancer. It used genome-wide genotype data on over one million individuals from six biobanks across Europe, the United States, and Asia. The principal finding is a striking disparity in PRS transferability. Predictive ability remained robust and highly similar across European populations and local population substructures, suggesting clinical utility in this group. However, the PRSs showed substantially poorer transferability and lower effect sizes in individuals of African ancestry and, to a lesser extent, in those of South Asian and East Asian ancestry. This large-scale empirical evidence underscores the challenge of ancestral bias in genomic data and the potential for clinical implementation of current PRSs to exacerbate existing health disparities.

Study Design and Data

The study utilized a combined dataset of approximately one million individuals across six major biobanks: BioBank Japan, Estonian Biobank, FinnGen, HUNT, Mass General Brigham (MGB) Biobank, and UK Biobank. The ancestries evaluated included European, South Asian, East Asian, and African.

PRS Calculation and Evaluation

  • PRS Method: Genome-wide PRSs were calculated with LDpred, a Bayesian method that adjusts GWAS marginal effect sizes for linkage disequilibrium (LD). Each score incorporated over 6 million variants per disease.
  • Input Data: The input weights were obtained from the largest publicly available, non-overlapping Genome-Wide Association Studies (GWASs) for each of the four diseases.
  • Transferability Assessment: Transferability was assessed by comparing the Odds Ratios (OR) per standard deviation (SD) increase in PRS across different global ancestry groups, and also within European populations (including a population isolate, Finland).
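The scoring step described above can be sketched as follows. This is a minimal sketch, not the authors' pipeline. It assumes standardized genotypes and shows only the infinitesimal special case of LDpred (LDpred-inf). In that case the posterior mean weights are (D + M/(N h^2) I)^-1 times the marginal effects, where D is the LD matrix, M the number of variants, N the GWAS sample size, and h^2 the SNP heritability (assumed known here). The summary does not state which LDpred prior the authors used. Real pipelines also work in LD windows estimated from a reference panel over millions of variants. The function names and example values are hypothetical.

```python
import numpy as np


def ldpred_inf_weights(beta_marginal, ld_matrix, n_gwas, h2):
    """Hypothetical helper: LDpred-inf posterior mean effects (standardized scale)."""
    beta_marginal = np.asarray(beta_marginal, dtype=float)
    m = beta_marginal.shape[0]
    # Shrinkage term M / (N * h2) is added to the diagonal of the LD matrix
    shrink = m / (n_gwas * h2)
    return np.linalg.solve(np.asarray(ld_matrix, dtype=float) + shrink * np.eye(m),
                           beta_marginal)


def polygenic_score(dosages, weights):
    """Hypothetical helper: weighted sum of standardized genotypes, scaled to mean 0, SD 1."""
    g = np.asarray(dosages, dtype=float)  # individuals x variants, allele dosages 0/1/2
    sd = g.std(axis=0)
    # Standardize each variant; monomorphic variants become all zeros
    g_std = (g - g.mean(axis=0)) / np.where(sd > 0, sd, 1.0)
    raw = g_std @ np.asarray(weights, dtype=float)
    return (raw - raw.mean()) / raw.std()


# Illustrative inputs (not study data): three variants, two of them in LD
beta = np.array([0.02, -0.01, 0.005])
ld = np.array([[1.0, 0.8, 0.0],
               [0.8, 1.0, 0.0],
               [0.0, 0.0, 1.0]])
w = ldpred_inf_weights(beta, ld, n_gwas=100_000, h2=0.3)
rng = np.random.default_rng(0)
dosages = rng.integers(0, 3, size=(500, 3))
prs = polygenic_score(dosages, w)
```

Scaling the final score to unit SD is what makes the "OR per SD" comparison meaningful across cohorts.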

Key Results on Transferability

Global Ancestry Disparities

A clear gradient of PRS accuracy was observed, tracking genetic distance from the predominantly European GWAS training cohorts:

  • European Ancestry: The PRSs showed consistently high and similar effect sizes (ORs) across various European populations and health-care systems, suggesting good utility for risk stratification.
  • Asian Ancestry: Individuals of South Asian and East Asian ancestry exhibited similar or slightly lower effect sizes compared to Europeans.
  • African Ancestry: Individuals of African ancestry consistently had the lowest effect sizes and poorest prediction accuracy for all four diseases. For instance, in breast cancer, the association was not statistically significant in women of African ancestry in some cohorts.
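The effect sizes compared above are odds ratios per SD of the PRS. The sketch below shows how such estimates can be obtained separately in each ancestry group with a logistic regression, fit by Newton-Raphson using NumPy only. It rests on several simplifying assumptions: the PRS is standardized within each group, and covariates commonly included in PRS analyses (such as age, sex, and genetic principal components) are optional and omitted in the example. The summary does not describe the paper's exact model. The group labels, simulated ORs, and function names are hypothetical and are not study results.

```python
import numpy as np


def or_per_sd(prs, case_status, covariates=None, n_iter=50, tol=1e-10):
    """Hypothetical helper: OR and 95% Wald CI per 1-SD increase in PRS."""
    prs = np.asarray(prs, dtype=float)
    z = (prs - prs.mean()) / prs.std()  # standardize so the coefficient is per SD
    cols = [np.ones_like(z), z]
    if covariates is not None:  # optional covariates, shape (n,) or (n, k)
        cols.append(np.asarray(covariates, dtype=float))
    X = np.column_stack(cols)
    y = np.asarray(case_status, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):  # Newton-Raphson for the logistic log-likelihood
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        info = X.T @ (X * (p * (1.0 - p))[:, None])  # Fisher information
        step = np.linalg.solve(info, X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    # Standard error of the PRS coefficient at the final estimate
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    info = X.T @ (X * (p * (1.0 - p))[:, None])
    se = np.sqrt(np.linalg.inv(info)[1, 1])
    log_or = beta[1]
    ci = (float(np.exp(log_or - 1.96 * se)), float(np.exp(log_or + 1.96 * se)))
    return float(np.exp(log_or)), ci


def or_per_sd_by_group(prs, case_status, groups):
    """Fit the model separately within each (hypothetical) ancestry group."""
    prs, case_status, groups = map(np.asarray, (prs, case_status, groups))
    return {g: or_per_sd(prs[groups == g], case_status[groups == g])
            for g in np.unique(groups)}


# Simulated example: true OR per SD of 1.5 in group "A" and 1.1 in group "B"
rng = np.random.default_rng(42)
n = 20_000
prs = rng.standard_normal(n)
groups = np.where(np.arange(n) < n // 2, "A", "B")
true_log_or = np.where(groups == "A", np.log(1.5), np.log(1.1))
risk = 1.0 / (1.0 + np.exp(-(-2.0 + true_log_or * prs)))
cases = rng.random(n) < risk
results = or_per_sd_by_group(prs, cases, groups)
```

A lower OR per SD in one group than another, as with "B" versus "A" here, is the kind of gap the summary describes between African-ancestry and European-ancestry individuals.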

Substructure and Polygenicity

  • Within-European Transferability: The PRSs transferred well even between highly structured European populations, such as various regional substructures within Finland, demonstrating robustness across recent population bottlenecks.
  • Genome-wide vs. Sparse PRS: A key methodological finding was that highly polygenic, genome-wide PRSs (using millions of variants) showed higher effect sizes and better transferability across global ancestries than sparse PRSs built from a smaller, more stringently selected set of variants. This supports the view that the polygenic signal of these traits is shared across populations, even where the precise causal variants, and how well genotyped variants tag them, differ.
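The toy simulation below illustrates why a sparse score can lose signal under a highly polygenic architecture. It compares a genome-wide score (all estimated effects) with a sparse score that keeps only variants passing p < 5e-8, a conventional genome-wide significance threshold. That threshold is an assumption here, since the summary does not give the paper's selection rule. The simulation also assumes independent variants (no LD), standardized genotypes, a single population, and hypothetical parameter values. It therefore says nothing about cross-ancestry transfer itself. The function names are hypothetical.

```python
import numpy as np
from math import erfc, sqrt


def sparse_weights(weights, pvalues, p_threshold=5e-8):
    """Keep weights for variants passing the p-value threshold; zero out the rest."""
    weights = np.asarray(weights, dtype=float)
    return np.where(np.asarray(pvalues) < p_threshold, weights, 0.0)


def toy_comparison(m=2000, n_gwas=100_000, h2=0.5, n_target=2000,
                   p_threshold=5e-8, seed=1):
    """Correlation of genome-wide vs sparse scores with the true genetic value."""
    rng = np.random.default_rng(seed)
    # Highly polygenic architecture: every variant has a small true effect
    beta_true = rng.normal(0.0, sqrt(h2 / m), size=m)
    se = 1.0 / sqrt(n_gwas)  # approximate SE of a standardized marginal effect
    beta_hat = beta_true + rng.normal(0.0, se, size=m)
    # Two-sided p-values: P(|Z| > |z|) = erfc(|z| / sqrt(2))
    pvalues = np.array([erfc(abs(b / se) / sqrt(2)) for b in beta_hat])
    # Independent, standardized genotypes in a separate target sample
    genotypes = rng.standard_normal((n_target, m))
    genetic_value = genotypes @ beta_true
    full_score = genotypes @ beta_hat
    sparse_score = genotypes @ sparse_weights(beta_hat, pvalues, p_threshold)
    r_full = np.corrcoef(full_score, genetic_value)[0, 1]
    r_sparse = np.corrcoef(sparse_score, genetic_value)[0, 1]
    n_kept = int(np.sum(pvalues < p_threshold))
    return r_full, r_sparse, n_kept


r_full, r_sparse, n_kept = toy_comparison()
print(f"genome-wide r = {r_full:.2f}, sparse r = {r_sparse:.2f} ({n_kept} variants kept)")
```

Under these assumptions, the sparse score discards the many sub-threshold variants that together carry much of the genetic signal. This is consistent with, though not proof of, the summary's finding.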

Implications for Clinical Utility

The findings provide strong evidence that the current state of PRS technology is not ready for equitable clinical deployment:

  • Clinical Utility: Current PRSs have demonstrated significant potential for clinical screening and prevention in individuals of European ancestry.
  • Health Equity Concern: The lower predictive accuracy in individuals of African ancestry and, to a lesser extent, South Asian and East Asian ancestry poses a significant challenge to global health equity and personalized medicine. This gap stems from the lack of diversity in the original GWAS training data. The study stresses the urgent need to invest in and conduct large-scale GWASs in non-European populations to address this bias.