Polygenic scoring accuracy varies across the genetic ancestry continuum

GWAS
genetic ancestry
genetic distance
health disparities
polygenic scores
precision medicine
transferability
  • Core Principle: This study challenges the use of discrete ancestry groups for evaluating polygenic scores (PGSs) and proposes a framework that assesses individual-level PGS accuracy as a function of Genetic Distance (GD) from the GWAS training population.
  • Key Finding: Individual-level PGS accuracy decays continuously as GD from the training data increases. The Pearson correlation between GD and PGS accuracy, averaged across 84 complex traits, is R = -0.95, indicating that the loss of performance closely tracks genetic dissimilarity.
  • Health Equity Implication: The study quantified a major health disparity, showing that the genetically closest individuals of non-European ancestry (e.g., Hispanic/Latino American) had PGS accuracy comparable to the most distant European-ancestry individuals, highlighting the severe and systematic bias due to lack of diversity in training cohorts.
  • Recommendation: The authors advocate for moving beyond discrete ancestry labels and using continuous metrics like GD to characterize and correct for performance disparities, thus ensuring more equitable clinical translation of PGSs.
Published

23 January 2026

PubMed: 37198491 DOI: 10.1038/s41586-023-06079-4 Overview generated by: Gemini 2.5 Flash, 26/11/2025

Key Findings: The Genetic Distance Penalty on PGS Accuracy

This study provides a critical, individual-level assessment of Polygenic Score (PGS) portability, which is necessary for the equitable clinical application of genetic risk prediction. The authors argue that assessing PGS performance using traditional discrete genetic ancestry clusters (e.g., European, African) obscures crucial inter-individual variation and biases estimates. They introduce a framework that evaluates accuracy along a genetic ancestry continuum using a precise metric: Genetic Distance (GD).

Core Discovery: Continuous and Steep Decay of Accuracy

The central finding is that PGS accuracy decreases from one individual to the next along the continuum of genetic ancestries in a highly predictable, linear fashion.

  1. Metric Definition: Genetic Distance (GD) is the distance between a target individual's genotype and the population used to train the PGS model, measured on a PCA projection of the genotype data. The higher the GD, the more genetically dissimilar the individual is from the training set.
  2. Quantification of Decay: Across 84 complex traits and diseases, individual-level PGS accuracy was very strongly negatively correlated with GD, with an average Pearson correlation coefficient of -0.95. A correlation this close to -1 indicates that an individual's genetic background relative to the training population is a major determinant of score performance.
  3. Ubiquity of Variation: This decreasing trend was observed in all populations considered, including within traditionally labeled ‘homogeneous’ genetic ancestry groups (e.g., European ancestry in UK Biobank). This shows that sub-ancestry variation within a single continent still results in a measurable loss of accuracy based on GD.
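
The metric definition above can be illustrated with a minimal sketch. It assumes GD is taken as the Euclidean distance from the centre of the training data in the training set's principal-component space. The function names (`fit_pca`, `genetic_distance`), the number of components, and the simulated genotypes are all hypothetical choices for illustration, not the authors' pipeline.

```python
import numpy as np

def fit_pca(train_genotypes, n_components):
    """Return the training mean and the top principal axes (one per row)."""
    mean = train_genotypes.mean(axis=0)
    centered = train_genotypes - mean
    # SVD of the centered matrix; the rows of vt are orthonormal principal axes.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def genetic_distance(target_genotypes, mean, axes):
    """Euclidean distance of each projected target from the training centre.

    After centering on the training mean, the training centre sits at the
    origin of PC space, so GD is the norm of the projected coordinates.
    """
    projected = (target_genotypes - mean) @ axes.T
    return np.linalg.norm(projected, axis=1)

# Simulated allele counts (0/1/2); sizes and frequencies are arbitrary.
rng = np.random.default_rng(0)
train = rng.binomial(2, 0.3, size=(200, 50)).astype(float)
targets = rng.binomial(2, 0.3, size=(20, 50)).astype(float)

mean, axes = fit_pca(train, n_components=5)
gd_targets = genetic_distance(targets, mean, axes)  # one GD value per target
```

Projecting targets onto axes fitted on the training data alone keeps GD anchored to the training population, which is what lets the same scale be used for every target individual regardless of ancestry label.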

Demonstrating Inequity and Systematic Bias

The study leveraged data from the UK Biobank (UKBB, training set, predominantly White British) and the diverse Los Angeles biobank (ATLAS, testing set) to quantify the transferability gap.

  • Intra-European Penalty: When UKBB-trained models were applied to individuals of European ancestry in ATLAS, those in the furthest GD decile had 14% lower accuracy than those in the closest decile.
  • Cross-Ancestry Disparity: The results reveal a severe “distance penalty” for non-European groups. Individuals of Hispanic/Latino American ancestry who are genetically closest (lowest GD decile) to the training data showed similar PGS performance to the European-ancestry individuals who are furthest away (highest GD decile). For the most distant Hispanic/Latino individuals, accuracy was substantially lower, overlapping with that of African American participants.
  • Bias in Risk Estimates: Crucially, GD was found to be significantly correlated with the PGS estimates themselves for 82 of 84 traits. This means the systematic bias due to ancestry distance does not just affect the accuracy (\(R^2\)) but also the magnitude of the predicted risk, potentially leading to widespread miscalibration and inequitable risk stratification.
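
The comparisons above rest on two simple computations: binning individuals into GD deciles and comparing mean accuracy between the closest and furthest bins, and correlating GD with the PGS values themselves. The sketch below shows both on toy data. It assumes a per-individual accuracy value is already available, and the helper names and all numbers are hypothetical, chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def gd_deciles(gd):
    """Label each individual with a GD decile: 0 = closest, 9 = furthest."""
    # Nine interior cut points split the sample into ten roughly equal bins.
    edges = np.quantile(gd, np.linspace(0.1, 0.9, 9))
    return np.searchsorted(edges, gd, side="right")

def relative_accuracy_drop(accuracy, deciles):
    """Fractional loss in mean accuracy from the closest to the furthest decile."""
    closest = accuracy[deciles == 0].mean()
    furthest = accuracy[deciles == 9].mean()
    return (closest - furthest) / closest

def pearson_r(x, y):
    """Pearson correlation coefficient between two 1-D arrays."""
    return np.corrcoef(x, y)[0, 1]

# Toy data (hypothetical): accuracy falls linearly with GD, and the PGS
# values drift upward with GD on top of unrelated variation.
gd = np.arange(100, dtype=float)
accuracy = 0.5 - 0.001 * gd
pgs = 0.02 * gd + np.sin(gd)

deciles = gd_deciles(gd)
drop = relative_accuracy_drop(accuracy, deciles)  # fractional accuracy loss
r_gd_accuracy = pearson_r(gd, accuracy)           # accuracy decay with GD
r_gd_pgs = pearson_r(gd, pgs)                     # shift in the scores themselves
```

A nonzero correlation between GD and the score itself, as in `r_gd_pgs`, is the kind of signal that points to miscalibration rather than only reduced accuracy.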

Conclusion and Call to Action

The authors conclude that relying on aggregate population-level metrics (e.g., \(R^2\)) obscures this vital individual-level variation and hinders efforts toward health equity. They urge researchers to abandon the use of discrete genetic ancestry clusters in favor of continuous metrics (like GD) to better characterize and address performance disparities, ensuring more reliable and equitable application of PGSs in personalized medicine.