PWAS: proteome-wide association study - linking genes and phenotypes by functional variation in proteins
- Novel Method: Introduces PWAS (Proteome-Wide Association Study), a protein-centric method that detects gene-phenotype associations by quantifying the cumulative functional damage caused by coding-region variants on the resulting protein product using a machine learning model called FIRM.
- Key Advantage: PWAS is designed to model and detect non-additive heritability. It is particularly powerful at identifying associations under the recessive inheritance model, which standard GWAS often misses.
- Discovery Power: In an analysis of UK Biobank data, PWAS uncovered numerous gene-phenotype associations not found by standard GWAS, including the known colorectal cancer gene MUTYH, which it detected with high significance under its characteristic recessive mode.
PubMed: 32665031 DOI: 10.1186/s13059-020-02089-x Overview generated by: Gemini 2.5 Flash, 28/11/2025
Key Findings: PWAS as a Protein-Centric Association Method
The authors introduce Proteome-Wide Association Study (PWAS), a novel, protein-centric computational method designed to detect gene-phenotype associations that are mediated by alterations in protein function.
PWAS offers several key advantages over traditional genome-wide association studies (GWAS) and other gene-level methods:
- Aggregation and Power: It aggregates the signal from all coding variants jointly affecting a protein-coding gene, allowing it to uncover associations that are too weak or spread out to be detected by per-variant GWAS, especially those involving rare variants.
- Recessive Inheritance: PWAS explicitly models both dominant and recessive modes of inheritance. The authors show that recessive effects are substantial in complex traits, although traditional GWAS often neglects them.
- Functional Interpretability: By explicitly quantifying the functional damage to the protein, PWAS provides highly interpretable results and is better positioned to suggest a causal relationship between the gene and the phenotype.
Study Design and Methods
PWAS is implemented as a two-stage association pipeline using genetic and phenotypic data from large cohorts, such as the UK Biobank (UKBB).
Stage 1: Quantifying Protein Functional Damage
- Variant Selection: PWAS considers all variants that affect the coding regions of genes (e.g., missense, nonsense, frameshift).
- Damage Prediction: For each variant, a pre-trained machine learning model called FIRM (Functional Impact Rating at the Molecular-level) estimates a variant effect score between 0 and 1, interpreted as the probability that the protein retains its function.
Stage 2: Gene-Level Association Testing
- Score Aggregation: The variant effect scores are combined with individual genotype data to create per-gene functional effect scores for each person in the cohort.
- Inheritance Modeling: Two separate effect scores are calculated for each gene, explicitly covering dominant (at least one damaging hit) and recessive (at least two damaging hits) inheritance models.
- Statistical Test: These per-gene scores are then statistically tested against the phenotype of interest (binary or continuous) alongside covariates (e.g., sex, age, principal components) to identify significant associations.
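The aggregation idea in Stage 2 can be illustrated with a minimal sketch. It is not the authors' aggregation formula: it assumes, purely for illustration, that every alternate allele a person carries is an independent potential "hit" that damages the protein with probability 1 minus its variant effect score. Under that assumption, the probability of at least one damaging hit serves as a dominant-style score and the probability of at least two as a recessive-style score. The function name `hit_probabilities` and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def hit_probabilities(
    retain_scores: Sequence[float], dosages: Sequence[int]
) -> Tuple[float, float]:
    """Return (P(at least one damaging hit), P(at least two damaging hits)).

    retain_scores[i] is the probability that the protein retains its function
    given variant i (the 0..1 variant effect score); dosages[i] is the number
    of copies (0, 1 or 2) of variant i the person carries.
    ASSUMPTION (illustrative only): each carried allele is an independent hit.
    """
    p_zero, p_one = 1.0, 0.0  # probability of exactly 0 / exactly 1 damaging hits
    for score, dosage in zip(retain_scores, dosages):
        damage = 1.0 - score
        for _ in range(dosage):
            # Update "exactly one" before "exactly zero", since it depends on the old p_zero.
            p_one = p_one * score + p_zero * damage
            p_zero = p_zero * score
    dominant = 1.0 - p_zero
    recessive = 1.0 - p_zero - p_one
    return dominant, recessive


# Hypothetical person carrying one copy each of two variants in the same gene.
dominant_score, recessive_score = hit_probabilities([0.5, 0.5], [1, 1])
```

In a full pipeline, scores like these would be computed per gene and per individual and then entered into a regression against the phenotype together with the covariates.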
Results
Performance and Comparison
- Simulation Analysis: Simulations, based on a protein-centric causal model, demonstrated that PWAS’s advantage is particularly pronounced in detecting associations under the recessive inheritance model.
- Real Data Application (UKBB): PWAS was applied to 49 diverse phenotypes using a cohort of 333,424 filtered UKBB samples.
- Comparison to GWAS: The method discovered 2743 gene-phenotype associations that were missed by standard GWAS, which represents 22% of all PWAS-discovered associations.
- Comparison to SKAT: PWAS was found to be complementary to SKAT (Sequence Kernel Association Test), another popular gene-level association method. PWAS recovered more high-quality, known gene-disease associations from the OMIM database (12 associations compared to 7 for SKAT).
Case Study: Colorectal Cancer
- In a case study of colorectal cancer (2822 cases, 260,127 controls), the well-known predisposition gene MUTYH was not found to be exome-wide significant by standard GWAS (p-values were 6.3E-04 and 1.2E-03).
- In contrast, PWAS detected the MUTYH association with high significance (FDR q-value = 2.3E-06) by aggregating the signal from multiple variants. Crucially, the association was significant only under the recessive model, which is consistent with the gene's known biallelic mutation mechanism for increased cancer risk.
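For readers unfamiliar with FDR q-values, the sketch below shows one common way to turn per-gene p-values into q-values, the Benjamini-Hochberg procedure. This overview does not state which FDR procedure the authors applied, so treat the choice as an assumption. The function name `bh_qvalues` and the example p-values are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def bh_qvalues(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg q-values, returned in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p-value
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        qvalues[idx] = running_min
    return qvalues


# Hypothetical p-values for four genes tested against one phenotype.
example_q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

A gene is then reported as significant when its q-value falls below the chosen FDR threshold.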
Conclusions and Recommendations
PWAS represents a shift toward using detailed, functional machine learning models to improve gene-phenotype association studies. By focusing on protein function and explicitly modeling different inheritance modes, especially recessive effects, it can recover causal protein-coding genes that are typically missed by variant-centric or expression-centric methods. The authors recommend PWAS as an effective, complementary tool for genetic association studies, providing functionally interpretable results without the need for post-analysis fine-mapping.