Check the alignment of GWAS summary statistics to a reference panel and flip allele effect sizes (BETA) where required
alignment_check.Rd
This function performs an alignment check for GWAS data by comparing the input
data (df) with a reference dataset (reference). It checks if the alleles
in the data are aligned and flips the alleles when necessary to ensure
consistency with the reference. The function also computes an LD matrix,
performs a kriging-based procedure to adjust the z-scores, and generates a
series of plots to visualize the alignment.
Arguments
- df
A data frame containing the GWAS summary statistics. The following columns should be present:
CHR: Chromosome number (integer).
POS: Position of the SNP (integer).
SNP: SNP identifier (character).
EA: Effect allele (character).
OA: Other allele (character).
EAF: Effect allele frequency (numeric).
BETA: Effect size estimate (numeric).
SE: Standard error of the effect size (numeric).
P: P-value for the association (numeric).
N: Sample size (integer).
phenotype: Phenotype identifier (character).
- reference
A data frame containing the reference data for comparison. The following columns should be present:
Predictor: SNP identifier (character).
A1: Allele 1 (character).
A2: Allele 2 (character).
A1_Mean: Mean value for allele 1 (numeric).
MAF: Minor allele frequency (numeric).
Call_Rate: Call rate (integer).
Info: Information score (integer).
- bfile
File path to the bfile for the reference population (built for use with 1000 Genomes (1kG) data, e.g., /path/EUR/EUR).
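The column requirements above can be checked before calling the function. The following is a minimal sketch in base R, not part of the package: the toy values, the `required_df_cols` and `required_ref_cols` vectors, and the `missing_columns()` helper are all hypothetical and only illustrate the expected shape of `df` and `reference`.

```r
# Hypothetical toy inputs; all values are made up for illustration only.
df <- data.frame(
  CHR = c(1L, 1L, 1L),
  POS = c(1000L, 2000L, 3000L),
  SNP = c("rs1", "rs2", "rs3"),
  EA = c("A", "C", "G"),
  OA = c("G", "T", "A"),
  EAF = c(0.30, 0.45, 0.10),
  BETA = c(0.12, -0.05, 0.20),
  SE = c(0.03, 0.02, 0.05),
  P = c(6.3e-05, 0.012, 6.3e-05),
  N = c(10000L, 10000L, 10000L),
  phenotype = "trait_x",
  stringsAsFactors = FALSE
)

reference <- data.frame(
  Predictor = c("rs1", "rs2", "rs3"),
  A1 = c("A", "T", "G"),
  A2 = c("G", "C", "A"),
  A1_Mean = c(0.60, 1.10, 0.20),
  MAF = c(0.30, 0.45, 0.10),
  Call_Rate = c(1L, 1L, 1L),
  Info = c(1L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Column names taken from the Arguments section above.
required_df_cols <- c("CHR", "POS", "SNP", "EA", "OA", "EAF",
                      "BETA", "SE", "P", "N", "phenotype")
required_ref_cols <- c("Predictor", "A1", "A2", "A1_Mean",
                       "MAF", "Call_Rate", "Info")

# Hypothetical helper: returns the required columns absent from x.
missing_columns <- function(x, required) setdiff(required, names(x))

missing_df <- missing_columns(df, required_df_cols)
missing_ref <- missing_columns(reference, required_ref_cols)

if (length(missing_df) > 0) {
  stop("df is missing columns: ", paste(missing_df, collapse = ", "))
}
if (length(missing_ref) > 0) {
  stop("reference is missing columns: ", paste(missing_ref, collapse = ", "))
}
```

Running a check like this first gives a clearer error than a failed merge later on. The `bfile` argument is not covered here because it depends on files on disk.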
Value
A list containing:
plots: A list of plots, including the alignment plot and observed vs expected z-score plots.
list_df: The final data frame after allele flipping and adjustments.
lambda: A list of lambda estimates for the adjusted z-scores.
Details
The function first merges the GWAS summary statistics with the reference data
based on the SNP identifier and flips the effect sizes if the effect allele
(EA) does not match allele 1 (A1) in the reference. It then computes an LD
matrix using the ieugwasr::ld_matrix_local function, which is used in
subsequent analyses. The function also runs a kriging procedure using
susieR::kriging_rss to adjust the z-scores based on the LD matrix, and
generates plots comparing the observed and expected z-scores before and
after allele flipping. The function iteratively flips alleles and updates
the data until no more allele flips are needed (based on a log likelihood
ratio test).
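The initial merge-and-flip step described above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's actual code: the toy data are made up, only clean swaps (EA equals A2 and OA equals A1) are handled, and flipping EAF alongside BETA is an assumption not stated in this page. The LD matrix and kriging steps are omitted because they need the reference bfile and external packages.

```r
# Hypothetical toy data; values are made up for illustration only.
gwas <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  EA = c("A", "C", "G"),
  OA = c("G", "T", "A"),
  EAF = c(0.30, 0.45, 0.10),
  BETA = c(0.12, -0.05, 0.20),
  SE = c(0.03, 0.02, 0.05),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  Predictor = c("rs1", "rs2", "rs3"),
  A1 = c("A", "T", "G"),
  A2 = c("G", "C", "A"),
  stringsAsFactors = FALSE
)

# Merge GWAS summary statistics with the reference on the SNP identifier.
merged <- merge(gwas, ref, by.x = "SNP", by.y = "Predictor")

# A SNP is treated as swapped when its alleles match the reference in
# reverse order (assumption: other kinds of mismatch are not handled here).
swapped <- merged$EA == merged$A2 & merged$OA == merged$A1

# Flip the sign of the effect size so it refers to reference allele A1.
merged$BETA[swapped] <- -merged$BETA[swapped]

# Assumption: the effect allele frequency is complemented as well.
merged$EAF[swapped] <- 1 - merged$EAF[swapped]

# Swap the allele labels so EA now equals A1.
old_ea <- merged$EA[swapped]
merged$EA[swapped] <- merged$OA[swapped]
merged$OA[swapped] <- old_ea

# Z-scores computed from the aligned effect sizes.
merged$Z <- merged$BETA / merged$SE
```

In this toy example only `rs2` is swapped, so its BETA changes sign and its alleles are exchanged, while `rs1` and `rs3` are left unchanged. The resulting z-scores are the kind of input that the kriging step would then compare against their LD-based expectations.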