metaGE: Investigating genotype x environment interactions through GWAS meta-analysis
- Novel meta-analysis approach for multi-environment trials (METs) that jointly analyzes GWAS summary statistics while accounting for inter-environment correlations
- Controls Type I error effectively (FDR ≤0.05) where competing methods fail severely (METAL FDR >0.84), with computational efficiency enabling analysis of 600K markers × 22 environments in ~2 minutes
- Identified novel competition-responsive flowering QTLs in Arabidopsis and heat-stress yield QTLs in maize through contrast tests and meta-regression with environmental covariates
PubMed: 39792927
DOI: 10.1371/journal.pgen.1011553
Overview generated by: Claude Sonnet 4.5, 25/11/2025
Key Findings
This study introduces metaGE, a meta-analysis approach for detecting quantitative trait loci (QTL) in multi-environment trial (MET) experiments by jointly analyzing summary statistics from individual environment GWAS, addressing the challenge of genotype-by-environment (GxE) interactions in plant genetics.
Main Discoveries
Superior Type I error control: metaGE keeps the false discovery rate on null chromosomes at or below 0.05 in every simulated scenario, whereas METAL is severely inflated (FDR > 0.84) and mash ranges from 0.14 to 0.88 depending on the scenario
Computational efficiency: Analyzes large-scale datasets (22 environments × 600K markers) in ~2 minutes, dramatically faster than existing mixed-model approaches
Novel QTL detection: Identified new loci in Arabidopsis (competition-responsive flowering QTLs) and maize (heat-stress responsive yield QTLs) not detected by original single-environment analyses
Flexible testing framework: Enables detection of QTLs with stable effects, environment-specific effects, or effects correlated with environmental covariates
Study Design
Methodological Framework
metaGE jointly analyzes summary statistics (effect sizes and P-values) from per-environment GWAS without requiring raw phenotypic or genotypic data.
Key innovations:
- Accounts for inter-environment correlations arising from overlapping genotype panels
- Supports both controlled (fixed-effect) and uncontrolled (random-effect) environments
- Includes meta-regression to detect QTL-environment covariate relationships
Statistical Models
Z-score transformation: \[Z_{mk} = \Phi^{-1}(1 - 0.5 \cdot p_{mk}) \times \text{sign}(\beta_{mk})\]
where β_mk is the marker effect and p_mk is the P-value for marker m in environment k.
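A minimal Python sketch of this transformation, using only the standard library. The function name `z_score` is hypothetical and not part of the metaGE package. It assumes a two-sided P-value strictly between 0 and 1 and follows the usual convention that the Z-score carries the sign of the estimated effect, with magnitude equal to the upper-tail normal quantile of p/2.

```python
from statistics import NormalDist


def z_score(p_value: float, effect: float) -> float:
    """Signed Z-score from a two-sided P-value and an estimated marker effect."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must lie strictly between 0 and 1")
    # -Phi^{-1}(p/2) equals Phi^{-1}(1 - p/2) but is numerically safer for tiny p.
    magnitude = -NormalDist().inv_cdf(p_value / 2.0)
    sign = 1.0 if effect >= 0 else -1.0
    return sign * magnitude
```

For example, `z_score(0.05, 1.2)` is close to 1.96 and `z_score(0.05, -0.3)` is close to -1.96.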
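Fixed Effect (FE) Model (controlled environments): \[Z_m = X\mu_m + E_m\] \[E_m \sim N(0_K, \Sigma)\] where X is the K × J matrix assigning each of the K environments to one of J groups and Σ is the inter-environment correlation matrix, common to all markers.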
- Environments classified into J groups with stable effects within groups
- Tests for marker association: H₀: {μ_m = 0_J} vs H₁: {∃j, μ^j_m ≠ 0}
- Tests for effect heterogeneity across groups
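One standard way to carry out these tests, sketched here as an assumption rather than as the paper's exact statistics, is generalized least squares on the Z-scores. With X the K × J group-membership matrix and Σ the inter-environment correlation matrix:
\[\hat{\mu}_m = (X^T \Sigma^{-1} X)^{-1} X^T \Sigma^{-1} Z_m, \qquad \text{Var}(\hat{\mu}_m) = (X^T \Sigma^{-1} X)^{-1}\]
\[W_m = \hat{\mu}_m^T (X^T \Sigma^{-1} X)\, \hat{\mu}_m \sim \chi^2_J \text{ under } H_0: \{\mu_m = 0_J\}\]
A comparison between groups, defined by a hypothetical contrast vector c of length J, fits the same framework:
\[T_m = \frac{c^T \hat{\mu}_m}{\sqrt{c^T (X^T \Sigma^{-1} X)^{-1} c}} \sim N(0,1) \text{ under } H_0: \{c^T \mu_m = 0\}\]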
Random Effect (RE) Model (uncontrolled environments): \[Z_m = \mu_m \mathbf{1}_K + A_m + E_m\] \[A_m \sim N(0_K, \tau_m^2 \Lambda)\]
- Random marker effects account for heterogeneity
- Correlation matrices Σ and Λ estimated from data
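Assuming A_m and E_m are independent and E_m follows the same N(0_K, Σ) distribution as in the FE model (both are assumptions of this sketch), the two random terms combine into a single marginal distribution:
\[Z_m \sim N(\mu_m \mathbf{1}_K,\ \tau_m^2 \Lambda + \Sigma)\]
Setting τ_m² = 0 recovers a model with one common effect across all environments, so τ_m² measures how much the marker effect varies between environments beyond sampling noise.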
Meta-Regression Test: \[H_0: \{\text{cov}(\mu_m \mathbf{1}_K + A_m, X) = 0\} \text{ vs } H_1: \{\text{cov}(\mu_m \mathbf{1}_K + A_m, X) \neq 0\}\]
Test statistic: \(\frac{Z_m^T X}{\sqrt{X^T \Sigma X}} \sim N(0,1)\) under H₀, where X here denotes the K-vector of environmental covariate values (not the FE design matrix)
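The statistic above can be sketched in a few lines of standard-library Python. The function name `meta_regression_test` is hypothetical. The sketch centres the covariate before use, which is an assumption made here so that a constant marker effect across environments contributes nothing to the numerator; the overview does not say whether the covariate is centred.

```python
import math


def meta_regression_test(z, x, sigma):
    """Return (statistic, two-sided P-value) for Z^T X / sqrt(X^T Sigma X)."""
    k = len(z)
    if len(x) != k or len(sigma) != k or any(len(row) != k for row in sigma):
        raise ValueError("z, x and sigma must have matching dimensions")
    # Assumption of this sketch: centre the covariate so that a constant marker
    # effect across environments does not contribute to the statistic.
    x_mean = sum(x) / k
    xc = [xi - x_mean for xi in x]
    numerator = sum(zi * xi for zi, xi in zip(z, xc))
    variance = sum(xc[i] * sigma[i][j] * xc[j] for i in range(k) for j in range(k))
    statistic = numerator / math.sqrt(variance)
    # Two-sided normal P-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2)).
    p_value = math.erfc(abs(statistic) / math.sqrt(2.0))
    return statistic, p_value
```

Here `z` holds one marker's Z-scores across the K environments, `x` the covariate value in each environment, and `sigma` the estimated K × K correlation matrix as a list of rows.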
Correlation Matrix Estimation
Two filtering approaches to identify H₀ markers:
- P-value threshold: Include markers with p_mk > λ in all environments
- Posterior probability (default): Mixture model-based filtering excluding markers with P(H₁) > 0.6
Correlation estimate: \[\hat{\Sigma}_{k,k'} = \text{cor}(Z_k, Z_{k'}) = \frac{\sum_{m \in H_0} (Z_{mk} - \bar{Z}_k)(Z_{mk'} - \bar{Z}_{k'})}{\sqrt{\sum_{m \in H_0} (Z_{mk} - \bar{Z}_k)^2} \sqrt{\sum_{m \in H_0} (Z_{mk'} - \bar{Z}_{k'})^2}}\] where \(\bar{Z}_k\) is the mean of \(Z_{mk}\) over the retained H₀ markers.
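A rough sketch of the simpler of the two filters (the P-value threshold) followed by the Pearson correlation estimate. The function name `null_correlation` and the default `lam=0.05` are illustrative choices, not values taken from the paper or the metaGE package; the default posterior-probability filter is not reproduced here.

```python
import math


def null_correlation(z_rows, p_rows, lam=0.05):
    """Pearson correlation between environments over markers with all P-values > lam."""
    # Keep only markers that look null in every environment.
    kept = [z for z, p in zip(z_rows, p_rows) if all(pk > lam for pk in p)]
    if len(kept) < 2:
        raise ValueError("need at least two putative-null markers")
    k = len(kept[0])
    means = [sum(row[j] for row in kept) / len(kept) for j in range(k)]
    centered = [[row[j] - means[j] for j in range(k)] for row in kept]
    # Assumes no environment has constant Z-scores (which would give a zero norm).
    norms = [math.sqrt(sum(row[j] ** 2 for row in centered)) for j in range(k)]
    return [
        [
            sum(row[a] * row[b] for row in centered) / (norms[a] * norms[b])
            for b in range(k)
        ]
        for a in range(k)
    ]
```

Each row of `z_rows` and `p_rows` corresponds to one marker and each column to one environment.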
Simulation Study
Design
- Genotypes: 247 maize F1 hybrids, 506,460 SNPs
- Environments: 22 trials
- QTL types:
  - Fixed effect (constant across environments)
  - Completely random effect
  - Random effects correlated with environment similarities
  - Covariate-dependent (proportional to Tmax, Tnight, or Psi)
- Simulations: 50 runs per QTL type, 12 QTLs per run
- Heritability: 0.5, QTLs explain 44% of genetic variance
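As a rough worked calculation of what these settings imply, the share of phenotypic variance attributable to the simulated QTLs is the product of heritability and the QTL share of genetic variance:
\[0.5 \times 0.44 = 0.22\]
If the 12 QTLs contributed equally (an assumption made here only for illustration; the overview does not state the per-QTL split), each would explain about
\[0.22 / 12 \approx 0.018\]
of phenotypic variance, i.e. under 2%, which is consistent with the modest per-QTL detection power reported in the Detection Power tables.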
Type I Error Control
FDR on H₀ chromosomes (most stringent):
| Method | QTL Type | FDR_chr |
|---|---|---|
| metaGE_FE | Fixed | 0.00 |
| metaGE_RE | Random | 0.04 |
| metaGE_RE | RandomCov | 0.02 |
| metaGE_MR | Covariate | 0.00-0.02 |
| METAL_FE | Fixed | 1.00 |
| METAL_RE | Random/Cov | 0.88-1.00 |
| mash | All types | 0.14-0.88 |
Whole genome FDR (5 Mb window):
| Method | Range across scenarios |
|---|---|
| metaGE | 0.09-0.18 |
| METAL | 0.93-0.94 |
| mash | 0.32-0.85 |
Detection Power
5 Mb detection window:
| QTL Type | metaGE | METAL | mash |
|---|---|---|---|
| Fixed effect | 0.09 | 0.98 | 0.20 |
| Random effect | 0.51 | 0.16 | 0.79 |
| RandomCov | 0.26 | 0.58 | 0.53 |
| Covariate (avg) | 0.37 | 0.06 | 0.41 |
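metaGE's raw power is lower than mash's in every scenario and lower than METAL's for fixed-effect and RandomCov QTLs, but both competitors fail to control FDR (see Type I Error Control), so only metaGE's detections can be treated as reliable.
MAF effect on power (metaGE_FE):
- Low MAF [0.20-0.25]: 0.04
- Medium MAF [0.30-0.35]: 0.08
- High MAF [0.40-0.45]: 0.14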
Meta-Regression Specificity
Testing covariate-dependent QTLs:
| MR test | Target QTLs detected | Cross-detected |
|---|---|---|
| Tnight | 34.5% (Tnight QTLs) | 5.5% (Tmax QTLs) |
| Tmax | 26.5% (Tmax QTLs) | 7% (Tnight QTLs) |
| Psi | 52% (Psi QTLs) | <1% (others) |
Cross-detection tracks the correlation between covariates: Tnight and Tmax are strongly correlated (r = 0.71) and cross-detect each other's QTLs, whereas Tnight and Psi are nearly uncorrelated (r = -0.11).
Application I: Arabidopsis Competition Response
Dataset
- 195 accessions, 981,278 SNPs
- 6 controlled micro-habitats (3 soils × competition/no competition)
- Trait: Bolting time
- Competition: Poa annua weed in environments B, D, F
Results
metaGE FE procedure:
- 191 SNPs in 61 QTLs identified
- 51/61 significant in at least one individual GWAS
- Enrichment ratio = 4.13 for candidate flowering genes (q₀.₀₅ = 0.066, q₀.₉₅ = 3.2)
Comparison with METAL:
- METAL: >165,000 P-values <0.01 (expected ~10,000 under H₀)
- Declared 15% of markers significant (severe inflation)
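The "expected ~10,000" figure follows from the marker count: if P-values were uniform under H₀, about 1% of the 981,278 SNPs would fall below 0.01. A short check of that arithmetic and of the resulting excess, using only numbers quoted in this overview:

```python
n_markers = 981_278   # Arabidopsis SNPs reported in the Dataset section
alpha = 0.01          # P-value cutoff used in the comparison
observed = 165_000    # lower bound reported for METAL

# Count expected below the cutoff if P-values were uniform under H0.
expected = n_markers * alpha
# Lower bound on the excess of small P-values relative to that expectation.
inflation = observed / expected

print(f"expected about {expected:.0f}; observed/expected at least {inflation:.1f}")
```

The expected count is about 9,813, so METAL produces at least roughly 17 times as many small P-values as a well-calibrated test would.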
Contrasted FE test (competition vs. no competition):
- 221 SNPs in 72 QTLs with environment-specific effects
- 160 candidate genes enriched for:
  - Development (P = 8.9×10⁻³)
  - Cell processes (P = 1.5×10⁻³)
  - Tetrapyrrole synthesis (P = 0.020)
- 71/72 QTLs were novel (not detected by standard FE test)
Major Finding: QTL5_22.0
Location: Chromosome 5, AtCNGC4 genomic region
- 22 markers with sign-switching effects based on competition
- Positive effects without competition, negative/null with competition
- AtCNGC4 known roles:
  - Floral transition regulation
  - Plant immunity impairment
- Consistent with development-defense tradeoffs
Application II: Maize Drought Response
Dataset
- 244 dent maize lines (as hybrids), 602,356 SNPs
- 22 environments (location × year × treatment)
- Trait: Grain yield
- Environmental covariates: Psi, Tmax, Tnight, Rad, VPDmax, ET0, Tnight.Fill
metaGE RE Results
52 genomic regions identified, including:
| QTL | Chr | Local Score | Detection status |
|---|---|---|---|
| QTL3_120.0 | 3 | 38 | Previously reported |
| QTL6_20.3 | 6 | 415 | Previously reported |
| QTL7_41.4 | 7 | 18 | Novel |
QTL6_20.3 analysis:
- Strong effects in 6 environments with severe heatwaves:
  - Night temperature ~22°C
  - Maximum temperature >36°C
  - High evaporative demand (3.6 kPa)
- All 6 environments: P-values <1×10⁻⁶
- Colocalizes with 2.4 Mb presence/absence variant
- Contains ABA-induced genes for water deficit response
- Shows selection signatures during domestication/improvement
QTL7_41.4 (novel):
- Moderate positive effects across ~10/22 environments
- Genome-wide significant in only 2 individual GWAS, although P <0.01 in 10 environments
- Harbors QTLs for plant growth rate and biomass under water deficit
- Demonstrates power gain from meta-analysis
Meta-Regression Results
Evapotranspiration (ET0): 14 QTLs detected
Key finding - QTL2_153.8 (marker AX-91538480):
- Effects vary linearly from negative to positive with ET0
- Colocalizes with aquaporin eQTLs (PIP2.2, PIP2.1)
- Related to water use efficiency and stomatal conductance
Night temperature during flowering (Tnight): 21 QTLs
- Main association <0.6 Mb from QTL6_20.3
- Corroborates previous findings on heat stress response
Night temperature during grain filling (Tnight.Fill): 15 QTLs
Example - QTL9_28.6 (marker AX-91123283):
- Positive effects on cool nights
- Negative effects on hot nights
- Dramatic effect reversal with temperature
Application III: Multi-Parent Population
EU-NAM Flint Dataset
- 11 biparental populations (8 analyzed)
- 5,263 SNPs, doubled haploid lines
- 4 locations: La Coruna, Roggenstein, Einbeck, Ploudaniel
- Trait: Biomass dry matter yield
- 32 analyses (8 populations × 4 locations)
Results
16 QTLs identified, including:
- 2 major QTLs also found in the original publication (Garin et al.):
  - QTL1_117.6: Consistent across populations except F2
  - QTL6_84.2: Ancestral allele (6 parents) with strong negative effect in TUM
10 novel QTLs, including:
- 5 QTLs with effect inversions between populations
Example - QTL5_23.9:
- Positive effect in F03802 population
- Negative effect in F64 population
- Suggests genetic background effects or allelic series
3 QTLs associated with flowering time:
- Flowering time is a simpler trait and a driver of yield
- Its correlation with yield varies by environment (negative/null/positive)
Advantages Over Original Analysis
Original study (Garin et al.):
- Limited to 2/4 locations
- Analyzed with computationally intensive mixed models
metaGE approach:
- Included all 4 locations
- Revealed 10 additional QTLs
- Completed in 12 seconds vs. hours for mixed models
Application IV: Wheat (Supplementary)
Dataset
- 210 wheat lines, 108,410 SNPs
- 16 environments (location × year × treatment)
- Trait: Grain yield
Key Findings
- None of the QTLs identified by metaGE RE was significant in any single environment
- Demonstrates power gain for complex traits with small-effect QTLs
- Highlights importance of joint analysis for yield traits
Computational Performance
Runtime comparison (approximate marker count in parentheses after each dataset):
| Dataset | Environments | metaGE | METAL | mash |
|---|---|---|---|---|
| Simulation (500K) | 22 | 49s (31s*) | 2.6min | 16.6min |
| Arabidopsis (1M) | 6 | 1.2min (26s*) | 2.6min | 29s |
| Maize (600K) | 22 | 2.25min (41s*) | 3.3min | 25.3min |
| EU-NAM (6K) | 32 | 12s (8s*) | 3s | 1.8min |
| Wheat (100K) | 16 | 47s (30s*) | 22s | 3.3min |
*Time for correlation matrix inference (needs to be done only once)
Memory efficiency:
- Handles 10⁵-10⁶ markers efficiently
- Single correlation matrix estimation per analysis
- Independent processing of multiple hypotheses without re-estimation
Methodological Advantages
Over Classical Meta-Analysis (METAL)
Dependency handling:
- METAL assumes independence between GWAS
- Ignoring dependencies in MET causes severe FDR inflation (>0.84)
- metaGE explicitly models inter-environment correlations
Result: METAL unusable for MET analysis due to Type I error inflation
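To see why the independence assumption is so damaging, consider an unweighted Stouffer-type combination, sum(Z)/sqrt(K), which is a simplification of what METAL computes. Under independence its null variance is 1; under correlation it is the sum of all entries of Σ divided by K. The sketch below uses a hypothetical equicorrelated matrix with ρ = 0.3 across 22 environments. Both the value of ρ and the helper names are illustrative assumptions, and the calculation concerns per-test Type I error rather than the FDR figures reported above.

```python
import math
from statistics import NormalDist


def equicorrelated(k, rho):
    """K x K correlation matrix with the same correlation rho between all pairs."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]


def stouffer_null_variance(sigma):
    """Variance of sum(Z)/sqrt(K) when Z ~ N(0, sigma); equals 1 under independence."""
    k = len(sigma)
    return sum(sigma[i][j] for i in range(k) for j in range(k)) / k


def actual_type1_error(nominal_alpha, true_variance):
    """Real two-sided error rate of a test that wrongly assumes unit variance."""
    critical = NormalDist().inv_cdf(1.0 - nominal_alpha / 2.0)
    return math.erfc(critical / math.sqrt(2.0 * true_variance))


sigma = equicorrelated(22, 0.3)          # hypothetical correlation structure
variance = stouffer_null_variance(sigma)
size = actual_type1_error(0.05, variance)
print(f"null variance = {variance:.2f}, actual Type I error = {size:.2f}")
```

In this illustration the null variance is 7.3 instead of 1, so a test run at a nominal 5% level rejects a true null hypothesis about 47% of the time.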
Over Mixture Models (mash)
Environmental factors:
- mash models different effect patterns but not environmental influences
- Cannot incorporate environmental covariates
- Limited ability to test specific biological hypotheses about GxE
Result: mash suitable for pleiotropy but not designed for MET analysis
Over Mixed Models
Scalability:
- Mixed models computationally prohibitive for large-scale GWAS
- Require raw phenotypic and genotypic data
- metaGE: summary statistics only, minutes vs. hours/days
Flexibility:
- Easy addition/removal of environments
- Handles missing data (monomorphic markers in subpopulations)
- Supports unbalanced/incomplete designs without imputation
Comparison to Subgroup Meta-Analysis
Previous work (human genetics):
- Subgroup MA and meta-regression developed for independent studies
- Not adapted to correlated studies (MET with overlapping panels)
metaGE contribution:
- First adaptation of these approaches to non-independent studies
- Enables plant genetics applications
Novel Testing Capabilities
1. Standard Association Test
H₀: {μ_m = 0} (the marker has no effect in any environment)
- Detects QTLs with any non-zero effect
2. Heterogeneity Test
H₀: {μ¹_m = μ²_m = … = μᴶ_m} (effects constant across groups)
- Identifies environment-dependent QTLs
3. Contrast Test
Tests specific hypotheses about effect patterns
- Example: Competition vs. no competition in Arabidopsis
- Detected 71 new QTLs missed by standard test
4. Meta-Regression
Genome-wide scan for QTL-covariate relationships
- Quantifies how QTL effects vary with environmental variables
- Identifies adaptive QTLs responding to specific stresses
Biological Insights
Power Gain Through Joint Analysis
Arabidopsis AtCNGC4 region:
- Not genome-wide significant in individual environments
- Highly significant in joint analysis
- Biological relevance confirmed (floral transition, immunity)
Maize QTL7_41.4:
- Significant in only 2/22 environments individually
- Detected through meta-analysis
- Contains known water deficit response QTLs
Wheat QTLs:
- None significant in individual environments
- Multiple QTLs detected jointly
- Critical for complex yield traits
Interpreting Effect Variability
Competition response (Arabidopsis):
- Sign-switching effects indicate context-dependent gene function
- Development-defense tradeoffs
- Identifies condition-specific adaptive alleles
Heat stress response (Maize):
- QTL6_20.3 effects clustered in heatwave environments
- Presence/absence variant under selection
- Adaptive response to temperature stress
Covariate-dependent effects:
- Linear relationships between effects and ET0, temperature
- Aquaporin-mediated water transport regulation
- Plant growth sensitivity to water potential
Data Sharing and Privacy
Advantages of Summary Statistics
Confidentiality:
- No raw phenotypic or genotypic data required
- Only effect sizes and P-values needed
- Enables data sharing between private breeding programs
Parallel to human genetics:
- Global Biobank Meta-analysis Initiative (2.2M participants, 24 biobanks)
- Consortium approach without individual data sharing
Plant breeding applications:
- Private companies can share GWAS results
- Preserve competitive advantages
- Collaborative QTL discovery
Technical Benefits
Unbalanced designs:
- Different markers tested per environment
- Missing data due to monomorphism in subpopulations
- No imputation required
Scale flexibility:
- Different technologies/sequencing depths
- Easy environment addition/removal
- Post-hoc quality control
Multi-parent populations:
- Different marker sets per family
- Handles genetic background effects
- Detects allelic series and epistasis
Practical Recommendations
When to Use metaGE
Ideal scenarios:
- MET experiments with overlapping genotype panels
- Need to control Type I error rate
- Testing specific GxE hypotheses
- Limited computational resources
- Data privacy concerns
Not recommended:
- Single environment analysis (use standard GWAS)
- Completely independent populations (classical MA sufficient)
- Need individual-level covariate adjustments
Model Selection
Fixed Effect (FE) model:
- Controlled environments with a priori classification
- Testing specific group contrasts
- Example: Stress vs. control treatments
Random Effect (RE) model:
- Uncontrolled field conditions
- Unknown/complex environment relationships
- Heterogeneous QTL effects expected
Meta-Regression:
- Quantitative environmental covariates available
- Hypothesis about specific environmental drivers
- Want to identify adaptive QTLs
Multiple Testing Control
Local score approach (default):
- Controls FDR while accounting for LD
- Accumulates evidence across linked markers
- Threshold ξ typically 3-4
- Chromosome-specific significance thresholds
Alternative: Adaptive Benjamini-Hochberg
- For low-density markers (e.g., MPP with <10K SNPs)
- When LD structure unknown
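The accumulation of evidence across linked markers in the local score approach can be illustrated with a Lindley process: each marker contributes -log10(p) - ξ, and the running sum is reset to zero whenever it would go negative. This form is taken here as an assumption from the general local score literature; the function name `local_score` is hypothetical, and the chromosome-specific significance thresholds are not computed in this sketch.

```python
import math


def local_score(p_values, xi=3.0):
    """Lindley process over -log10(p) - xi along one chromosome.

    Returns (maximum score, start index, end index) of the best-scoring segment,
    with indices set to None if no marker exceeds the threshold xi.
    """
    best, best_start, best_end = 0.0, None, None
    running, start = 0.0, 0
    for i, p in enumerate(p_values):
        if running == 0.0:
            start = i  # a new candidate segment begins here
        running = max(0.0, running + (-math.log10(p) - xi))
        if running > best:
            best, best_start, best_end = running, start, i
    return best, best_start, best_end
```

With `p_values = [0.5, 1e-4, 1e-5, 0.2, 1e-6, 0.9]` and `xi = 3`, the three small P-values at positions 1, 2 and 4 reinforce one another despite the unremarkable marker between them, and the best segment spans indices 1 to 4.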
Implementation Details
R Package: metaGE
Available on CRAN
Key functions:
- Fixed effect meta-analysis
- Random effect meta-analysis
- Contrast testing
- Meta-regression
- Local score multiple testing correction
Input requirements:
- Per-environment GWAS summary statistics (effects, P-values)
- Marker positions
- Optional: Environmental covariates
Outputs:
- Meta-analysis P-values
- Estimated correlation matrices
- Significant genomic regions
- Effect size estimates per environment/group
Limitations and Considerations
Statistical Assumptions
- Marker independence: Assumes unlinked markers
  - Addressed by local score accounting for LD
- Correlation matrix: Assumed common across markers
  - Reasonable for inter-environment correlations
  - Reduces computational burden
- Normal distribution: Z-scores assumed Gaussian under H₀
  - Standard assumption in GWAS
  - Violated if P-values not uniformly distributed under null
Design Considerations
Environment classification:
- FE model requires a priori grouping
- Misclassification reduces power
- RE model robust to classification uncertainty
Sample size:
- Power increases with more environments
- Individual environment sample sizes affect P-value quality
- Minimum ~5-10 environments recommended
Covariate correlation:
- Meta-regression may detect QTLs correlated with related covariates
- Careful interpretation needed with high covariate correlation
- Consider testing multiple covariates independently