metaGE: Investigating genotype x environment interactions through GWAS meta-analysis

GWAS
meta-analysis
genotype-by-environment
multi-environment trials
plant genetics
QTL mapping
  • Novel meta-analysis approach for multi-environment trials (METs) that jointly analyzes GWAS summary statistics while accounting for inter-environment correlations
  • Controls Type I error effectively (FDR ≤0.05) where competing methods fail severely (METAL FDR >0.84), with computational efficiency enabling analysis of 600K markers × 22 environments in ~2 minutes
  • Identified novel competition-responsive flowering QTLs in Arabidopsis and heat-stress yield QTLs in maize through contrast tests and meta-regression with environmental covariates
Published

23 January 2026

PubMed: 39792927
DOI: 10.1371/journal.pgen.1011553
Overview generated by: Claude Sonnet 4.5, 25/11/2025

Key Findings

This study introduces metaGE, a meta-analysis approach for detecting quantitative trait loci (QTL) in multi-environment trial (MET) experiments by jointly analyzing summary statistics from individual environment GWAS, addressing the challenge of genotype-by-environment (GxE) interactions in plant genetics.

Main Discoveries

  1. Superior Type I error control: metaGE keeps the false discovery rate at or below 0.05 on H₀ chromosomes across simulated scenarios, whereas METAL shows severe inflation (FDR > 0.84) and mash is inflated in most scenarios (FDR 0.14-0.88)

  2. Computational efficiency: Analyzes large-scale datasets (22 environments × 600K markers) in ~2 minutes, dramatically faster than existing mixed-model approaches

  3. Novel QTL detection: Identified new loci in Arabidopsis (competition-responsive flowering QTLs) and maize (heat-stress responsive yield QTLs) not detected by original single-environment analyses

  4. Flexible testing framework: Enables detection of QTLs with stable effects, environment-specific effects, or effects correlated with environmental covariates

Study Design

Methodological Framework

metaGE jointly analyzes summary statistics (effect sizes and P-values) from per-environment GWAS without requiring raw phenotypic or genotypic data.

Key innovations:

  • Accounts for inter-environment correlations arising from overlapping genotype panels
  • Supports both controlled (fixed-effect) and uncontrolled (random-effect) environments
  • Includes meta-regression to detect QTL-environment covariate relationships

Statistical Models

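Z-score transformation: \[Z_{mk} = \Phi^{-1}(1 - 0.5 \cdot p_{mk}) \times \text{sign}(\beta_{mk})\]

where β_mk is the marker effect and p_mk is the P-value for marker m in environment k, and Φ is the standard normal cumulative distribution function. The Z-score therefore carries the sign of the effect, and its magnitude grows as the P-value shrinks.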
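
The transformation can be sketched in a few lines of Python. This is an illustration only, not the metaGE package code: the function name `z_score` is hypothetical, and it assumes two-sided P-values, with the magnitude of Z taken as the upper p/2 quantile of the standard normal and the sign taken from the estimated effect. The quantile is computed as -Φ⁻¹(p/2), which equals Φ⁻¹(1 - p/2) but stays accurate for very small P-values.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def z_score(p_value: float, effect: float) -> float:
    """Signed Z-score from a two-sided P-value and an estimated marker effect."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    if effect == 0:
        return 0.0  # sign(0) = 0, so the Z-score is zero
    # -inv_cdf(p/2) equals inv_cdf(1 - p/2) without losing precision near 1
    magnitude = -_STD_NORMAL.inv_cdf(0.5 * p_value)
    return magnitude if effect > 0 else -magnitude
```

For example, `z_score(0.05, 1.2)` is about 1.96 and `z_score(0.05, -1.2)` is about -1.96.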

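Fixed Effect (FE) Model (controlled environments): \[Z_m = X\mu_m + E_m\] \[E_m \sim N(0_K, \Sigma)\]

Here X is the K × J design matrix that assigns each of the K environments to one of J groups, and Σ is the inter-environment correlation matrix, assumed common to all markers.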

  • Environments classified into J groups with stable effects within groups
  • Tests for marker association: H₀: {μ_m = 0_J} vs H₁: {∃j, μ^j_m ≠ 0}
  • Tests for effect heterogeneity across groups

Random Effect (RE) Model (uncontrolled environments): \[Z_m = \mu_m \mathbf{1}_K + A_m + E_m\] \[A_m \sim N(0_K, \tau_m^2 \Lambda)\]

  • Random marker effects account for heterogeneity
  • Correlation matrices Σ and Λ estimated from data

Meta-Regression Test: \[H_0: \{\text{cov}(\mu_m \mathbf{1}_K + A_m, X) = 0\} \text{ vs } H_1: \{\text{cov}(\mu_m \mathbf{1}_K + A_m, X) \neq 0\}\]

Test statistic: \(\frac{Z_m^T X}{\sqrt{X^T \Sigma X}} \sim N(0,1)\) under H₀
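
As a rough illustration of the statistic above, the sketch below computes Z_m^T X / sqrt(X^T Σ X) and a two-sided P-value for one marker. The function names are hypothetical and this is not the metaGE implementation. Centering the covariate before the test is an assumption made here so that a constant main effect does not contribute to Z_m^T X.

```python
from math import sqrt
from statistics import NormalDist


def center(values):
    """Subtract the mean so the covariate sums to zero (an assumption of this sketch)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def meta_regression_test(z, x, sigma):
    """Return (statistic, two-sided P-value) for Z^T X / sqrt(X^T Sigma X).

    z     : Z-scores of one marker across K environments
    x     : covariate value in each of the K environments
    sigma : K x K inter-environment correlation matrix (list of lists)
    """
    k = len(z)
    if len(x) != k or len(sigma) != k or any(len(row) != k for row in sigma):
        raise ValueError("dimension mismatch")
    numerator = sum(zi * xi for zi, xi in zip(z, x))
    variance = sum(x[i] * sigma[i][j] * x[j] for i in range(k) for j in range(k))
    statistic = numerator / sqrt(variance)
    p_value = 2.0 * NormalDist().cdf(-abs(statistic))
    return statistic, p_value
```

With three uncorrelated environments, `meta_regression_test([1, 2, 3], center([10, 20, 30]), identity)` gives a statistic of about 1.41, where `identity` is the 3 × 3 identity matrix.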

Correlation Matrix Estimation

Two filtering approaches to identify H₀ markers:

  1. P-value threshold: Include markers with p_mk > λ in all environments
  2. Posterior probability (default): Mixture model-based filtering excluding markers with P(H₁) > 0.6

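Correlation estimate: \[\hat{\Sigma}_{k,k'} = \text{cor}(Z_k, Z_{k'}) = \frac{\sum_{m \in H_0} (Z_{mk} - \bar{Z}_k)(Z_{mk'} - \bar{Z}_{k'})}{\sqrt{\sum_{m \in H_0} (Z_{mk} - \bar{Z}_k)^2} \sqrt{\sum_{m \in H_0} (Z_{mk'} - \bar{Z}_{k'})^2}}\]

All sums, and the means \(\bar{Z}_k\), run over the markers retained as H₀ by the filtering step.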
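
The P-value threshold filter and the correlation estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the package code: the function names are hypothetical, the threshold λ is left to the caller because the source gives no value for it, and only the simpler threshold filter is shown (not the default posterior-probability filter).

```python
from math import sqrt


def null_markers(p_values, threshold):
    """Indices of markers whose P-value exceeds the threshold in every environment.

    p_values[m][k] is the P-value of marker m in environment k.
    """
    return [m for m, row in enumerate(p_values) if all(p > threshold for p in row)]


def correlation_matrix(z_scores, keep):
    """Pearson correlation between environments, computed over the retained markers.

    z_scores[m][k] is the Z-score of marker m in environment k;
    keep is the list of marker indices treated as H0.
    """
    n_env = len(z_scores[0])
    columns = [[z_scores[m][k] for m in keep] for k in range(n_env)]
    means = [sum(col) / len(col) for col in columns]
    centered = [[v - mu for v in col] for col, mu in zip(columns, means)]
    norms = [sqrt(sum(v * v for v in col)) for col in centered]
    return [
        [
            sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
            for j in range(n_env)
        ]
        for i in range(n_env)
    ]
```

Restricting the estimate to presumed-null markers keeps true QTL signals from contributing to the estimated correlation between environments.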

Simulation Study

Design

  • Genotypes: 247 maize F1 hybrids, 506,460 SNPs
  • Environments: 22 trials
  • QTL types:
    • Fixed effect (constant across environments)
    • Completely random effect
    • Random effects correlated with environment similarities
    • Covariate-dependent (proportional to Tmax, Tnight, or Psi)
  • Simulations: 50 runs per QTL type, 12 QTLs per run
  • Heritability: 0.5, QTLs explain 44% of genetic variance

Type I Error Control

FDR on H₀ chromosomes (most stringent):

  Method      QTL type     FDR_chr
  ----------  -----------  ---------
  metaGE_FE   Fixed        0.00
  metaGE_RE   Random       0.04
  metaGE_RE   RandomCov    0.02
  metaGE_MR   Covariate    0.00-0.02
  METAL_FE    Fixed        1.00
  METAL_RE    Random/Cov   0.88-1.00
  mash        All types    0.14-0.88

Whole genome FDR (5 Mb window):

  Method   FDR range across scenarios
  -------  --------------------------
  metaGE   0.09-0.18
  METAL    0.93-0.94
  mash     0.32-0.85

Detection Power

5 Mb detection window:

  QTL type          metaGE   METAL   mash
  ----------------  -------  ------  -----
  Fixed effect      0.09     0.98    0.20
  Random effect     0.51     0.16    0.79
  RandomCov         0.26     0.58    0.53
  Covariate (avg)   0.37     0.06    0.41

metaGE's power is lower than mash's in every scenario and lower than METAL's for fixed-effect and RandomCov QTLs, but its proper FDR control makes the associations it reports reliable.
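
Window-based FDR and power of the kind reported above can be computed along the following lines. This is a hedged sketch: the function name is hypothetical, and treating the 5 Mb window as a maximum distance on either side of a simulated QTL is an assumption, since the exact definition is not given here.

```python
def window_fdr_and_power(detected, true_qtls, window_bp=5_000_000):
    """Return (FDR, power) for detections matched to simulated QTLs.

    detected and true_qtls are lists of (chromosome, position_bp) tuples.
    A detection is a true positive if it lies within window_bp of a true
    QTL on the same chromosome (assumed definition).
    """
    def near(a, b):
        return a[0] == b[0] and abs(a[1] - b[1]) <= window_bp

    false_hits = sum(1 for d in detected if not any(near(d, q) for q in true_qtls))
    found = sum(1 for q in true_qtls if any(near(d, q) for d in detected))
    fdr = false_hits / len(detected) if detected else 0.0
    power = found / len(true_qtls) if true_qtls else 0.0
    return fdr, power
```

Under this definition a method can reach high power simply by declaring many regions significant, which is why power is only meaningful alongside the FDR.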

MAF effect on power (metaGE_FE):

  • Low MAF [0.20-0.25]: 0.04
  • Medium MAF [0.30-0.35]: 0.08
  • High MAF [0.40-0.45]: 0.14

Meta-Regression Specificity

Testing covariate-dependent QTLs:

  MR test   Target QTLs detected   Cross-detected
  --------  ---------------------  ------------------
  Tnight    34.5% (Tnight QTLs)    5.5% (Tmax QTLs)
  Tmax      26.5% (Tmax QTLs)      7% (Tnight QTLs)
  Psi       52% (Psi QTLs)         <1% (others)

Cross-detection tracks the correlation between covariates: Tnight and Tmax are strongly correlated (r = 0.71), whereas Tnight and Psi are nearly uncorrelated (r = -0.11).

Application I: Arabidopsis Competition Response

Dataset

  • 195 accessions, 981,278 SNPs
  • 6 controlled micro-habitats (3 soils × competition/no competition)
  • Trait: Bolting time
  • Competition: Poa annua weed in environments B, D, F

Results

metaGE FE procedure:

  • 191 SNPs in 61 QTLs identified
  • 51/61 significant in at least one individual GWAS
  • Enrichment ratio = 4.13 for candidate flowering genes (q₀.₀₅ = 0.066, q₀.₉₅ = 3.2)

Comparison with METAL:

  • METAL: >165,000 P-values <0.01 (expected ~10,000 under H₀)
  • Declared 15% of markers significant (severe inflation)
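
The expected count under H₀ follows from the fact that null P-values are uniform, so about 1% of them fall below 0.01. The short calculation below reproduces that figure from the marker count in the dataset description. The variable names are illustrative only.

```python
n_markers = 981_278      # SNPs in the Arabidopsis panel
alpha = 0.01             # P-value cutoff used in the comparison

# Under H0, P-values are uniform on [0, 1], so a fraction alpha falls below alpha.
expected_under_null = n_markers * alpha

observed_lower_bound = 165_000   # METAL produced more than this many
inflation = observed_lower_bound / expected_under_null

print(f"expected under H0: {expected_under_null:.0f}")
print(f"observed / expected: at least {inflation:.1f}")
```

The observed count is therefore at least 16 times larger than the count expected if no marker were associated.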

Contrasted FE test (competition vs. no competition):

  • 221 SNPs in 72 QTLs with environment-specific effects
  • 160 candidate genes enriched for:
    • Development (P = 8.9×10⁻³)
    • Cell processes (P = 1.5×10⁻³)
    • Tetrapyrrole synthesis (P = 0.020)
  • 71/72 QTLs were novel (not detected by standard FE test)

Major Finding: QTL5_22.0

Location: Chromosome 5, AtCNGC4 genomic region

  • 22 markers with sign-switching effects based on competition
  • Positive effects without competition, negative/null with competition
  • AtCNGC4 known roles:
    • Floral transition regulation
    • Plant immunity impairment
  • Consistent with development-defense tradeoffs

Application II: Maize Drought Response

Dataset

  • 244 dent maize lines (as hybrids), 602,356 SNPs
  • 22 environments (location × year × treatment)
  • Trait: Grain yield
  • Environmental covariates: Psi, Tmax, Tnight, Rad, VPDmax, ET0, Tnight.Fill

metaGE RE Results

52 genomic regions identified, including:

  QTL          Chr   Local score   Detection status
  -----------  ----  ------------  -------------------
  QTL3_120.0   3     38            Previously reported
  QTL6_20.3    6     415           Previously reported
  QTL7_41.4    7     18            Novel

QTL6_20.3 analysis:

  • Strong effects in 6 environments with severe heatwaves:
    • Night temperature ~22°C
    • Maximum temperature >36°C
    • High evaporative demand (3.6 kPa)
  • All 6 environments: P-values <1×10⁻⁶
  • Colocalizes with 2.4 Mb presence/absence variant
  • Contains ABA-induced genes for water deficit response
  • Shows selection signatures during domestication/improvement

QTL7_41.4 (novel):

  • Moderate positive effects across ~10/22 environments
  • Significant in only 2 individual GWAS (P <0.01 in 10)
  • Harbors QTLs for plant growth rate and biomass under water deficit
  • Demonstrates power gain from meta-analysis

Meta-Regression Results

Evapotranspiration (ET0): 14 QTLs detected

Key finding - QTL2_153.8 (marker AX-91538480):

  • Effects vary linearly from negative to positive with ET0
  • Colocalizes with aquaporin eQTLs (PIP2.2, PIP2.1)
  • Related to water use efficiency and stomatal conductance

Night temperature during flowering (Tnight): 21 QTLs

  • Main association <0.6 Mb from QTL6_20.3
  • Corroborates previous findings on heat stress response

Night temperature during grain filling (Tnight.Fill): 15 QTLs

Example - QTL9_28.6 (marker AX-91123283):

  • Positive effects on cool nights
  • Negative effects on hot nights
  • Dramatic effect reversal with temperature

Application III: Multi-Parent Population

EU-NAM Flint Dataset

  • 11 biparental populations (8 analyzed)
  • 5,263 SNPs, doubled haploid lines
  • 4 locations: La Coruna, Roggenstein, Einbeck, Ploudaniel
  • Trait: Biomass dry matter yield
  • 32 analyses (8 populations × 4 locations)

Results

16 QTLs identified, including:

  • 2 major QTLs also found in the original publication (Garin et al.):
    • QTL1_117.6: Consistent across populations except F2
    • QTL6_84.2: Ancestral allele (6 parents) with strong negative effect in TUM

10 novel QTLs, including:

  • 5 QTLs with effect inversions between populations

Example - QTL5_23.9:

  • Positive effect in F03802 population
  • Negative effect in F64 population
  • Suggests genetic background effects or allelic series

3 QTLs associated with flowering time:

  • Flowering time is a simpler trait and a driver of yield
  • Correlation with yield varies by environment (negative/null/positive)

Advantages Over Original Analysis

Original study (Garin et al.):

  • Limited to 2/4 locations
  • Analyzed with computationally intensive mixed models

metaGE approach:

  • Included all 4 locations
  • Revealed 10 additional QTLs
  • Completed in 12 seconds vs. hours for mixed models

Application IV: Wheat (Supplementary)

Dataset

  • 210 wheat lines, 108,410 SNPs
  • 16 environments (location × year × treatment)
  • Trait: Grain yield

Key Findings

  • None of the QTLs identified by metaGE RE was significant in any single environment
  • Demonstrates power gain for complex traits with small-effect QTLs
  • Highlights importance of joint analysis for yield traits

Computational Performance

Runtime comparison (approximate marker counts in parentheses):

  Dataset             Environments   metaGE              METAL     mash
  ------------------  -------------  ------------------  --------  ---------
  Simulation (500K)   22             49 s (31 s*)        2.6 min   16.6 min
  Arabidopsis (1M)    6              1.2 min (26 s*)     2.6 min   29 s
  Maize (600K)        22             2.25 min (41 s*)    3.3 min   25.3 min
  EU-NAM (6K)         32             12 s (8 s*)         3 s       1.8 min
  Wheat (100K)        16             47 s (30 s*)        22 s      3.3 min

*Time for correlation matrix inference (needs to be done only once)

Memory efficiency:

  • Handles 10⁵-10⁶ markers efficiently
  • Single correlation matrix estimation per analysis
  • Independent processing of multiple hypotheses without re-estimation

Methodological Advantages

Over Classical Meta-Analysis (METAL)

Dependency handling:

  • METAL assumes independence between GWAS
  • Ignoring dependencies in MET causes severe FDR inflation (>0.84)
  • metaGE explicitly models inter-environment correlations

Result: METAL is unusable for MET analysis because of Type I error inflation
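
A small worked example shows why assuming independence inflates test statistics when environments share genotypes. The sketch below uses an unweighted Stouffer combination as a simplified stand-in for a classical meta-analysis (METAL's actual weighting schemes differ), and the equicorrelation value of 0.5 in the usage note is a hypothetical choice for illustration. The variance of a sum of K correlated Z-scores is 1^T Σ 1, not K.

```python
from math import sqrt


def stouffer_z(z, sigma=None):
    """Combine per-environment Z-scores into a single Z.

    With sigma=None the environments are treated as independent, so the
    variance of the sum is K. With a correlation matrix sigma, the variance
    of the sum is the total of all entries of sigma (1' Sigma 1).
    """
    k = len(z)
    variance = float(k) if sigma is None else sum(sum(row) for row in sigma)
    return sum(z) / sqrt(variance)


def equicorrelation(k, rho):
    """K x K matrix with ones on the diagonal and rho elsewhere."""
    return [[1.0 if i == j else rho for j in range(k)] for i in range(k)]
```

With 22 environments that each give Z = 1, the independence assumption yields a combined Z of about 4.69, while accounting for a pairwise correlation of 0.5 yields about 1.38. The same data look highly significant or unremarkable depending on whether the dependence is modeled.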

Over Mixture Models (mash)

Environmental factors:

  • mash models different effect patterns but not environmental influences
  • Cannot incorporate environmental covariates
  • Limited ability to test specific biological hypotheses about GxE

Result: mash is suited to pleiotropy analysis but was not designed for MET analysis

Over Mixed Models

Scalability:

  • Mixed models computationally prohibitive for large-scale GWAS
  • Require raw phenotypic and genotypic data
  • metaGE: summary statistics only, minutes vs. hours/days

Flexibility:

  • Easy addition/removal of environments
  • Handles missing data (monomorphic markers in subpopulations)
  • Supports unbalanced/incomplete designs without imputation

Comparison to Subgroup Meta-Analysis

Previous work (human genetics):

  • Subgroup MA and meta-regression developed for independent studies
  • Not adapted to correlated studies (MET with overlapping panels)

metaGE contribution:

  • First adaptation of these approaches to non-independent studies
  • Enables plant genetics applications

Novel Testing Capabilities

1. Standard Association Test

H₀: {μ_m = 0}

  • Marker has no effect in any environment
  • Detects QTLs with any non-zero effect

2. Heterogeneity Test

H₀: {μ¹_m = μ²_m = … = μᴶ_m}

  • Effects constant across groups
  • Identifies environment-dependent QTLs

3. Contrast Test

Tests specific hypotheses about effect patterns

  • Example: Competition vs. no competition in Arabidopsis
  • Detected 71 new QTLs missed by standard test
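
One simple way to test such a contrast is to compare the mean Z-score of two environment groups and scale by the variance implied by the correlation matrix. The sketch below is a simplified illustration and not the estimator used by the metaGE package: the function names are hypothetical, and the statistic c^T Z / sqrt(c^T Σ c) is standard normal under the null hypothesis that both groups share the same mean effect.

```python
from math import sqrt


def group_contrast(labels, group_a, group_b):
    """Weights +1/n_a for group_a, -1/n_b for group_b and 0 for other environments."""
    n_a, n_b = labels.count(group_a), labels.count(group_b)
    return [
        1.0 / n_a if g == group_a else -1.0 / n_b if g == group_b else 0.0
        for g in labels
    ]


def contrast_statistic(z, weights, sigma):
    """c^T Z / sqrt(c^T Sigma c) for one marker."""
    k = len(z)
    numerator = sum(w * zi for w, zi in zip(weights, z))
    variance = sum(
        weights[i] * sigma[i][j] * weights[j] for i in range(k) for j in range(k)
    )
    return numerator / sqrt(variance)


# Arabidopsis layout: environments A-F, competition in B, D and F
labels = ["none", "comp", "none", "comp", "none", "comp"]
weights = group_contrast(labels, "none", "comp")
```

A marker with a positive effect without competition and a negative effect with competition gives a large positive statistic, even when its average effect across all six environments is close to zero and a standard association test would miss it.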

4. Meta-Regression

Genome-wide scan for QTL-covariate relationships

  • Quantifies how QTL effects vary with environmental variables
  • Identifies adaptive QTLs responding to specific stresses

Biological Insights

Power Gain Through Joint Analysis

Arabidopsis AtCNGC4 region:

  • Not genome-wide significant in individual environments
  • Highly significant in joint analysis
  • Biological relevance confirmed (floral transition, immunity)

Maize QTL7_41.4:

  • Significant in only 2/22 environments individually
  • Detected through meta-analysis
  • Contains known water deficit response QTLs

Wheat QTLs:

  • None significant in individual environments
  • Multiple QTLs detected jointly
  • Critical for complex yield traits

Interpreting Effect Variability

Competition response (Arabidopsis):

  • Sign-switching effects indicate context-dependent gene function
  • Development-defense tradeoffs
  • Identifies condition-specific adaptive alleles

Heat stress response (Maize):

  • QTL6_20.3 effects clustered in heatwave environments
  • Presence/absence variant under selection
  • Adaptive response to temperature stress

Covariate-dependent effects:

  • Linear relationships between effects and ET0, temperature
  • Aquaporin-mediated water transport regulation
  • Plant growth sensitivity to water potential

Data Sharing and Privacy

Advantages of Summary Statistics

Confidentiality:

  • No raw phenotypic or genotypic data required
  • Only effect sizes and P-values needed
  • Enables data sharing between private breeding programs

Parallel to human genetics:

  • Global Biobank Meta-analysis Initiative (2.2M participants, 24 BioBanks)
  • Consortium approach without individual data sharing

Plant breeding applications:

  • Private companies can share GWAS results
  • Preserve competitive advantages
  • Collaborative QTL discovery

Technical Benefits

Unbalanced designs:

  • Different markers tested per environment
  • Missing data due to monomorphism in subpopulations
  • No imputation required

Scale flexibility:

  • Different technologies/sequencing depths
  • Easy environment addition/removal
  • Post-hoc quality control

Multi-parent populations:

  • Different marker sets per family
  • Handles genetic background effects
  • Detects allelic series and epistasis

Practical Recommendations

When to Use metaGE

Ideal scenarios:

  • MET experiments with overlapping genotype panels
  • Need to control Type I error rate
  • Testing specific GxE hypotheses
  • Limited computational resources
  • Data privacy concerns

Not recommended:

  • Single environment analysis (use standard GWAS)
  • Completely independent populations (classical MA sufficient)
  • Need individual-level covariate adjustments

Model Selection

Fixed Effect (FE) model:

  • Controlled environments with a priori classification
  • Testing specific group contrasts
  • Example: Stress vs. control treatments

Random Effect (RE) model:

  • Uncontrolled field conditions
  • Unknown/complex environment relationships
  • Heterogeneous QTL effects expected

Meta-Regression:

  • Quantitative environmental covariates available
  • Hypothesis about specific environmental drivers
  • Want to identify adaptive QTLs

Multiple Testing Control

Local score approach (default):

  • Controls FDR while accounting for LD
  • Accumulates evidence across linked markers
  • Threshold ξ typically 3-4
  • Chromosome-specific significance thresholds
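
The accumulation step of a local score can be sketched as a Lindley process over the markers of one chromosome: each marker contributes -log10(p) - ξ, and the running sum is reset to zero whenever it would become negative. The code below is a minimal illustration with hypothetical function names, not the package implementation. The chromosome-specific significance threshold is taken as an input because its computation is not described here.

```python
from math import log10


def lindley_process(p_values, xi):
    """Running local score h_i = max(0, h_{i-1} - log10(p_i) - xi).

    p_values must be strictly positive and ordered by position on one chromosome.
    """
    h, path = 0.0, []
    for p in p_values:
        h = max(0.0, h - log10(p) - xi)
        path.append(h)
    return path


def significant_regions(path, threshold):
    """(start, peak) index pairs of excursions whose peak reaches the threshold."""
    regions, start = [], None
    for i, h in enumerate(path + [0.0]):   # sentinel closes a trailing excursion
        if h > 0.0 and start is None:
            start = i                       # process leaves zero: excursion begins
        elif h == 0.0 and start is not None:
            segment = path[start:i]
            peak = max(segment)
            if peak >= threshold:
                regions.append((start, start + segment.index(peak)))
            start = None
    return regions
```

With ξ = 3, a lone marker at p = 10⁻⁴ adds only 1 to the score, whereas several neighbouring markers at that level accumulate, which is how linked markers in LD reinforce each other.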

Alternative: Adaptive Benjamini-Hochberg

  • For low-density markers (e.g., MPP with <10K SNPs)
  • When LD structure unknown
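
For reference, the Benjamini-Hochberg step-up procedure and its adaptive variant can be sketched as below. The adaptive version runs the same procedure at level α/π₀, where π₀ is an estimate of the proportion of true null hypotheses. How metaGE estimates π₀ is not covered here, so it is passed in as a hypothetical parameter, and `pi0=1.0` gives the ordinary procedure.

```python
def benjamini_hochberg(p_values, alpha=0.05, pi0=1.0):
    """Indices of hypotheses rejected by the BH step-up procedure at level alpha / pi0."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by increasing P-value
    level = alpha / pi0
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        # keep the largest rank whose P-value is below its step-up threshold
        if p_values[idx] <= rank * level / m:
            cutoff = rank
    return sorted(order[:cutoff])
```

Because π₀ is at most 1, the adaptive procedure rejects at least as many hypotheses as the ordinary one.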

Implementation Details

R Package: metaGE

Available on CRAN

Key functions:

  • Fixed effect meta-analysis
  • Random effect meta-analysis
  • Contrast testing
  • Meta-regression
  • Local score multiple testing correction

Input requirements:

  • Per-environment GWAS summary statistics (effects, P-values)
  • Marker positions
  • Optional: Environmental covariates

Outputs:

  • Meta-analysis P-values
  • Estimated correlation matrices
  • Significant genomic regions
  • Effect size estimates per environment/group

Limitations and Considerations

Statistical Assumptions

  1. Marker independence: Assumes unlinked markers
    • Addressed by local score accounting for LD
  2. Correlation matrix: Assumed common across markers
    • Reasonable for inter-environment correlations
    • Reduces computational burden
  3. Normal distribution: Z-scores assumed Gaussian under H₀
    • Standard assumption in GWAS
    • Violated if P-values not uniformly distributed under null

Design Considerations

Environment classification:

  • FE model requires a priori grouping
  • Misclassification reduces power
  • RE model robust to classification uncertainty

Sample size:

  • Power increases with more environments
  • Individual environment sample sizes affect P-value quality
  • Minimum ~5-10 environments recommended

Covariate correlation:

  • Meta-regression may detect QTLs correlated with related covariates
  • Careful interpretation needed with high covariate correlation
  • Consider testing multiple covariates independently