Association studies including genotype by environment interactions: prospects and limits
- Abdoul-Aziz Saïdou^{1, 2, 3, 4, 5, 6}Email author,
- Anne-Céline Thuillet^{1},
- Marie Couderc^{1},
- Cédric Mariac^{1, 2} and
- Yves Vigouroux^{1, 6}Email author
DOI: 10.1186/1471-2156-15-3
© Saïdou et al.; licensee BioMed Central Ltd. 2014
Received: 23 April 2013
Accepted: 20 December 2013
Published: 6 January 2014
Abstract
Background
Association mapping studies offer great promise to identify polymorphisms associated with phenotypes and for understanding the genetic basis of quantitative trait variation. To date, almost all association mapping studies based on structured plant populations examined the main effects of genetic factors on the trait but did not deal with interactions between genetic factors and environment. In this paper, we propose a methodological prospect of mixed linear models to analyze genotype by environment interaction effects using association mapping designs. First, we simulated datasets to assess the power of linear mixed models to detect interaction effects. This simulation was based on two association panels composed of 90 inbreds (pearl millet) and 277 inbreds (maize).
Results
Based on the simulation approach, we reported the impact of effect size, environmental variation, allele frequency, trait heritability, and sample size on the power to detect the main effects of genetic loci and diverse effect of interactions implying these loci. Interaction effects specified in the model included SNP by environment interaction, ancestry by environment interaction, SNP by ancestry interaction and three way interactions. The method was finally used on real datasets from field experiments conducted on the two considered panels. We showed two types of interactions effects contributing to genotype by environment interactions in maize: SNP by environment interaction and ancestry by environment interaction. This last interaction suggests differential response at the population level in function of the environment.
Conclusions
Our results suggested the suitability of mixed models for the detection of diverse interaction effects. The need of samples larger than that commonly used in current plant association studies is strongly emphasized to ensure rigorous model selection and powerful interaction assessment. The use of ancestry interaction component brought valuable information complementary to other available approaches.
Keywords
Association study G × E Power simulation Model selection REML PHYC Vgt1Background
Deciphering the genetic basis of quantitative trait variation is a major challenge in biology. Linkage mapping and association mapping are two complementary methods that are widely used to study the relationship between genotype and phenotype. Linkage mapping or family mapping[1] is generally based on the progeny of experimental crosses. Association mapping (or population mapping) benefits from large populations which have inter-crossed for many generations, allowing a high number of recombination events to occur [1]. This historical recombination between loci generally leads to a very fine scale for genotype-phenotype association analysis [2].
One major pitfall of this method is that the genetic background of the populations could produce confounding effects which bias the statistical analysis and inflate the rate of false positives [3]. Methodological solutions have been developed to overcome this bias. First, methods were developed to analyse multi-locus molecular data from mapping samples to infer population structure [4–10] and to infer kinship relationships between individuals [11–13]. Second, satisfactory statistical models were proposed to correct for background effects using genetic relationship matrices [14–16].
Association mapping is especially powerful for common alleles and for moderate to large effects [2, 17]. Association studies will certainly accelerate the study of genotype-phenotype relationships [18]. In particular, genome-wide association studies (GWAS) are promising ways to exhaustively identify polymorphisms linked to the traits of interest [1, 18]. GWAS have already proved to be very useful in plants and have enabled the identification of a number of variants linked to phenotype [19, 20].
The number of association studies performed in plants is currently increasing (see reviews [21–23]). However, most of these studies focused on the analysis of the main effect of molecular polymorphism on the phenotype (i.e. the effect of single factors with no interactions). To date, only a few association mapping studies reported tests of interaction effects (e.g. [24–28]). Interaction effects include genotype by environment interactions (G × E) and epistatic interactions between the genetic factors themselves. Genotype by environment interaction occurs when there is variation among genotypes in the rank order or relative magnitude of effects in different environments [29]. And, epistasis occurs when there is a statistical deviation from the additive combination of two loci in their effects on a phenotype [30, 31].
In a study of Drosophila melanogaster populations, about 50% of phenotypic variation in adult olfactory behaviour was assigned to G × E, highlighting the importance of interactions in the architecture of this trait [32]. In maize Nested Association Mapping populations [33], the proportion of variance explained by G × E or epistatic interactions in flowering traits was low compared to genetic variance due to QTL main effects [34]. But studies in rice [35] and Arabidopsis [36, 37], highlighted the considerable contribution of epistatic effects in determining flowering time for these species. Furthermore, exhaustive association studies revealed that, in general, QTLs explain only a part of trait heritability, even when a large number of loci are considered [34, 38]. In human genetics, this gap is referred to as missing heritability [39]. G × E and epistatic interactions are among the effects that are expected to explain a fraction of this missing heritability [1, 22, 40].
So there is a strong interest for the extension of the plant association mapping framework to deal with interactions effects. The mixed linear model (MLM) framework [16] is the most commonly used in plant association mapping. A methodological examination of the MLM association mapping framework was undertaken to analyze epistatic interactions using population background [30, 41]. A parameterization of mixed model variance structure was recently proposed to handle correlated traits [42]. This study examined pleiotropic effects and two way interactions between gene and environment. In our study, we proposed another strategy to include interaction component in mixed linear model. We proposed to consider interaction between SNP by environment (S × E) as well as to investigate structure by environment interaction (Q × E), SNP by structure interaction (Q × S) and three way interaction between SNP, ancestry and environment (Q × S × E). We think there is actual case where such interactions might be biologically useful and easy to explain. For example, Q × E deals with interaction between structure (represented by ancestry measure) and environment. Such interaction might be difficult to assess on a SNP basis if a trait is associated with numerous genes with small additive effects. However, the effect will be observed in this term Q × E if these numerous genes are associated with the structuration of populations (i.e. associated with adaptation). This term might be particularly useful to investigate for researcher working on ecological adaptation, because it addresses somewhat different phenotypic response in function of the population background (something close to plasticity at the scale of the population). Similarly, the Q × S might be an interaction of a major locus and several minor alleles fixed in the considered population (associated with population structure). These minors allele might not show up on a SNP by SNP analysis. Finally the three way interaction is a combination of the two interactions Q × S × E. Scenarios of interactions involving ancestry have not yet been examined in the context of plant association mapping. Also, previous studies used the variance structure of mixed model (random effects parameterization) [30, 41, 42]. The setting of interactions effects into the mean structure of mixed model (fixed effects) is a common practice. So the assessment of mixed model based on this alternative parameterization is also of interest but this has not been undertaken yet.
The aim of the present study was to investigate the use of a mixed linear model in an association mapping framework to identify genotype by environment interactions and interaction with population structure.
Methods
Linear mixed model specification
Y is the phenotypic trait, μ is the intercept, E is a fixed effect other than population structure effect or SNP effect (e.g. an environmental effect), Q is the fixed effect of population structure, S is the fixed effect of a SNP (or any gene polymorphism). Q is set by matrices of population membership (e.g. ancestry matrices or principal components). K is the polygenic background random effect and e is the random residual of the model. K is set by a matrix of kinship relationship between individuals.
If the environmental variable (E) is set as a random effect, the S × E term has to be set as random too. But, in this study, we consider the case where both S and E are fixed effects, so the interaction could be set as a fixed effect and will contribute to the mean structure. Note that this second model assumes the absence of interaction between population background and environment, so that no term was included for such interaction.
Q × E is the effect of interaction between ancestry and environment; Q × S is the interaction between SNP and ancestry; and Q × S × E is the three way interaction between ancestry, SNP and environment. All interactions are considered as fixed effects. Here, interaction between environment and genetic background is taken into account based on Q, and background effects linked to kinship are included only as random main effects.
Basic scheme for the simulation of association mapping data
where S is a random variable with possible values “1” and “0”, standing respectively for the presence or the absence of the causative allele; σ_{G} is the standard deviation of the original trait across the whole panel; r is the genetic effect ratio (or effect ratio, i.e. a numeric variable that modulates the size of the effect as a function of σ_{G}); and ϵ_{i} is a random variable that adds noise to the trait. The random variable S follows a Bernoulli distribution with P(S = 1) = q, q being the expected frequency of the causative allele. The random variable ϵ follows a normal distribution with a mean of 0 and a variance σ_{ϵ}^{2}. The variance σ_{ϵ}^{2} is set to satisfy a given trait heritability h^{2} = σ_{G}^{2}/ (σ_{G}^{2} + σ_{ϵ}^{2}). Note that the simulated SNP effect is independent of population structure and kinship relationship between individuals.
Hereafter, this basic simulation scheme was extended to generate data patterns with genotype by environment interactions and/or background interactions.
Simulation of a gene by environment interaction
Summary of parameters used in simulation
Symbol | Definition | Range |
---|---|---|
n | Sample size | 90^{a} to 277^{b} |
p_{i} | Original phenotype of individual i | Field-based flowering time score ^{a,b} |
r | Effect ratio, difference of effect associated with a variant | 0 to 1.5 |
σ_{G} | Standard deviation of the phenotype | 6.83^{a}; 8.72^{b} |
λ | Numeric value measuring the change in SNP effect between environments | 0.05 to 2 |
q | Mean SNP allele frequency | 0.05; 0.25; 0.50 |
q_{0} | Mean ancestry | 0.28^{a}; 0.49^{b} |
h^{2} | heritability | 0.25; 0.75 |
Simulation of an interaction between population background, gene and environment
This simulation scheme is therefore basically identical to the previous one (Equation 2), except for the presence of Bk which makes the effect variation depend on ancestry. In the model, Bk was set as a binary numerical variable scoring the presence or absence of the background-dependent marker (1 or 0, respectively). The parameter r varied from 0 to 1.5 (18 values) and λ varied from 0.05 to 2 (9 values); q was set at 0.05, 0.25 and 0.5, respectively; h^{2} was set at 0.25 and 0.75, respectively. For each combination of parameters, a thousand samples were generated for pearl millet panel and a hundred samples for maize, as in equation 2. The size of each individual sample was the same as in the original panel (n = 90 for pearl millet and n = 277 for maize). Among the populations comprising each panel, the population with the highest average ancestry (q_{0} = 0.28 for pearl millet and q_{0} = 0.49 for maize, respectively) was defined as P_{0}, to maximize sample size. We wrote scripts with the software R (version 2.7.2) to implement all the simulation schemes used.
Estimation of model components and assessment of power
The data simulating gene by environment interaction (Equation 2) were fitted using model 2. The model with two and three way interactions (model 3) was used to fit the data including the corresponding interactions (Equation 3). Model components were estimated using the REML method [43]. The fixed effects were significantly tested using the incremental Wald test procedure implemented in the ASReml package [43]. The power of each model was estimated for each parameter combination as the proportion of significant tests (P < 0.05) out of the total number of tests.
The maize and the pearl millet panels
Results of independent field experiments were available for two panels (maize and pearl millet). Maize data came from Cornell University (Ithaca, New York). Pearl millet data came from the Institut de Recherche pour le Développement (IRD, France and Niger).
For pearl millet, the number of days from sowing to the female flowering stage was scored on a panel of 90 inbred lines. Nine field trials were carried out at different sowing dates throughout the rainy season, over a period of four years (2005 to 2008). For this dataset, we use the term trial to refer to the replicate of the field experiment at a given sowing date. All trials were conducted at the same location (Sadoré, Niger). Three of these trials (2005-2006) and the experimental design are described in detail in a previous paper [17]. The six additional trials (2007-2008) were conducted using the same experimental design, to study the trait over the whole sowing period in Niger. In 2007, two sowing dates were considered (June 16 and July 9), and four dates were considered in 2008 (June 26 and July 1, 7, 18). In each trial, approximately 7 to 10 plants per inbred line were measured for each trait to calculate the average phenotype for each inbred line. The averaged inbred flowering score in each trial was used in this study. The set of background markers for this panel (27 SSRs, 306 AFLPs) is described elsewhere [44, 45]. Population structure was previously analyzed [16]. We computed a positive definite kinship matrix using the joint set of AFLP and SSR markers as previously described [17]. For candidate gene loci, we used seven SNPs with minor allele frequency q > 5% at the PHYC locus (sequence described in [17]).
The maize panel contained 277 inbred lines. Flowering phenotype (days to silk) was recorded in eight environments in the United States [34]. Hereafter, we use the term trial to refer to the maize experiment in a given year of experimentation at a given location. In this study, only seven trials were used, due to missing data in the eighth trial. For each inbred, the best linear unbiased predictor (BLUP) from each trial was used. Ancestry estimates were also available from a previous study which clustered maize inbreds into three populations: NonStiffStalk (NS), StiffStalk (SS) and Tropical (TS) populations [46]. Kinship coefficients were based on a set of previously described background markers [46] and calculated using the method implemented in the package EMMA [47]. The genotype score for a MITE insertion at the Vgt1 locus was also available from a previous study [34]. One inbred was dropped from the sample due to missing data. The final sample size used in all parts of this study was thus 276.
Statistical analyses
Set of competing models compared for pearl millet and maize real data
Model | Specification | Number of parameters | |
---|---|---|---|
Maize | Pearl millet | ||
Fit1 | Y = E + Q_{i} + S + (K + e) | 12 | 18 |
Fit2 | Y = E + Q_{i} + S + Q_{i} × E + (K + e) | 24 | 66 |
Fit3 | Y = E + Q_{i} + S + Q_{i} × E + S × E + (K + e) | 30 | 74 |
Fit4 | Y = E + Q_{i} + S + Q_{i} × E + S × E + Q_{i} × S + (K + e) | 32 | 80 |
Fit5 | Y = E + Q_{i} + S + Q_{i} × E + S × E + Q_{i} × S + Q_{i} × S × E + (K + e) | 44 | 128 |
Results
Power of the mixed model with a gene by environment interaction
Power to detect two and three order interactions
Analysis of PHYCeffect in the pearl millet association panel
Effects significance for pearl millet data
Effect | df | Fit1 | Fit5 |
---|---|---|---|
Intercept | 1 | < 10 ^{ -26 } | < 10 ^{ -26 } |
Trial | 8 | < 10 ^{ -26 } | < 10 ^{ -26 } |
Q_{1} | 1 | 0.1954 | 0.1917 |
Q_{2} | 1 | 0.0017 | 0.0016 |
Q_{3} | 1 | 0.1663 | 0.1610 |
Q_{4} | 1 | 0.0244 | 0.0231 |
Q_{5} | 1 | 0.1128 | 0.1103 |
Q_{6} | 1 | 0.0519 | 0.0511 |
PHYC | 1 | 0.0044 | 0.0044 |
Trial × Q_{1} | 8 | - | 0.2942 |
Trial × Q_{2} | 8 | - | 0.3272 |
Trial × Q_{3} | 8 | - | 0.1102 |
Trial × Q_{4} | 8 | - | 0.4017 |
Trial × Q_{5} | 8 | - | 0.6050 |
Trial × Q_{6} | 8 | - | 0.8051 |
Trial × PHYC | 8 | - | 0.0658 |
Q_{1} × PHYC | 1 | - | 0.5669 |
Q_{2} × PHYC | 1 | - | 0.2556 |
Q_{3} × PHYC | 1 | - | 0.1405 |
Q_{4} × PHYC | 1 | - | 0.6343 |
Q_{5} × PHYC | 1 | - | 0.6349 |
Q_{6} × PHYC | 1 | - | 0.5684 |
Trial × Q_{1} × PHYC | 8 | - | 0.1705 |
Trial × Q_{2} × PHYC | 8 | - | 0.3126 |
Trial x Q_{3} × PHYC | 8 | - | 0.8426 |
Trial × Q_{4} × PHYC | 8 | - | 0.0537 |
Trial × Q_{5} × PHYC | 8 | - | 0.0821 |
Trial × Q_{6} × PHYC | 8 | - | 0.6632 |
Analysis of interactions in the maize association panel
The preliminary analysis of maize data using the standard framework (MLM with no interaction effects) produced different results depending on the trial (Additional file 2: Table S4, Additional file 1: Figure S5). The effect of Vgt1 was significant in some environments (P < 0.05), but not in others (Additional file 2: Table S4, Additional file 1: Figure S5). However, the combined p-value obtained from these seven independent trials using Fisher’s method (Additional file 2: Table S4) supported a globally significant effect of Vgt1 (P = 7.52 × 10^{-5}). We set mixed linear models to test for possible interactions (Table 2). We performed a comparative model selection using WLD (Additional file 2: Table S3), AICC and BIC (Additional file 1: Figure S4). These criteria were expected to have a good performance of model choice given the current maize panel size (Additional file 3 and Additional file 1: Figure S7). The Wald test approach and AICC choose the model including a gene by environment interaction and an ancestry by environment interaction (Fit3). BIC choose a closely related interaction model in which the gene by environment interaction was dropped (Fit2).
Effects significance for maize data
Effect | df | Fit2 | Fit3 |
---|---|---|---|
Intercept | 1 | < 10 ^{ -26 } | < 10 ^{ -26 } |
Trial | 6 | < 10 ^{ -26 } | < 10 ^{ -26 } |
Q_{NS} | 1 | 5.86 × 10 ^{ -05 } | 6.27 × 10 ^{ -05 } |
Q_{TS} | 1 | 4.21 × 10 ^{ -04 } | 4.43 × 10 ^{ -04 } |
Vgt1 | 1 | 0.373 | 0.379 |
Trial × Q_{NS} | 6 | 4.16 × 10 ^{ -10 } | 2.52 × 10 ^{ -10 } |
Trial × Q_{TS} | 6 | 1.25 × 10 ^{ -17 } | 5.06 × 10 ^{ -18 } |
Trial × Vgt1 | 6 | - | 2.00 × 10 ^{ -06 } |
Discussion
Impact of data parameters on the power of mixed linear models
Previous association mapping studies in plants investigated the power of mixed linear model mainly in the simple case of main effects [10, 13, 16, 17]. More recently, an assessment of the power of mixed model was reported for some cases of interactions in association mapping [42]. The concerned study highlighted the best efficiency of mixed model for epistasis and interaction detection, compared to other methods [42]. But the reported simulation did not make explicit assessment of the impact of genetic parameters as allele frequency and heritability on the power of the mixed model itself. For instance, only one arbitrary value of heritability was examined in this study for the major locus scenario (h^{2} = 0.95) and, respectively, for the oligogenic scenario (h^{2} = 0.5) [42], making it impossible to appreciate the loss of power due to trait heritability in either scenarios. Also, this simulation did not have explicit control on the effect of allele frequency and effect size.
In the current study, we simulated data based on an explicit control on heritability, allele frequency and effect size, to investigate how these parameters impact the power of mixed model. Diverse patterns of G × E interactions occur in biological data [52]. Our first model of interaction (Equation 2) was relatively simple and provided a learning case of a gene by environment interaction. The last presented scheme (Equation 3) was more complex and allowed the examination of higher order interactions.
While mixed model ensures an improved power for interaction detection compared to different methods [42], our study considered how this power inflates with respect to data parameters. Particularly, our results highlighted the drastic loss of power in association mapping due to low trait heritability and/or low allele frequency, whatever the type of effects considered (main effects and interaction effects). This suggest that the current association mapping framework will perform well only for SNPs with relatively high frequency (roughly q > 0.05) and for traits with relatively high heritability (h^{2} > 0.25). The framework will lack the identification of rare alleles or alleles affecting poorly heritable traits. The loss of power was more critical when higher order interactions were considered. One obvious statistical reason is that the power to detect higher order interaction (Q × S × E, for example) relies not on the allele frequency alone, but rather on the combination of the frequency of all variables involved in the interaction. As allele frequency is a proportion (i.e. a numeric value between 0 and 1), this combination is necessarily lower than any of the original variable frequencies. Furthermore, these simulations suggest that the exhaustive study of trait architecture based on current plant association panels will suffer from the bias of discarding not only rare alleles but also alleles with very small effects, because the full level of power (>0.95) could be reached only with less realistic effect sizes. For instance, the effect size needed to reach 95% of the simulated interactions in the best cases (h^{2} = 0.75 and q = 0.5) corresponded approximately to about 0.4 or 0.5 fold the standard deviation of the trait (Additional file 2: Tables S1 and Table S2).
Finally, we observed that the absolute level of power was improved in the simulations based on maize data compared to pearl millet. Although other properties of these two panels differed (e.g. population structure, trait variance), the improvement in power was in great part due to the effect of the larger sample size in maize (three times larger than the pearl millet sample). But despite this difference in the absolute value of power estimates, it is important to note that the pattern of power variation, as a function of allele frequency, heritability or effect size, was consistent for the two panels. To explore more parameters spaces in terms of genetic background configuration, allele frequency or sample size, the use of simulated panels based on theoretically defined parameters (rather than real panels) will be more suited. But this is not implemented in the current study.
Statistical limits to the analysis of complex interactions using small panels were revealed in when we used very low allele frequency for the smaller dataset (pearl millet data). The number of inbred baring the allele is then too low and the reliability of model components defined on the basis of rare combinations is statistically questionable. Taking all the simulation results together, it appears that no single parameter determines the power of the model and that the effects of all parameters might balance each other. For instance, a good allele frequency does not necessarily imply good power when heritability is insufficient, and reciprocally. In this connection, the power of the MLM framework for a particular set of data needs to be discussed with respect to the combination of data characteristics, instead of focusing on only one or two parameters. This will have implications for the design of association studies, in particular when interactions are of interest.
They are some parameters which could be more easily handled when planning a plant association study. For instance, the size of a panel can be extended with an appropriate sampling effort to reach a sample size favourable for the detection of effects. As revealed by the comparison of the pearl millet panel (n = 90) and maize panel (n = 277), considering a size of hundreds of individuals appear to be the minimum required to ensure a powerful design. However, even in this case, very small phenotypic effects or very low allele frequency also limit the performance of the MLM framework as already noted. To be able to detect rare alleles, the use of family mapping (linkage analysis) could be recommended [1]. Also, the use of original approaches combining linkage and association mapping as developed in maize proved very efficient [33, 34].
Finally, the use of the present framework will be more suited for: 1) association mapping sample of several hundred individuals, 2) traits with high heritability, and 3) relatively common allele.
Confirmation of PHYCeffect into a multitrial mixed model framework
Methods of model selection led to contrasting results for pearl millet (Additional file 1: Figure S4). AICC selected the full model, whereas BIC and WLD selected the simplest model. Given the high difference in the number of parameters between the two selected models, it is unlikely that model uncertainty could conciliate these results, on the contrary of maize results (see below). This contrast is the result of statistical bias in model selection and would be mainly associated to the small size of pearl millet sample. This issue was addressed and examined in this study using a full simulation approach (Additional file 3). In the light of this simulation, we know that unlike for maize (n = 277), the current sample size in pearl millet (n = 90) does not allows a confident use of information criteria. Indeed, the simulation showed the occurrence of contrasting behaviour between methods of model selection when sample size is low. In particular, efficient criteria like AIC and AICC showed a bias toward over-parameterization, while consistent criteria like BIC and CAIC seemed more conservative (Additional file 3). It is also to be noted that, unlike information criteria, the standard procedure of model selection based on Wald test implies multiple testing and raises the problem of type I error control. For example, on pearl millet data, the number of tests needed to compare the 5 defined models was 19 tests per SNP, while the model selection based on BIC needs only 5 comparisons (Additional file 2: Table S3).
We used Wald test to assess the fixed effects considering respectively the full model and the simplest model (Table 3). Whatever the model considered, we confirmed the effect of PHYC polymorphism on the phenotype, and this confirmed a previous study based on the classical MLM framework [16]. This effect is set as main effect and we found no strong evidence of interaction with environment. The slight signal of gene by environment interaction detected with PHYC SNP in position 155 (P = 0.0472, Additional file 2: Table S3) is significant at the nominal threshold, but could not be statistically validated when we correct for multiple testing (for example, Bonferroni threshold would be about 0.005 for a correction for 9 tests). However, the small sample size could possibly explain the absence of significant test even in the presence of real interaction effect, due to lack of power.
Statistical evidence in maize for two types of interactions
The result of model selection in maize data was very close between methods. This good consistency between the methods for this relatively large sample (n = 277) is actually expected in the light of our simulation study (Additional file 3). Two types of biologically important interactions are highlighted by these results. First, we found significant interactions between maize ancestry and environment. This suggests that differences in the background of the three maize populations led to a differential reaction to environment variation. The effect of the interaction between population structure and the environment suggests that these populations respond differently to different environments. The analysis population structure, we actually analyzed an unknown number of loci co-varying with the population structure. These loci will be difficult to identify on a SNP by SNP effect because their effect is absorbed by the structure effect of the mixed model (they co-varied with the structure). However, we showed here that there is interesting genetic feature at the population level that could be retrieved. These interactions could be interpreted as a form of plasticity at the scale of the population.
The second type of interaction was an interaction between one locus (Vgt1) and the environment, with a highly significant probability (P < 10^{-6}). When we performed site by site analysis, this effect was also detected. The effect of the MITE insertion at this locus was in the same direction across all the environments examined, but the size and the significance of this effect was different between trial locations. The two types of interaction effects have important implications for the understanding of quantitative trait variation. More generally, interaction between biological and environmental factors impact trait stability, trait prediction, and trait architecture [53], which underlies the interest of these studies.
The present study is one of the first in plant to deal with genotype by environment interaction in a association mapping framework. It highlight the challenging issue about interaction study based on deep examination of important aspect of this issue (modelling of different types of interaction, model setting and model selection, model power, impact of data parameters like sample size and allele frequency, etc.). Nonetheless, other issues remained to be addressed. For example, it will be interesting to evaluate possible heterogeneous variance structure using mixed model framework.
Conclusion
We used mixed linear model to discuss the statistical perspective of using association mapping panels to study interaction effects in plants. First, this study highlighted the need of large sample size to ensure best model selection and high power. Most of the association panels of plants species to date are composed of a fairly small number of individuals, so larger panels are to be recommended in the future to deal better with G × E interactions. Second, we showed a strong impact of data parameters on the power, with a drastic loss of power when allele frequency, heritability, or effect size, are low. This shortcoming reinforces the need of complementary methods to deal with rare alleles or alleles with small effects. However, we illustrated cases of success of the current mixed model framework by identifying, in the maize association mapping panel, two types of biologically interesting effects of interaction between genetic factors and environment. So the development of interaction studies based on this mixed model framework would contribute understanding quantitative trait variation, notably by taking into account key components of G × E interactions. This framework is complementary to the frameworks of association mapping developed to deal with multi-QTL [54, 55], and QTL by QTL interactions [27, 28].
Declarations
Acknowledgement
This study was supported by a travel grant from the Agropolis Fondation to AAS; by a grant from the National Research Agency of France (ANR-07-JCJC-0116 to YV); and is partly funded by the Agropolis Fondation “Agropolis Resource Center for Crop Adaptation and Diversity” (ARCAD) project. AAS was supported by IRD Ph.D. fellowship and performed part of this research in Cornell University (February-April 2009). We thank Edward S. Buckler, Peter Bradbury, Elhan Ersoz and Zhiwu Zhang for discussion and help during modelling and statistical analyses. We acknowledge Alain Charcosset, Jacques David and Laurène Gay for discussion about the preliminary versions of this paper. We thank Jean-Louis Pham for advice during project design.
Authors’ Affiliations
References
- Myles S, Peiffer J, Brown PJ, Ersoz ES, Zhang Z, Costich DE, Buckler ES: Association mapping: critical considerations shift from genotyping to experimental design. Plant Cell. 2009, 21: 2194-2202. 10.1105/tpc.109.068437.PubMedPubMed CentralView ArticleGoogle Scholar
- Nordborg M, Weigel D: Next-generation genetics in plants. Nature. 2008, 456: 720-723. 10.1038/nature07629.PubMedView ArticleGoogle Scholar
- Pritchard JK, Stephens M, Rosenberg NA, Donnelly P: Association mapping in structured populations. Am J Hum Genet. 2000, 67: 170-181. 10.1086/302959.PubMedPubMed CentralView ArticleGoogle Scholar
- Pritchard JK, Stephens M, Donnelly M: Inference of population structure using multilocus genotype data. Genetics. 2000, 155: 945-959.PubMedPubMed CentralGoogle Scholar
- Gao H, Williamson S, Bustamante CD: An MCMC approach for the joint inference of population structure and inbreeding rate from multi-locus genotype data. Genetics. 2007, 176: 1635-1651. 10.1534/genetics.107.072371.PubMedPubMed CentralView ArticleGoogle Scholar
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genoype data: linked loci and correlated allele frequencies. Genetics. 2003, 164: 1567-1587.PubMedPubMed CentralGoogle Scholar
- Falush D, Stephens M, Pritchard JK: Inference of population structure using multilocus genotype data: dominant markers and null alleles. Mol Ecol Notes. 2007, 7 (4): 574-578. 10.1111/j.1471-8286.2007.01758.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Hubisz MJ, Falush D, Stephens M, Pritchard JK: Inferring weak population structure with the assistance of sample group information. Mol Ecol Resour. 2009, 9: 1322-1332. 10.1111/j.1755-0998.2009.02591.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet. 2006, 2 (12): e190-10.1371/journal.pgen.0020190. doi:10.1371/journal.pgen.0020190PubMedPubMed CentralView ArticleGoogle Scholar
- Zhu C, Yu J: Nonmetric multidimensional scaling corrects for population structure in whole genome association studies. Genetics. 2009, 182: 875-888. 10.1534/genetics.108.098863.PubMedPubMed CentralView ArticleGoogle Scholar
- Loiselle BA, Sork VL, Nason J, Graham C: Spatial genetic structure of a tropical understory shrub, Psychotria officinalis (Rubiaceae). Am J Bot. 1995, 82: 1420-1425. 10.2307/2445869.View ArticleGoogle Scholar
- Hardy OJ, Vekemans X: SPAGeDi: a versatile computer program to analyse spatial genetic structure at the individual or population levels. Mol Ecol Notes. 2002, 2: 618-620. 10.1046/j.1471-8286.2002.00305.x.View ArticleGoogle Scholar
- Stich B, Mohring J, Piepho HP, Heckenberger M, Buckler ES, et al: Comparison of mixed-model approaches for association mapping. Genetics. 2008, 178: 1745-1754. 10.1534/genetics.107.079707.PubMedPubMed CentralView ArticleGoogle Scholar
- Thornsberry JM, Goodman MM, Doebley J, Kresovich S, Nielsen D, et al: Dwarf8 polymorphisms associate with variation in flowering time. Nat Genet. 2001, 28: 286-289. 10.1038/90135.PubMedView ArticleGoogle Scholar
- Zhang Y-M, Mao Y, Xie C, Smith H, Luo L, Xu S: Mapping quantitative trait loci using naturally occurring genetic variance among commercial inbred lines among maize (Zea mays L.). Genetics. 2005, 169: 2267-2275. 10.1534/genetics.104.033217.PubMedPubMed CentralView ArticleGoogle Scholar
- Yu J, Pressoir G, Briggs WH, Bi IV, Yamasaki M, et al: A unified mixed-model method for association studies that accounts for multiple levels of relatedness. Nat Genet. 2006, 38: 203-208. 10.1038/ng1702.PubMedView ArticleGoogle Scholar
- Saïdou AA, Mariac C, Luong V, Pham JL, Bezancon G, Vigouroux Y: Association studies identify natural variation at PHYC linked to flowering time and morphological variation in pearl millet. Genetics. 2009, 182: 899-910. 10.1534/genetics.109.102756.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang Z, Ersoz E, Lai CQ, Todhunter RJ, Tiwari HK, Gore MA, Bradbury PJ, Yu J, Arnett DK, Ordovas JM, Buckler ES: Mixed linear model approach adapted for genome-wide association studies. Nat Genet. 2010, 42: 355-360. 10.1038/ng.546.PubMedPubMed CentralView ArticleGoogle Scholar
- Atwell S, Huang YS, Vilhjalmsson BJ, Willems G, Horton M, et al: Genome-wide association study of 107 phenotypes in Arabidopsis thaliana inbred lines. Nature. 2010, 465: 627-631. 10.1038/nature08800.PubMedPubMed CentralView ArticleGoogle Scholar
- Brachi B, Faure N, Horton M, Flahauw E, Vazquez A, Nordborg M, Bergelson J, Cuguen J, Roux F: Linkage and association mapping of Arabidopsis thaliana flowering time in nature. PLoS Genet. 2010, 6 (5): e1000940-10.1371/journal.pgen.1000940. doi:10.1371/journal.pgen.1000940PubMedPubMed CentralView ArticleGoogle Scholar
- Zhu C, Gore M, Buckler ES, Yu J: Status and prospects of association mapping in plants. Plant Genome. 2008, 1: 5-20. 10.3835/plantgenome2008.02.0089.View ArticleGoogle Scholar
- Hall D, Tegström C, Ingvarsson PK: Using association mapping to dissect the genetic basis of complex traits in plants. Brief Funct Genomics. 2010, 9 (2): 157-165. 10.1093/bfgp/elp048.PubMedView ArticleGoogle Scholar
- Rafalski JA: Association genetics in crop improvement. Curr Opin Plant Biol. 2010, 13: 1-7. 10.1016/j.pbi.2009.12.001.View ArticleGoogle Scholar
- Stracke S, Haseneyer G, Veyrieras JB, Geiger HH, Sauer S, Graner A, Piepho HP: Association studies reveals gene action and interactions in the determination of flowering time in barley. Theor Appl Genet. 2009, 118: 259-273. 10.1007/s00122-008-0896-y.PubMedView ArticleGoogle Scholar
- Zhang N, Gur A, Gibon Y, Sulpice R, Flint-Garcia S, McMullen MD, Stitt M, Buckler ES: Genetic analysis of central carbon metabolism unveils an amino acid substitution that alters maize NAD-dependent isocitrate dehydrogenase activity. PLoS One. 2010, 5 (4): e9991-10.1371/journal.pone.0009991. doi:10.1371/journal.pone.0009991PubMedPubMed CentralView ArticleGoogle Scholar
- Li L, Paulo MJ, van Eeuwijk F: Statistical epistasis between candidate gene alleles for complex tuber traits in an association mapping population of tetraploid potato. Theor Appl Genet. 2010, 121: 1303-1310. 10.1007/s00122-010-1389-3.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang D, Eskridge KM, Crossa J: Identifying QTLs and epistasis in structured plant populations using adaptive mixed LASSO. J Agr Biol Environ Stat. 2010, 16: 170-184.View ArticleGoogle Scholar
- Lü HY, Liu XF, Wei SP, Zhang YM: Epistatic association analysis in homozygous crop cultivars. PLoS One. 2011, 6 (3): e17773-10.1371/journal.pone.0017773.PubMedPubMed CentralView ArticleGoogle Scholar
- Falconer DS, Mackay TFC: Introduction to Quantitative Genetics. 1996, Harlow, UK: Addison-Wesley Longman, 4Google Scholar
- Jannink JL: Identifying quantitative trait locus by genetic background interactions in association studies. Genetics. 2007, 176: 553-561. 10.1534/genetics.106.062992.PubMedPubMed CentralView ArticleGoogle Scholar
- Phillips PC: Epistasis - the essential role of gene interactions in the structure and evolution of genetic systems. Nat Rev Genet. 2008, 9: 855-867. 10.1038/nrg2452.PubMedPubMed CentralView ArticleGoogle Scholar
- Sambandan D, Carbone MA, Anholt RRH, Mackay TFC: Phenotypic plasticity and genotype by environment interaction for olfactory behavior in Drosophila melanogaster. Genetics. 2008, 179 (2): 1079-1088. 10.1534/genetics.108.086769.PubMedPubMed CentralView ArticleGoogle Scholar
- McMullen MD, Kresovich S, Villeda HS, Bradbury P, Li H, et al: Genetic properties of the maize nested association mapping population. Science. 2009, 325: 737-740. 10.1126/science.1174320.PubMedView ArticleGoogle Scholar
- Buckler ES, James BH, Bradbury PJ, Acharya CB, Brown PJ, et al: The genetic architecture of maize flowering time. Science. 2009, 325: 714-718. 10.1126/science.1174276.PubMedView ArticleGoogle Scholar
- Uwatoko N, Onishi A, Ikeda Y, Kontani M, Sasaki A, Matsubara K, Itoh Y, Sano Y: Epistasis among the three major flowering time genes in rice: coordinate changes of photoperiod sensitivity, basic vegetative growth and optimum photoperiod. Euphytica. 2008, 163: 167-175. 10.1007/s10681-007-9584-2.View ArticleGoogle Scholar
- El-Lithy M, Bentsink L, Hanhart CJ, Ruys GJ, Rovito D, et al: New Arabidopsis recombinant inbred line populations genotyped using SNPwave and their use for mapping flowering-time quantitative trait loci. Genetics. 2006, 172: 1867-1876.PubMedPubMed CentralView ArticleGoogle Scholar
- Caicedo AL, Stinchcombe JR, Olsen KM, Schmitt J, Purugganan MD: Epistatic interaction between Arabidopsis FRI and FLC flowering time genes generates a latitudinal cline in a life history trait. Proc Natl Acad Sci U S A. 2004, 101: 15670-15675. 10.1073/pnas.0406232101.PubMedPubMed CentralView ArticleGoogle Scholar
- Kover PX, Valdar W, Trakalo J, Scarcelli N, Ehrenreich IM, et al: A multiparent advanced generation inter-cross to fine-map quantitative traits in Arabidopsis thaliana. PLoS Genet. 2009, 5 (7): e1000551-10.1371/journal.pgen.1000551.PubMedPubMed CentralView ArticleGoogle Scholar
- Manolio TA, Collins FS, Cox NJ, et al: Finding the missing heritability of complex diseases. Nature. 2009, 461: 747-753. 10.1038/nature08494.PubMedPubMed CentralView ArticleGoogle Scholar
- Kaprio J: Twins and the mystery of missing heritability: the contribution of gene–environment interactions. J Intern Med. 2012, http://onlinelibrary.wiley.com/doi/10.1111/j.1365-2796.2012.02587.x/pdf,Google Scholar
- Jannink JL: QTL by genetic background interaction: predicting inbred progeny value. Euphytica. 2008, 161: 61-69. 10.1007/s10681-007-9509-0.View ArticleGoogle Scholar
- Korte A, Vilhjálmsson BJ, Segura V, Platt A, Long Q, Nordborg M: A mixed-model approach for genome-wide association studies of correlated traits in structured populations. Nat Genet. 2012, 44 (9): 1066-1071. 10.1038/ng.2376.PubMedPubMed CentralView ArticleGoogle Scholar
- VSN International Ltd: ASReml package for R (ASReml-R), version 2.0/32. 2011, UK: VSN International Ltd, Hemel Hempstead, HP1 1ES, http://www.vsni.co.uk" http://www.vsni.co.ukGoogle Scholar
- Mariac C, Luong V, Kapran I, Mamadou A, Sagnard F, et al: Diversity of wild and cultivated pearl millet accessions (Pennisetum glaucum [L.] R. Br.) in Niger assessed by microsatellite markers. Theor Appl Genet. 2006, 114: 49-58. 10.1007/s00122-006-0409-9.PubMedView ArticleGoogle Scholar
- Allinne C, Mariac C, Vigouroux Y, Bezançon G, Couturon E, et al: Role of seed flow on the pattern and dynamics of pearl millet (Pennisetum glaucum [L.] R. Br.) genetic diversity assessed by AFLP markers: a study in south-western Niger. Genetica. 2008, 133: 167-178. 10.1007/s10709-007-9197-7.PubMedView ArticleGoogle Scholar
- Flint-Garcia SA, Thuillet AC, Yu JM, Pressoir G, Romero SM, et al: Maize association population: a high resolution platform for QTL dissection. Plant J. 2005, 44: 1054-1064. 10.1111/j.1365-313X.2005.02591.x.PubMedView ArticleGoogle Scholar
- Kang HM, Zaitlen NA, Wade CM, Kirby A, Heckerman D, et al: Efficient control of population structure in model organism association mapping. Genetics. 2008, 178: 1709-1723. 10.1534/genetics.107.080101.PubMedPubMed CentralView ArticleGoogle Scholar
- Gilmour AR, Gogel BJ, Cullis BR, Thompson R: ASReml User Guide Release 2.0. 2006, UK: VSN International Ltd, Hemel Hempstead, HP1 1ESGoogle Scholar
- Hurvich CM, Tsai CL: Regression and time series model selection in small samples. Biometrika. 1989, 76: 297-307. 10.1093/biomet/76.2.297.View ArticleGoogle Scholar
- Schwarz G: Estimating the dimension of a model. Annals Statistics. 1978, 6: 461-464. 10.1214/aos/1176344136.View ArticleGoogle Scholar
- Gurka MJ: Extending the Box-Cox transformation to the linear mixed model. J R Statist Soc A. 2006, 169 (2): 273-288. 10.1111/j.1467-985X.2005.00391.x.View ArticleGoogle Scholar
- Mackay TFC, Anholt RRH: Ain’t misbehavin’? Genotype-environment interactions and the genetics of behavior. Trends Genet. 2007, 23: 311-314. 10.1016/j.tig.2007.03.013.PubMedView ArticleGoogle Scholar
- MacKay TFC, Stone E, Ayroles J: The genetics of quantitative traits: Challenges and prospects. Nat Rev Genet. 2009, 10: 565-577. 10.1038/nrg2612.PubMedView ArticleGoogle Scholar
- Hoggart CJ, Whittaker JC, De Iorio M, Balding DJ: Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies. PLoS Genet. 2008, 4 (7): e1000130-10.1371/journal.pgen.1000130.PubMedPubMed CentralView ArticleGoogle Scholar
- Segura V, Vilhjálmsson BJ, Platt A, et al: An efficient multi-locus mixed-model approach for genome-wide association studies in structured populations. Nat Genet. 2012, 44 (7): 825-830. 10.1038/ng.2314.PubMedPubMed CentralView ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.