- Methodology article
- Open Access
Multiple trait multiple interval mapping of quantitative trait loci from inbred line crosses
- Luciano Da Costa E Silva^{1},
- Shengchu Wang^{1} and
- Zhao-Bang Zeng^{2}Email author
https://doi.org/10.1186/1471-2156-13-67
© Da Costa E Silva et al.; licensee BioMed Central Ltd. 2012
- Received: 2 March 2012
- Accepted: 28 June 2012
- Published: 1 August 2012
Abstract
Background
Although many experiments have measurements on multiple traits, most studies performed the analysis of mapping of quantitative trait loci (QTL) for each trait separately using single trait analysis. Single trait analysis does not take advantage of possible genetic and environmental correlations between traits. In this paper, we propose a novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL for inbred line crosses. We also develop a novel score-based method for estimating genome-wide significance level of putative QTL effects suitable for the MTMIM model. The MTMIM method is implemented in the freely available and widely used Windows QTL Cartographer software.
Results
Throughout the paper, we provide compelling empirical evidences that: (1) the score-based threshold maintains proper type I error rate and tends to keep false discovery rate within an acceptable level; (2) the MTMIM method can deliver better parameter estimates and power than single trait multiple interval mapping method; (3) an analysis of Drosophila dataset illustrates how the MTMIM method can better extract information from datasets with measurements in multiple traits.
Conclusions
The MTMIM method represents a convenient statistical framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL, QTL by environment interaction, and to estimate the total genotypic variance-covariance matrix between traits and to decompose it in terms of QTL-specific variance-covariance matrices, therefore, providing more details on the genetic architecture of complex traits.
Keywords
- Genetic architecture
- Genotypic variance-covariance
- Pleiotropy
- Power
- QTL by environment interaction
- Score statistics
- Statistical genetics
Background
Many traits that are important to agriculture, human health and evolutionary biology are quantitative in nature, influenced by multiple genes. Efficient and robust identification and mapping onto genomic positions of those genes is a very important goal in quantitative genetics. The availability of genome-wide molecular markers provides the means for us to locate and map those quantitative trait loci (QTL) in a systematic way. Since the publication of interval mapping method for QTL genome-wide scan[1], many statistical methods have been proposed and developed to map multiple QTL with or without epistasis in single trait in a variety of populations[2], e.g. composite interval mapping (CIM)[3, 4], least squares[5, 6], multiple interval mapping (MIM)[7], and Bayesian interval mapping[8, 9].
Although single trait QTL mapping methods have been applied in many studies to estimate the genetic basis and architecture of complex traits, these methods did not utilize the information of genetic and environmental correlations between traits, and are not ideal for data analysis. Multiple trait analysis however can take these into account and also can formally test a number of hypotheses concerning the nature of genetic correlations, such as pleiotropy vs. close linkage and genotype by environment interaction[10]. Moreover, multiple trait analysis can allow the estimation of genetic variance-covariance matrix between traits and its decomposition in terms of individual QTL ([11, 12] pages 109-110).
Multiple trait CIM[10], least squares[13] and Bayesian[14, 15] methods have been available for multiple trait QTL analysis. However, these methods have not been targeted to multiple QTL for multiple traits, i.e. the whole QTL that contribute to the genetic variances and covariances. Also these methods lack appropriate criteria for assessing genome-wide significance level of QTL effects. The multiple trait CIM method uses a genome-wide threshold based on either asymptotic approximation of the log-likelihood ratio test (LRT) or permutation[16]. Nevertheless, when applied to multiple QTL models, the permutation test has some limitations in testing some targeted hypotheses. In this study, we have invested efforts in developing: (1) a statistical method for multiple trait multiple interval mapping (MTMIM) of QTL from inbred line crosses, and (2) a score-based threshold for assessing significance level of QTL that is suitable for MTMIM.
In what follows, we motivate MTMIM modeling from a practical point of view, describe the MTMIM statistical model, build the likelihood function, derive parameter estimators, extend the score-based threshold method[17] to the MTMIM model, propose a forward selection strategy to build an MTMIM model using the score-based threshold as a criterion to assess the significance level of QTL effects, and propose a model optimization procedure to fine tune a fitted MTMIM model. We then frame the hypothesis testing of pleiotropic versus closely linked non-pleiotropic QTL, and QTL by environment interaction via the MTMIM model. Next, we implement the MTMIM model and score-based threshold method and evaluate them with several simulated datasets. More specifically, we evaluate type I error, model fitting, and the efficiency of the test of pleiotropic versus closely linked nonpleiotropic QTL delivered by the MTMIM model. Lastly, we demonstrate the usefulness of the MTMIM model by analyzing data from an experiment with fruit flies Drosophila and draw our final considerations.
We organize this paper in a manner such that a reader less interested in the mathematical aspect of the modeling could skip the analytical derivations while being able to understand the main points regarding multiple trait multiple interval mapping of QTL.
A motivating example
We use data from a cross between fruit flies Drosophila simulans and D. mauritiana to motivate MTMIM modeling. Detailed information about the experiment can be found in[18, 19]. Briefly, males from an inbred line of D. mauritiana (Rob A JJ) were crossed to females from an inbred line of D. simulans (13w JJ) to produce F_{1} hybrids. F_{1} females were then crossed to each parental species to produce two backcross populations of males, mauritiana backcross (BM) and simulans backcross (BS). These two crosses were repeated one more time to produce two independent populations from each backcross: BS1 (sample size n=186), BS2 (n=288), BM1 (n=192) and BM2 (n=299). Males from BM1 and BS1 were scored at 45 marker loci for which the two parental lines were homozygous for different alleles. Males from BM2 and BS2 were scored at 42 marker loci out of the same 45 marker loci that BM1 and BS1 were scored. The phenotypic values of each subject are: (1) average over both sides (left and right) of the first principal component of 100 Fourier coefficients of posterior lobe (PC1); (2) area of the posterior lobe (AREA); (3) average over both sides of the first principal component of 100 Fourier coefficients of the rescaled posterior lobe, rescaled so that it has unit area (ADJPC1); and (4) length of the foreleg tibia (TIBIA). While PC1 provides a measure of both size and shape of the posterior lobe, AREA and ADJPC1, on the other hand, provide measures of size and shape somewhat separately. TIBIA provides a measure of overall body size. The genotypic and phenotypic data are freely available at[20].
All variables related to the posterior lobe (PC1, ADJPC1 and AREA) were reported to be highly correlated between themselves in both BM1 and BS1, correlation larger than 0.82[18]. Therefore, suggesting the presence of pleiotropic and/or closely linked QTL affecting size and shape. However, all variables related to the posterior lobe were weakly correlated with TIBIA. Because posterior lobe shape and size possibly share most of their developmental process components, these two traits could be tightly related mostly due to pleiotropic effects[18]. Results of composite interval mapping analysis of AREA, PC1, and ADJPC1 were very similar to each other, except for the presence of a QTL affecting both AREA and PC1 but not ADJPC1 in the interval between marker loci Ddc and eve. Therefore, this QTL affects size but not shape of the posterior lobe[18]. In this article, we use only PC1 and ADJPC1 traits and the BM1 and BM2 samples. AREA was not analyzed because it is highly correlated (0.99) with PC1[18], and TIBIA was not analyzed because according to Liu and coauthors[18] it has small correlation with AREA and in general TIBIA is not an important factor governing the variability of posterior lobe shape. Besides, on our single trait analysis no QTL was found for TIBIA. BS1 and BS2 samples were not used for analysis because the main goal of this article is to present a novel method for QTL mapping, rather than to investigate details of the inheritance of posterior lobe shape.
Positions of mapped QTL in regions 4, 5, 7, 10, 11, 13, 14, 16 and 17 (Figure1) did not coincide in the MIM models of PC1 and ADJPC1. Therefore, one could hypothesize the existence of two closely linked nonpleiotropic QTL at each of these regions. In order to test the hypotheses of pleiotropic versus closely linked nonpleiotropic QTL at each one of these regions, a joint analysis of PC1 and ADJPC1 is needed. The joint analysis also allows us to partition the genotypic variance-covariance matrix between traits PC1 and ADJPC1 in terms of QTL-specific variance-covariance matrices. Thus in this motivating example, the main reasons to use the MTMIM model are: (1) to test pleiotropic versus closely linked nonpleiotropic QTL, and (2) to estimate the contribution of each QTL to the total genotypic variance-covariance matrix between traits PC1 and ADJPC1. A third reason for the MTMIM modeling, though not applicable to this specific motivating data, is the possibility to test the hypothesis of QTL by environment interaction[10].
Results and discussion
Type I error
Model size (results not shown)
The number of QTL in the MTMIM model of scenario SI was much closer to the simulated parameter (five QTL) when compared to scenario SII, for any genome-wide significance level. While a QTL in both scenarios has to exceed very similar thresholds to be declared significant in the forward selection, the number of traits affected by a QTL is rather different between the two scenarios. In scenario SI all QTL have effect on all traits, while in scenario SII a QTL may have effect either on one, two or three traits. Therefore, model overparametrization makes the detection of QTL with effects on one and two traits in scenario SII more difficult. Lastly, our results show that in general the number of mapped QTL is closer to the simulated (five QTL) in the MTMIM than in the MIM model.
FDR
Estimates of false discovery rate
Analysis | SI | SII | SIII | |||||||
---|---|---|---|---|---|---|---|---|---|---|
(trait) | LOD- d | 1% | 5% | 10% | 1% | 5% | 10% | 1% | 5% | 10% |
MIM | 1.0 | 9.1 | 9.1 | 9.9 | 8.9 | 9.2 | 10.0 | 7.2 | 7.9 | 8.7 |
(T1) | 1.5 | 3.9 | 4.4 | 5.3 | 3.7 | 4.3 | 5.3 | 2.8 | 3.5 | 4.1 |
2.0 | 2.0 | 2.7 | 3.6 | 1.8 | 2.2 | 3.0 | 1.4 | 1.9 | 2.3 | |
MIM | 1.0 | 8.0 | 8.7 | 8.9 | 7.9 | 8.6 | 9.6 | 6.2 | 7.0 | 7.8 |
(T2) | 1.5 | 3.9 | 4.2 | 4.7 | 3.2 | 4.1 | 5.4 | 3.1 | 3.7 | 4.5 |
2.0 | 2.0 | 2.3 | 3.0 | 1.2 | 2.2 | 3.6 | 1.2 | 2.1 | 2.8 | |
MIM | 1.0 | 10.7 | 9.6 | 9.9 | 12.4 | 13.8 | 18.0 | – | – | – |
(T3) | 1.5 | 3.8 | 4.2 | 4.9 | 7.5 | 9.0 | 11.4 | – | – | – |
2.0 | 1.8 | 2.3 | 3.1 | 4.8 | 6.5 | 8.5 | – | – | – | |
MTMIM | 1.0 | 4.6 | 5.4 | 6.9 | 8.5 | 9.2 | 10.0 | 5.6 | 7.8 | 8.4 |
1.5 | 1.9 | 2.7 | 4.0 | 3.3 | 4.1 | 4.9 | 2.9 | 5.2 | 5.7 | |
2.0 | 1.1 | 1.9 | 3.3 | 1.4 | 2.4 | 3.2 | 2.2 | 4.1 | 4.5 |
Power
Power of QTL identification
Analysis | SI | SII | SIII | |||||||
---|---|---|---|---|---|---|---|---|---|---|
(trait) | QTL | 1% | 5% | 10% | 1% | 5% | 10% | 1% | 5% | 10% |
Q1 | 66.8 | 82.0 | 86.6 | 65.8 | 80.2 | 84.2 | 67.6 | 77.2 | 79.6 | |
MIM | Q2 | 63.6 | 81.8 | 87.6 | 59.8 | 78.2 | 81.8 | – | – | – |
(T1) | Q3 | 67.4 | 81.6 | 87.2 | 63.2 | 81.2 | 85.8 | 75.2 | 87.0 | 90.2 |
Q4 | 66.4 | 81.8 | 87.0 | 63.4 | 78.4 | 83.4 | – | – | – | |
Q5 | 66.8 | 83.6 | 86.4 | 65.6 | 82.0 | 87.2 | 70.2 | 78.4 | 81.6 | |
Q1 | 64.8 | 80.0 | 88.2 | – | – | – | – | – | – | |
MIM | Q2 | 64.8 | 80.0 | 84.8 | 74.4 | 85.4 | 89.8 | 64.2 | 74.2 | 76.4 |
(T2) | Q3 | 65.6 | 79.8 | 83.4 | 76.4 | 86.0 | 90.0 | 76.4 | 88.4 | 91.2 |
Q4 | 66.0 | 82.4 | 87.0 | 77.4 | 87.6 | 92.0 | 74.6 | 86.0 | 88.0 | |
Q5 | 68.4 | 83.0 | 88.8 | – | – | – | – | – | – | |
Q1 | 65.6 | 81.4 | 86.0 | – | – | – | – | – | – | |
MIM | Q2 | 63.2 | 80.0 | 86.6 | – | – | – | – | – | – |
(T3) | Q3 | 65.6 | 80.4 | 84.0 | 53.4 | 70.6 | 77.8 | – | – | – |
Q4 | 65.4 | 80.8 | 87.8 | – | – | – | – | – | – | |
Q5 | 65.4 | 83.0 | 88.6 | – | – | – | – | – | – | |
Q1 | 98.8 | 99.4 | 99.4 | 53.8 | 71.0 | 78.2 | 65.4 | 65.2 | 70.0 | |
MTMIM | Q2 | 98.0 | 98.0 | 98.2 | 89.0 | 94.4 | 95.6 | 64.6 | 66.6 | 68.0 |
Q3 | 97.0 | 97.4 | 97.4 | 96.6 | 97.0 | 97.2 | 94.4 | 96.4 | 97.0 | |
Q4 | 98.4 | 98.8 | 99.0 | 87.6 | 93.2 | 94.6 | 74.8 | 77.4 | 78.2 | |
Q5 | 98.6 | 98.6 | 98.6 | 57.2 | 71.8 | 78.4 | 65.6 | 66.2 | 68.0 |
Results of power (10% genome-wide significance level and LOD-1.5) to identify QTL in the MTMIM model show that QTL affecting more traits have higher chances of being identified in the forward selection. In scenario SI, which is the most favorable among all three scenarios, all QTL have effects on all traits. Therefore, all QTL were correctly identified very often, power ≥ 97% (Table2). In scenario SII, Q1 has effect on one trait only, Q2 on two traits, and Q3 on three traits. Power increases from Q1 (78.2%) to Q3 (97.2%) in the MTMIM model. Results also show that the MTMIM model can have lower power to identify QTL that has effects on only a small subset of traits when compared to the MIM model, due to greater genome-wide threshold in the MTMIM model. For instance, MTMIM model has less power (78.2%) than MIM model (84.2%) to identify Q1, which affects only T1 (same pattern is seen for Q5). However, as the subset of traits affected by a QTL increases, power of MTMIM model overpasses power of MIM model, even when some traits are not affected by that QTL. For instance, Q2 affects T1 and T2, but not T3, nevertheless, MTMIM model identifies Q2 (95.2%) more frequently than MIM model (81.8%) (same pattern carries over to Q4). The increment in power as the number of traits affected by a QTL increases was also observed in scenario SIII.
Decomposition of total power into QTL-trait power
Scenario | Subsets | |||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
(1,0,0) | (1,1,0) | (1,1,1) | ||||||||||||||
Q1 | Q2 | Q3 | Q4 | Q5 | Q1 | Q2 | Q3 | Q4 | Q5 | Q1 | Q2 | Q3 | Q4 | Q5 | ||
SII | P_{ trait } | 66.4 | 1.2 | 0.0 | 0.8 | 64.0 | 4.2 | 86.4 | 5.0 | 87.2 | 8.2 | 0.8 | 6.6 | 89.0 | 5.8 | 0.2 |
ratio | 0.85 | 0.01 | 0.00 | 0.01 | 0.82 | 0.05 | 0.90 | 0.05 | 0.92 | 0.10 | 0.01 | 0.07 | 0.92 | 0.06 | 0.00 | |
(1,0) | (0,1) | (1,1) | ||||||||||||||
SIII | P_{ trait } | 36.8 | 2.8 | 3.4 | 1.0 | 46.0 | 2.8 | 36.2 | 4.0 | 49.6 | 1.2 | 30.4 | 29.0 | 89.6 | 27.6 | 20.8 |
ratio | 0.53 | 0.04 | 0.04 | 0.01 | 0.68 | 0.04 | 0.53 | 0.04 | 0.63 | 0.02 | 0.43 | 0.43 | 0.92 | 0.35 | 0.31 |
Mean position of QTL
Means of QTL position, LOD-d support interval coverage and length
Position | Coverage | Length | |||||||
---|---|---|---|---|---|---|---|---|---|
Analysis (Trait) | QTL | Parameter | Estimate | 1 | 1.5 | 2 | 1 | 1.5 | 2 |
MIM (T1) | Q1 | 23 [1] | 23.7 (0.31) | 91.4 | 95.7 | 99.3 | 21.7 (0.42) | 29.4 (0.55) | 37.3 (0.66) |
Q2 | 15 [2] | 14.6 (0.31) | 92.2 | 95.8 | 98.1 | 21.1 (0.38) | 27.7 (0.55) | 34.9 (0.73) | |
Q3 | 45 [3] | 45.4 (0.38) | 88.8 | 95.8 | 98.2 | 23.7 (0.49) | 33.0 (0.67) | 41.9 (0.81) | |
Q4 | 67 [5] | 66.9 (0.29) | 92.2 | 95.8 | 98.4 | 20.2 (0.35) | 26.7 (0.51) | 35.4 (0.79) | |
Q5 | 53 [6] | 52.9 (0.33) | 93.4 | 98.8 | 99.6 | 21.3 (0.43) | 28.7 (0.56) | 36.4 (0.68) | |
MIM (T2) | Q2 | 15 [2] | 14.7 (0.30) | 92.6 | 97.4 | 98.7 | 21.0 (0.88) | 27.9 (0.55) | 34.1 (0.67) |
Q3 | 45 [3] | 45.2 (0.35) | 90.6 | 95.9 | 98.3 | 22.3 (0.38) | 29.8 (0.56) | 39.1 (0.74) | |
Q4 | 67 [5] | 67.0 (0.27) | 95.3 | 98.1 | 99.6 | 19.6 (0.33) | 26.1 (0.49) | 32.6 (0.67) | |
MIM (T3) | Q3 | 45 [3] | 44.7 (0.45) | 88.8 | 94.6 | 96.8 | 25.3 (0.55) | 35.3 (0.74) | 46.2 (0.88) |
MTMIM | Q1 | 23 [1] | 23.5 (0.32) | 89.5 | 95.6 | 97.6 | 20.0 (0.38) | 26.4 (0.47) | 33.1 (0.56) |
Q2 | 15 [2] | 14.4 (0.22) | 93.1 | 97.8 | 98.9 | 16.2 (0.25) | 21.0 (0.33) | 25.3 (0.39) | |
Q3 | 45 [3] | 44.9 (0.18) | 92.8 | 97.2 | 99.4 | 13.1 (0.22) | 17.2 (0.28) | 20.7 (0.33) | |
Q4 | 67 [5] | 67.6 (0.19) | 94.2 | 97.5 | 98.9 | 15.6 (0.23) | 20.3 (0.31) | 24.2 (0.39) | |
Q5 | 53 [6] | 52.8 (0.37) | 89.5 | 97.8 | 99.8 | 19.7 (0.41) | 26.1 (0.51) | 32.6 (0.60) |
Coverage and length of LOD-d support interval
In Table4, we show the results of coverage and length of LOD-d support interval, and as can be seen, coverage for any LOD-d level are not remarkably different between the MIM and MTMIM models. However, on average the estimates of LOD-d support interval length were always larger in the MIM model. Differences in length are only marginal for QTL with effects on only a small subset of traits, but there are considerable differences for those QTL with effects on larger subset of traits. For instance, in scenario SII Q1 affects one trait only and it has LOD-1.5 support interval mean length of 29.4 cM in the MIM and 26.4 cM in the MTMIM model. On the other hand, Q2 affects two traits and it has LOD-1.5 support interval mean length of 27.7 (T1) and 27.9 (T2) in the MIM models and 21.0 cM in the MTMIM model. An interesting result is that the LOD-1.5 support interval produced confidence intervals with approximately 95% coverage in both MIM and MTMIM models.
Mean effect of QTL
Mean effect of QTL
SI | SII | SIII | ||||||
---|---|---|---|---|---|---|---|---|
Trait | QTL | Parameter | MIM | MTMIM | MIM | MTMIM | MIM | MTMIM |
T1 | Q1 | 0.52 | 0.57 (0.006) | 0.51 (0.007) | 0.56 (0.005) | 0.56 (0.005) | 0.57 (0.006) | 0.56 (0.011) |
Q2 | 0.52 | 0.56 (0.006) | 0.51 (0.006) | 0.56 (0.006) | 0.52 (0.007) | – | 0.20 (0.019) | |
Q3 | 0.52 | 0.56 (0.006) | 0.52 (0.006) | 0.54 (0.005) | 0.51 (0.007) | 0.57 (0.005) | 0.52 (0.008) | |
Q4 | 0.52 | 0.55 (0.006) | 0.51 (0.006) | 0.55 (0.006) | 0.52 (0.006) | – | 0.13 (0.015) | |
Q5 | 0.52 | 0.56 (0.006) | 0.52 (0.007) | 0.55 (0.006) | 0.56 (0.005) | 0.58 (0.005) | 0.58 (0.013) | |
T2 | Q1 | 0.52 | 0.55 (0.007) | 0.50 (0.007) | – | 0.00 (0.004) | – | 0.23 (0.016) |
Q2 | 0.52 | 0.56 (0.005) | 0.51 (0.006) | 0.57 (0.006) | 0.54 (0.007) | 0.58 (0.006) | 0.55 (0.009) | |
Q3 | 0.52 | 0.56 (0.005) | 0.52 (0.006) | 0.57 (0.005) | 0.54 (0.007) | 0.57 (0.005) | 0.54 (0.008) | |
Q4 | 0.52 | 0.55 (0.005) | 0.50 (0.006) | 0.57 (0.005) | 0.55 (0.006) | 0.58 (0.006) | 0.60 (0.008) | |
Q5 | 0.52 | 0.55 (0.006) | 0.52 (0.007) | – | 0.00 (0.005) | – | 0.09 (0.015) | |
T3 | Q1 | 0.52 | 0.56 (0.005) | 0.52 (0.006) | – | 0.00 (0.005) | – | – |
Q2 | 0.52 | 0.55 (0.005) | 0.51 (0.007) | – | 0.01 (0.004) | – | – | |
Q3 | 0.52 | 0.55 (0.005) | 0.51 (0.006) | 0.51 (0.006) | 0.44 (0.008) | – | – | |
Q4 | 0.52 | 0.55 (0.005) | 0.52 (0.007) | – | 0.00 (0.003) | – | – | |
Q5 | 0.52 | 0.56 (0.006) | 0.53 (0.008) | – | 0.00 (0.004) | – | – |
The effects of all QTL were overestimated in the MIM model. This phenomena is expected due to estimation conditional on detection, the so-called “Beavis effect”[22]. A qualitative comparison of results show that overall the estimation of QTL effects in the MTMIM model are less biased than in the MIM model.
Pleiotropic versus closely linked nonpleiotropic QTL
In scenario SIII, after selecting an MTMIM model in the forward selection, each mapped pleiotropic QTL was tested against the alternative of closely linked nonpleiotropic QTL. In the bivariate model, we performed a two-dimensional search for positions of putative closely linked nonpleiotropic QTL in the neighborhood of the position of each pleiotropic QTL, as suggested in[10]. The model with nonpleiotropic QTL that showed highest likelihood within the two-dimensional search region was selected and tested against the model with pleiotropic QTL. We compared two criteria for model selection, the AICc and LRT. The critical value for the LRT at 5% significance level was obtained from a chi-squared probability distribution with one degree of freedom.
Because Q3 was simulated as being pleiotropic, rejection of pleiotropic hypothesis for Q3 provides a measure of type I error. On the other hand, Q1 and Q2, and Q4 and Q5 were simulated as pairs of closely linked nonpleiotropic QTL. Therefore, rejection of pleiotropic hypothesis at these QTL provides a measure of power. Under our simulation setting, the LRT performed better than the AICc. The LRT was able to keep the best balance between type I error and power. Estimated frequency of rejecting pleiotropy for Q3 (4%) using the LRT agrees very well with the expected 5% nominal error rate, and estimated frequency of rejecting pleiotropy for Q1 (38%) and Q2 (36%) are satisfactory high, taking into account that Q1 and Q2 are considerably close to each other in a linkage map with markers considerably distant from each other (10 cM from marker-to-marker). On the other hand, the AICc criterion showed higher power for Q1 (45%) and Q2 (45%), but with a cost of high type I error for Q3 (15%). Moreover, because Q4 and Q5 are 15 cM apart from each other, the frequency of rejecting pleiotropy using LRT for these two QTL (41 and 48%, respectively) is higher than for Q1 (38%) and Q2 (36%), which are 10 cM apart from each other.
Motivating example revisited
Motivated by the fact that the joint analysis of PC1 and ADJPC1 in the Drosophila dataset could provide additional information to distinguish between genetic effects of QTL on size and shape of posterior lobe, we then analyzed these two traits with the MTMIM model. Such additional information are: (1) testing pleiotropic versus closely linked nonpleiotropic QTL, and (2) estimating the contribution of each QTL in the fitted model to the genotypic variance-covariance matrix between PC1 and ADJPC1. In what follows, we show results of the MIM and MTMIM model of the pooled samples from BM1 and BM2 (n=192+299), the BM data. We also take advantage of this dataset to test the GEM-NR algorithm for maximizing the likelihood function under the MTMIM model with many QTL. Using data from a genetic experiment would provide more realistic comparisons between the GEM-NR and ECM algorithms than a simulated dataset would do.
Estimates of QTL position and main effect on PC1 and ADJPC1 of BM data
MIM | MTMIM (GEM-NR) | MTMIM (ECM) | |||||||
---|---|---|---|---|---|---|---|---|---|
PC1 | ADJPC1 | (PC1 and ADJPC1) | (PC1 and ADJPC1) | ||||||
QTL | $\widehat{\mathit{p}}$ ^{ a } | ${\widehat{\mathit{\beta}}}_{\mathbf{1}}$ | $\widehat{\mathit{p}}$ | ${\widehat{\mathit{\beta}}}_{\mathbf{2}}$ | $\widehat{\mathit{p}}$ | ${\widehat{\mathit{\beta}}}_{\mathbf{1}}$ | ${\widehat{\mathit{\beta}}}_{\mathbf{2}}$ | ${\widehat{\mathit{\beta}}}_{\mathbf{1}}$ | ${\widehat{\mathit{\beta}}}_{\mathbf{2}}$ |
Chromosome X | |||||||||
1 | 1 | 0.0020 | 1 | 0.0165 | 1 | 0.0021 | 0.0175 | 0.0021 | 0.0175 |
2 | 20 | 0.0018 | 20 | 0.0284 | 20 | 0.0017 | 0.0275 | 0.0017 | 0.0275 |
Chromosome 2 | |||||||||
3 | – | – | 1 | 0.0304 | 1 | 0.0007 | 0.0293 | 0.0007 | 0.0293 |
4 | 14 | 0.0018 | 17 | 0.0215 | 17 | 0.0018 | 0.0220 | 0.0018 | 0.0220 |
5 | 26 | 0.0017 | 30 | 0.0141 | 29 | 0.0012 | 0.0146 | 0.0011 | 0.0146 |
6 | 71 | 0.0016 | – | – | 70 | 0.0017 | -0.0048^{ ns } | 0.0017 | -0.0048^{ ns } |
7 | 111 | 0.0009 | 116 | 0.0147 | 116 | 0.0011 | 0.0176 | 0.0011 | 0.0177 |
8 | 144 | 0.0012 | 144 | 0.0091 | 144 | 0.0011 | 0.0082 | 0.0011 | 0.0082 |
Chromosome 3 | |||||||||
9 | 5 | 0.0013 | – | – | 4 | 0.0011 | 0.0107 | 0.0011 | 0.0107 |
10 | 17 | 0.0022 | 16 | 0.0503 | 17 | 0.0022 | 0.0427 | 0.0022 | 0.0426 |
11 | 48 | 0.0033 | 44 | 0.0279 | 45 | 0.0027 | 0.0253 | 0.0027 | 0.0254 |
12 | – | – | 54 | 0.0235 | 54 | 0.0007^{ ns } | 0.0255 | 0.0007 | 0.0254 |
13 | 82 | 0.0033 | 83 | 0.0391 | 83 | 0.0034 | 0.0394 | 0.0034 | 0.0394 |
14 | 112 | 0.0009 | 116 | 0.0324 | 115 | 0.0009 | 0.0257 | 0.0009 | 0.0257 |
15 | 129 | 0.0015 | – | – | 128 | 0.0012 | 0.0094^{ ns } | 0.0012 | 0.0094^{ ns } |
16 | 147 | 0.0007 | 146 | 0.0116 | 145 | 0.0009 | 0.0092 | 0.0009 | 0.0092 |
17 | 169 | 0.0021 | 166 | 0.0268 | 167 | 0.0021 | 0.0273 | 0.0021 | 0.0273 |
Total QTL | 15 | 14 | 17 | ||||||
${\widehat{\mathit{\sum}}}_{p}$ | 2.761 | 31.73 | |||||||
31.73 | 521.6 | ||||||||
${\widehat{\mathit{\sum}}}_{g}$ | 2.358 | – | 2.369 | 31.48 | |||||
– | 453.0 | 31.48 | 453.2 |
MIM models of PC1 and ADJPC1 all together showed statistical evidence of twelve genomic regions with statistical significant QTL affecting both traits, and five regions with statistically significant QTL affecting either one of the traits (regions 3, 6, 9 , 12 and 15 shown in Figure1 and Table6). MTMIM model mapped these five regions either exactly or very close to their respective estimated positions in the MIM models. Moreover, the estimated effects of these five regions in the MTMIM model showed small discrepancy from those estimates in the MIM models (Table6). Nevertheless, empirical results from our simulations suggest that both estimates of positions and effects of QTL in the MTMIM model are more accurate than in the MIM models.
Positions of QTL in regions 4, 5, 7, 10, 11, 13, 14, 16 and 17 (Figure1 and Table6) did not coincide with those in the MIM models of PC1 and ADJPC1. Therefore, one could hypothesize the existence of two closely linked nonpleiotropic QTL at each of these regions. We tested the hypothesis of pleiotropic QTL versus closely linked nonpleiotropic QTL at each of these regions, and on the basis of the data available the null hypothesis of pleiotropic QTL could not be rejected for any region. Thus, since PC1 contains attributes of both shape and size of posterior lobe, whereas ADJPC1 contains attributes of size only, the available data provides strong evidence that the genetic mechanisms controlling shape and size of posterior lobe are highly similar.
Estimated QTL-specific (multiplied by 10^{5}) genotypic variance-covariance matrix between traits PC1 and ADJPC1
QTL | |||||||||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Traits | QTL | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |||||||||||||||||
PC1 | 1 | 0.11 | 0.93 | 0.12 | 1.49 | 0.00 | 0.08 | 0.01 | 0.07 | 0.00 | 0.03 | 0.00 | -0.01 | 0.00 | 0.05 | 0.01 | 0.07 | 0.00 | -0.02 | -0.01 | -0.11 | -0.03 | -0.28 | -0.01 | -0.13 | 0.01 | 0.06 | -0.01 | -0.10 | -0.01 | -0.05 | 0.00 | 0.02 | 0.00 | 0.03 |
ADJ | 0.93 | 7.69 | 1.49 | 16.20 | 0.08 | 1.06 | 0.07 | 0.64 | 0.03 | 0.30 | -0.01 | 0.11 | 0.05 | 0.57 | 0.07 | 0.54 | -0.02 | -0.16 | -0.11 | -1.32 | -0.28 | -2.44 | -0.13 | -1.74 | 0.06 | 0.61 | -0.10 | -1.23 | -0.05 | -0.39 | 0.02 | 0.21 | 0.03 | 0.28 | |
PC1 | 2 | 0.08 | 1.21 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 | -0.03 | 0.01 | 0.10 | 0.01 | 0.09 | 0.00 | 0.02 | 0.00 | 0.05 | -0.01 | -0.16 | 0.00 | -0.05 | 0.01 | 0.20 | -0.01 | -0.13 | -0.01 | -0.10 | 0.00 | 0.00 | -0.01 | -0.10 | ||
ADJ | 1.21 | 19.13 | 0.00 | -0.05 | 0.00 | 0.05 | 0.02 | 0.26 | -0.03 | 0.17 | 0.10 | 1.57 | 0.09 | 0.92 | 0.02 | 0.29 | 0.05 | 0.81 | -0.16 | -1.83 | -0.05 | -1.09 | 0.20 | 2.66 | -0.13 | -2.57 | -0.10 | -1.00 | 0.00 | 0.04 | -0.10 | -1.36 | |||
PC1 | 3 | 0.01 | 0.52 | 0.05 | 1.30 | 0.02 | 0.47 | 0.00 | 0.09 | 0.00 | -0.03 | 0.00 | -0.04 | 0.00 | 0.07 | 0.00 | -0.02 | -0.01 | -0.22 | 0.00 | -0.08 | -0.01 | -0.28 | 0.00 | -0.01 | 0.00 | 0.04 | 0.00 | 0.04 | 0.00 | -0.04 | ||||
ADJ | 0.52 | 21.64 | 1.30 | 24.16 | 0.47 | 9.37 | 0.09 | -0.57 | -0.03 | -0.63 | -0.04 | -0.51 | 0.07 | 1.06 | -0.02 | -0.55 | -0.22 | -3.36 | -0.08 | -3.13 | -0.28 | -5.17 | -0.01 | -0.20 | 0.04 | 0.49 | 0.04 | 0.70 | -0.04 | -0.72 | |||||
PC1 | 4 | 0.10 | 1.18 | 0.07 | 0.92 | 0.03 | 0.15 | 0.00 | 0.00 | -0.01 | -0.05 | 0.01 | 0.11 | 0.00 | -0.02 | -0.03 | -0.32 | -0.01 | -0.16 | -0.02 | -0.24 | 0.01 | 0.13 | 0.01 | 0.07 | 0.01 | 0.06 | 0.00 | -0.03 | ||||||
ADJ | 1.18 | 14.08 | 0.92 | 11.44 | 0.15 | -1.12 | 0.00 | -0.03 | -0.05 | -0.45 | 0.11 | 1.14 | -0.02 | -0.36 | -0.32 | -3.32 | -0.16 | -2.94 | -0.24 | -2.88 | 0.13 | 2.22 | 0.07 | 0.64 | 0.06 | 0.68 | -0.03 | -0.41 | |||||||
PC1 | 5 | 0.02 | 0.31 | 0.03 | 0.13 | 0.00 | 0.04 | 0.00 | -0.01 | 0.00 | 0.05 | 0.00 | -0.02 | -0.01 | -0.13 | 0.00 | -0.07 | -0.01 | -0.10 | 0.00 | 0.05 | 0.00 | 0.03 | 0.00 | 0.03 | 0.00 | 0.03 | ||||||||
ADJ | 0.31 | 4.06 | 0.13 | -0.93 | 0.04 | 0.57 | -0.01 | -0.05 | 0.05 | 0.50 | -0.02 | -0.39 | -0.13 | -1.40 | -0.07 | -1.44 | -0.10 | -1.19 | 0.05 | 0.83 | 0.03 | 0.32 | 0.03 | 0.37 | 0.03 | 0.38 | |||||||||
PC1 | 6 | 0.07 | -0.20 | 0.02 | 0.16 | 0.01 | 0.02 | 0.00 | -0.01 | -0.01 | -0.06 | -0.01 | -0.02 | 0.00 | -0.01 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.01 | 0.00 | 0.01 | 0.00 | 0.01 | ||||||||||
ADJ | -0.20 | 0.57 | 0.16 | -1.08 | 0.02 | -0.17 | -0.01 | 0.08 | -0.06 | 0.39 | -0.02 | 0.16 | -0.01 | 0.08 | 0.00 | 0.03 | 0.05 | -0.33 | 0.01 | -0.05 | 0.01 | -0.07 | 0.01 | -0.09 | |||||||||||
PC1 | 7 | 0.03 | 0.49 | 0.03 | 0.32 | 0.00 | 0.01 | 0.00 | 0.08 | 0.00 | 0.03 | 0.00 | 0.02 | 0.00 | 0.02 | 0.00 | -0.10 | 0.00 | -0.05 | 0.00 | -0.03 | -0.01 | -0.11 | ||||||||||||
ADJ | 0.49 | 7.76 | 0.32 | 3.09 | 0.01 | 0.09 | 0.08 | 1.38 | 0.03 | 0.30 | 0.02 | 0.35 | 0.02 | 0.21 | -0.10 | -2.07 | -0.05 | -0.56 | -0.03 | -0.41 | -0.11 | -1.61 | |||||||||||||
PC1 | 8 | 0.03 | 0.22 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | -0.02 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | -0.04 | 0.00 | -0.01 | 0.00 | -0.01 | 0.00 | -0.02 | ||||||||||||||
ADJ | 0.22 | 1.54 | 0.00 | 0.01 | 0.04 | 0.37 | -0.02 | -0.13 | 0.00 | 0.01 | 0.03 | 0.26 | -0.04 | -0.44 | -0.01 | -0.11 | -0.01 | -0.04 | -0.02 | -0.15 | |||||||||||||||
9 | 0.03 | 0.30 | 0.08 | 1.24 | 0.04 | 0.34 | 0.01 | 0.15 | -0.01 | -0.08 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | |||||||||||||||||
0.30 | 2.88 | 1.24 | 16.13 | 0.34 | 3.25 | 0.15 | 2.34 | -0.08 | -0.85 | 0.01 | 0.14 | 0.00 | 0.00 | 0.00 | -0.01 | 0.00 | -0.05 | ||||||||||||||||||
10 | 0.12 | 2.32 | 0.13 | 1.93 | 0.02 | 0.73 | 0.02 | 0.29 | 0.00 | 0.06 | 0.00 | -0.01 | 0.00 | -0.04 | -0.01 | -0.20 | |||||||||||||||||||
2.32 | 45.50 | 1.93 | 24.29 | 0.73 | 18.94 | 0.29 | 4.25 | 0.06 | 1.36 | -0.01 | -0.07 | -0.04 | -0.61 | -0.20 | -3.05 | ||||||||||||||||||||
11 | 0.18 | 1.64 | 0.07 | 1.66 | 0.15 | 1.58 | 0.01 | 0.27 | 0.01 | 0.08 | 0.00 | -0.01 | -0.01 | -0.14 | |||||||||||||||||||||
1.64 | 15.19 | 1.66 | 24.92 | 1.58 | 16.29 | 0.27 | 3.81 | 0.08 | 0.68 | -0.01 | -0.14 | -0.14 | -1.48 | ||||||||||||||||||||||
12 | 0.01 | 0.38 | 0.05 | 1.16 | 0.00 | 0.14 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00 | -0.07 | |||||||||||||||||||||||
0.38 | 14.74 | 1.16 | 20.87 | 0.14 | 4.55 | 0.07 | 0.89 | 0.00 | -0.04 | -0.07 | -1.41 | ||||||||||||||||||||||||
13 | 0.27 | 3.12 | 0.05 | 0.91 | 0.04 | 0.39 | 0.01 | 0.17 | 0.01 | 0.09 | |||||||||||||||||||||||||
3.12 | 36.41 | 0.91 | 15.12 | 0.39 | 3.66 | 0.17 | 1.89 | 0.09 | 1.06 | ||||||||||||||||||||||||||
14 | 0.02 | 0.53 | 0.04 | 0.70 | 0.02 | 0.31 | 0.01 | 0.31 | |||||||||||||||||||||||||||
0.53 | 15.11 | 0.70 | 8.66 | 0.31 | 5.01 | 0.31 | 5.53 | ||||||||||||||||||||||||||||
15 | 0.04 | 0.29 | 0.03 | 0.30 | 0.04 | 0.38 | |||||||||||||||||||||||||||||
0.29 | 2.27 | 0.30 | 2.82 | 0.38 | 3.73 | ||||||||||||||||||||||||||||||
16 | 0.02 | 0.18 | 0.05 | 0.57 | |||||||||||||||||||||||||||||||
0.18 | 2.03 | 0.57 | 6.83 | ||||||||||||||||||||||||||||||||
17 | 0.11 | 1.44 | |||||||||||||||||||||||||||||||||
1.44 | 18.55 | ||||||||||||||||||||||||||||||||||
Total | 2.36 | 31.48 | |||||||||||||||||||||||||||||||||
31.48 | 453.20 |
Conclusions
A novel statistical method for multiple trait multiple interval mapping (MTMIM) of QTL from inbred line crosses was proposed and developed. We also proposed a novel method for estimating genome-wide threshold and assessing the significance level of putative QTL effects in the MTMIM model. The method of genome-wide threshold estimation is based on the score-based resampling framework[17]. The MTMIM model has the advantage of allowing us to map QTL with effects on multiple traits, while taking advantage of information from correlations between traits. The MTMIM model has been implemented in the freely available software Windows QTL Cartographer[23].
The MTMIM model provides a comprehensive framework for QTL inference on multiple traits and the score-based threshold serves as an essential and elegant tool for computing significance level of effects of putative QTL in the genome-wide scan. The MTMIM model and score-based threshold were evaluated through simulations. Also, we analyzed data from an experiment with Drosophila for the purpose of illustrating the MTMIM model and evaluating the performances of the GEM-NR and ECM algorithms.
Results from our simulations showed many interesting features of the MTMIM model and score-based threshold. First, the score-based threshold maintained the type I error at a desired nominal level when no QTL effects were present in the simulated datasets. Second, discovery of spurious QTL (false discovery rate) was almost constant across genome-wide significance levels of 1, 5 and 10%, while power to identify simulated QTL increased substantially as the significance level grew less stringent. Therefore, a more liberal (10%) genome-wide significance level could be used in the genome-wide scan, corroborating the results of C. Laurie, S. Wang, L. A. Carlini-Garcia and Z-B. Zeng as observed in the MIM model (unpublished observations). Third, the MTMIM model could show lower power than the MIM model for QTL with effects on only a small subset of traits. However, as the number of traits affected by a QTL increases, power in the MTMIM model overpasses power in the MIM model even when not all traits under analysis are affected by that QTL. Forth, on average the estimates of QTL position in the MIM and MTMIM models were very similar, but the MTMIM model delivers estimates with smaller sampling variances. Fifth, the LOD-1.5 support interval produced confidence intervals for QTL position with approximately 95% coverage in both the MIM and MTMIM models. However, the support interval was much wider in the MIM than in MTMIM model. Overall, a qualitative comparison of results from the MIM and MTMIM models shows that effect estimates in the latter are less biased than in the former. Lastly, the LRT was shown to keep adequate type I error level when testing the null hypothesis of pleiotropic QTL against the alternative of closely linked nonpleiotropic QTL in the bivariate analysis, while it delivered reasonable power when data were generated under the alternative.
Throughout this paper, we provided compelling empirical evidences that the score-based threshold maintained proper type I error rate and tend to give a false discovery rate within acceptable level, and that the MTMIM model can deliver better parameter estimates and power than the MIM model, and yet the MTMIM model provides a framework to test hypotheses of pleiotropic QTL versus closely linked nonpleiotropic QTL, QTL by environment interaction, and to estimate the total genotypic variance-covariances matrix between traits and to decompose it in terms of QTL-specific variance-covariance matrices. An analysis of phenotypic and genotypic data from an experiment with Drosophila illustrated the new tools present in the MTMIM model. In conclusion, the MTMIM model is a valuable tool to better extract information from experiments with measurements in multiple quantitative traits, therefore, providing more details on the genetic architecture of complex traits.
Methods
In what follows, for any matrix A, its transpose is denoted by${\mathit{A}}^{\prime}$, its inverse by A^{−1}, its u^{tℎ} row by A_{[u,·]}, its v^{tℎ} column by A_{[·,v]}, and its element in row u and column v by A_{[u,v]} .
Statistical model
For each subject i, let${\mathit{y}}_{i}={({y}_{1i},{y}_{2i},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{y}_{\mathrm{Ti}})}^{\prime}$ be a T × 1 vector of trait measurements, and${\mathit{e}}_{i}={({e}_{1i},{e}_{2i},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{e}_{\mathrm{Ti}})}^{\prime}$ be a T × 1 random vector assumed to be independent and identically distributed according to a multivariate normal distribution with mean vector zero and positive definite symmetric variance-covariance matrix ∑_{ e }, i.e., e_{ i } ∼ MV N_{ T } (0,∑_{ e }). For each r, let${\mathit{\beta}}_{r}={({\beta}_{1r},{\beta}_{2r},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\beta}_{\mathrm{Tr}})}^{\prime}$ be a column vector of main effects. For each pair r and l (r < l, r = 1,2,· · · ,p) of interaction, let${\mathit{w}}_{b}={({w}_{1\mathit{rl}},{w}_{2\mathit{rl}},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{w}_{\mathit{Trl}})}^{\prime}$ be a column vector of epistatic effects (b = 1,2,· · · ,p). Lastly, let$\mathit{u}={({u}_{1},{u}_{2},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{u}_{T})}^{\prime}$ be a T × 1 vector of means.
We collect all effect parameters (m main and p epistatic effect vectors) into a T × s (s = m + p) matrix$\mathcal{B}\phantom{\rule{.1em}{0ex}}=\phantom{\rule{.1em}{0ex}}(\phantom{\rule{.1em}{0ex}}{\mathit{\beta}}_{1},\phantom{\rule{1em}{0ex}}{\mathit{\beta}}_{2},\phantom{\rule{1em}{0ex}}\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},\phantom{\rule{1em}{0ex}}{\mathit{\beta}}_{m},\phantom{\rule{1em}{0ex}}{\mathit{w}}_{1},\phantom{\rule{1em}{0ex}}{\mathit{w}}_{2},\phantom{\rule{1em}{0ex}}\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},\phantom{\rule{1em}{0ex}}{\mathit{w}}_{p})$, and all model parameters into a column vector θ = (θ_{1}, θ_{2},· · ·,${{\mathit{\theta}}_{s},{\mathit{\mu}}^{\prime},\mathit{\text{vect}}\left({\mathit{\sum}}_{e}\right))}^{\prime}$, where${\mathit{\theta}}_{b}={\mathit{\beta}}_{b}^{\prime}$ for 1 ≤ b ≤ m and${\mathit{\theta}}_{b}={\mathit{w}}_{b}^{\prime}$ for m <b ≤ s, and vect(∑_{ e }) is an operator that stacks the rows of ∑_{ e } into a column vector one on the top of the other and then transposes it. Motivated by the fact that a QTL may not have significant effect on all traits under analysis, we allow for the insignificant parameter effects in each vector θ_{ b } to be constrained to zero. Therefore, the MTMIM model allows each trait to have its own set of effect parameters, as in the seemingly unrelated regression model[26].
Likelihood function
In order to search the entire genome for significant QTL effects, the genome is partitioned into H points, usually at 1-centiMorgan (cM) grid. This partition is denoted by ζ. The set of positions of m putative QTL, λ = {λ_{1}λ_{2},· · · λ_{ m } }, is assumed to be a subset of ζ[27]. For any subject i, let M_{ i } be the genotypic information of markers flanking the m QTL, and${M}_{i,L}^{r}$ and${M}_{i,R}^{r}$ be the flanking markers on left and right of QTL r, respectively. In a diploid species, a subject from a BC population generated from inbred line crosses has either genotype QQ or Qq for a locus, assuming the recurrent parent has genotype QQ. Therefore, if there are m QTL affecting a trait, there are 2^{ m } possible genotypes for any subject i. Genotypes of the form G_{ j } = _{Q1}Q_{2} · · ·Q_{ m }, where Q_{ r } ∈ {QQ Qq}, r = 1,2,· · · m and j = 1,2,· · · 2^{ m }. Then, assuming no crossover interference between marker intervals and no more than one QTL existing within a marker interval, the probability of any genotype G_{ j }, conditional on the genotypes of markers flanking the m QTL is${p}_{\mathit{ij}}=P\left({G}_{j}\right|{\mathit{M}}_{i},\mathcal{R},\mathit{\lambda})=\prod _{r=1}^{m}P\left({Q}_{r}\left.\right|{M}_{i,L}^{r},{M}_{i,R}^{r},{\lambda}_{r}\right)$, where the probabilities on the right hand side of this equation can be estimated via a Hidden Markov model[28].
We define an s × 2^{ m } matrix Z of coded genotypes according to Cockerham genetic model[24, 25]. In the matrix Z each row b, Z_{[b,·]}, corresponds to a column of effect parameters in$\mathcal{B}(b=1,2,\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},s)$ and each column j, Z_{[·,j]}, represents a coded genotype G_{ j } . If b ≤ m, Z_{[b,j]} = x_{ r }, otherwise Z_{[b,j]} = x_{ r } ∗x_{ l }, where x_{ u } (u ∈ {r, l}) is either$\frac{1}{2}$ or$-\frac{1}{2}$, depending on whether the genotype of QTL Q_{ u } in G_{ j } is QQ or Qq, respectively.
The individual (L_{ i }) and overall likelihood (L) functions of data under the MTMIM model with m QTL are mixtures of 2^{ m } multivariate normal distribution functions with different means ($\mathit{u}+\mathcal{B}{\mathit{Z}}_{[\xb7,j]}$), assumed same variance-covariance (∑_{ e }), and mixing probabilities p_{ i j } (j = 1,2,· · · ,2^{ m }), i.e.,${L}_{i}\left(\mathit{\theta}\right|{\mathit{y}}_{i},{\mathit{M}}_{i},\mathit{\lambda})=\sum _{j=1}^{{2}^{m}}{p}_{\mathit{ij}}\varphi ({\mathit{y}}_{i}|\mathit{u}+\mathcal{B}{\mathit{Z}}_{[\xb7,j]},{\mathit{\sum}}_{e})$ and$L\left(\mathit{\theta}\right|\mathit{Y},\mathit{M},\mathit{\lambda})=\prod _{i=1}^{n}{L}_{i}(\mathit{\theta}|{\mathit{y}}_{i},{\mathit{M}}_{i},\mathit{\lambda})$, where y is a T × n matrix of trait measurements, and$\varphi \left({\mathit{y}}_{i}\right|\mathit{u}+\mathcal{B}{\mathit{Z}}_{[\xb7,j]},{\mathit{\sum}}_{e})$ is the probability distribution function of a multivariate normal random variable y_{ i } with mean$\mathit{u}+\mathcal{B}{\mathit{Z}}_{[\xb7,j]}$ and variance-covariance ∑_{ e } . In what follows, ℓ_{ i } (θ|y_{ i },_{ M i },λ) and ℓ(θ|y,M,λ) are the natural logarithm of the individual and overall likelihood functions, respectively.
Parameter estimation
Estimation of parameters in the likelihood function is cumbersome due to mixture of distributions. The expectation-maximization (EM)[29] algorithm is very popular for parameter estimation in mixture models. The EM algorithm is very simple to program, given that efficient estimators are available for the “complete-data”. Moreover, the EM algorithm guarantees that the likelihood function is nondecreasing in every iteration. However, EM may show slow convergence rate if there are many missing data, and EM does not provide standard errors of parameter estimates.
Many modifications of the EM algorithm and many hybrids of EM and Gauss-Newton (GN) methods have been proposed[30–32]. GN methods are not guaranteed to converge when the logarithm likelihood function is not concave, but if there is convergence its rate is usually quadratic, as opposite to the linear rate of EM. Therefore, speed of convergence of GN may be much faster than EM. We describe two algorithms to obtain the maximum likelihood estimators (MLE) of parameters in the MTMIM model: expectation-conditional maximization (ECM) and a hybrid of EM and Newton-Raphson called generalized EM-NR (GEM-NR).
Expectation-conditional maximization algorithm
The EM algorithm[29] solves the incomplete logarithm likelihood function iteratively in terms of the unobserved complete-data logarithm likelihood function. If the complete-data logarithm likelihood function is messy and the M-step is complex, then the EM algorithm is no longer attractive. For such cases of complicate M-step,[33] proposed a class of generalized EM algorithm, called expectation-conditional maximization (ECM). The ECM enjoys the convergence properties of the EM while simplifying the estimation of parameters. In the ECM, a complex M-step is broken down into many simpler CM-steps, each one of them maximizes the expected complete-data logarithm likelihood function conditional on some function of the parameters. Besides simplifying the M-step, the CM-step is often faster and more stable than the M-step because the conditional maximization are over spaces of smaller dimensions[33].
It is worth mentioning that in the E-step above, the updating equation at step v + 1 does not use the probabilities from the previous step v, i.e, it uses p_{ i j } instead of${\Pi}_{\mathit{ij}}^{\left(\nu \right)}$. This is the case in QTL mapping literature because the a priori probabilities are indeed exellent estimates of the conditional probabilities of QTL given the flanking markers.
for b ∈ {1,2,· · · ,s}.
The E- and CM-steps are computed iteratively until convergence of the likelihood function. Our choice of initial values for u and ∑_{ e } are the sample mean and the sample variance-covariance, respectively, and all parameters in$\mathcal{B}\phantom{\rule{0.5em}{0ex}}$ are set to zero. In the genome-wide scan, an alternative efficient choice of initial values is to use converged parameters of a previous position in the search grid. For any small positive real number ε, a stoping rule for the convergence of the likelihood function can be defined as [L(θ^{(v + 1)}|Y, M, λ)−L(θ^{(v)}|Y, M, λ)] / L(θ^{(v)}|Y, M, λ) < ε.
It is worth mentioning that for many combinations of i and j, the probabilities p_{ i j } are zero or very close to zero. Therefore, one may choose to ignore unimportant small probabilities in the computations, which may lead to significant improvement on computation time.
Generalized EM algorithm based on Newton-Raphson methods
where C is the Cholesky decomposition of the negative of the matrix of second order derivatives of the complete logarithm likelihood function (see Appendix) and I is an identity matrix.
To guarantee that the logarithm likelihood function is nondecreasing,[31] proposed to start the EM algorithm with five iterations to quickly approach the MLE and then to switch to NR until either convergence or decrease of the logarithm likelihood function. If the logarithm likelihood function decreases, they suggested halving the step-size κ up to five times. If the logarithm likelihood function still decreases, they suggested to return to the EM, run five iterations, and then to switch back to NR.[31] argued that their choice of running the EM algorithm for five iterations is based on previous experiences of[34] that 95% of the change in the initial value of logarithm likelihood function until its maximum value often happens in five EM iterations.
- 1.
Run the ECM algorithm a couple of iterations (say five iterations);
- 2.
Let θ^{(v)} be the parameter estimate in the v^{tℎ} EM iteration;
- 3.
Set κ^{(v)} = 1;
- 4.
Estimate θ^{(v + 1)} using equation (2) with the first and second order derivatives of Q_{ c } (θ|y) evaluated at θ^{(v)} ;
5. ● If ℓ(θ^{(v + 1)}|Y, M, λ) > ℓ(θ^{(v)}|Y, M, λ), then set θ^{(v + 1)} as the updated parameter;
● Otherwise, keep repeating step 4 with smaller and smaller κ^{(v)}, until the likelihood function increases or until κ^{(v)} gets too small, in which case start again in step 1;
In cases in which the complete-data logarithm likelihood function does not allow for closed form solution of parameter estimators,[30] have found that the GEM-NR can reduce significantly the computation burden when compared to the EM algorithm. In the Appendix, we derived all expressions (first and second order derivatives of the complete-data logarithm likelihood function) to implement the GEM-NR algorithm.
Genome-wide significance level and model selection
Score-based threshold
We extend the score statistic[17] to assess the genome-wide statistical significance level of QTL effect in the MTMIM model. Based on the individual and overall likelihood functions, we derived all required expressions to compute the score statistic to test any effect parameter in the MTMIM model (see Appendix).
Under some regular conditions, the score and LRT statistics are asymptotically equivalent in large sample[35]. But, an interesting characteristic of the score statistic is that it can be approximated by a sum of independent random components. Motivated by this characteristic and based on the decomposition of the score function[17, 36] derived the large-sample distribution of the score statistic for genome-wide QTL mapping.
In multiple trait genome-wide scan, a putative pleiotropic QTL is assumed at every position λ ∈ ζ and the significance level of its effects (main or epistatic effects) is tested against the null of no effects. For instance, assume a model with m − 1 QTL with main effects and p epistatic effects between certain QTL pairs. Assume we are scanning for a putative m^{tℎ} QTL. Let l = λ denotes the testing position of the putative QTL coming into the model. Let λ = (λ_{1},λ_{2},· · · ,λ_{(m−1)}, l) be the current positions of all m QTL in the model. Let${\mathit{\theta}}_{m}={\mathit{\beta}}_{m}^{\prime}$ be a T × 1 vector of effects for the new QTL coming into the model, and let$\mathit{\theta}={({\mathit{\theta}}_{1},{\mathit{\theta}}_{2},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\mathit{\theta}}_{m-1},{\mathit{\theta}}_{m},{\mathit{\theta}}_{m+1},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\mathit{\theta}}_{s},{\mathit{u}}^{\prime},\mathit{\text{vect}}({\mathit{\sum}}_{e}\left)\right)}^{\prime}$ be a column vector of all parameters in the model, where${\mathit{\theta}}_{b}={\mathit{\beta}}_{b}^{\prime}$ for 1 ≤ b ≤ m and${\mathit{\theta}}_{b}={\mathit{w}}_{b}^{\prime}$ for m <b ≤ s. Let$\mathit{\eta}={({\mathit{\theta}}_{1},{\mathit{\theta}}_{2},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\mathit{\theta}}_{m-1},{\mathit{\theta}}_{m+1},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\mathit{\theta}}_{s},{\mathit{u}}^{\prime},\mathit{\text{vect}}({\mathit{\sum}}_{e}\left)\right)}^{\prime}$ be the column vector of nuisance parameters. Then the hypothesis H_{0} : θ_{ m } = 0 versus H_{1} : θ_{ m } ≠ 0 is assessed at every position l in the genome by the LRT. The genomic position with the maximum LRT among all l is assessed for the presence of a QTL via the score-based method.
where$\stackrel{\mathbf{~}}{\mathit{\eta}}$ is the MLE of η under H_{0} (see Appendix for a detailed derivation of first and second order derivatives of the likelihood function).
- 1.
generate n independent normal variables z_{ i } (i = 1,2,· · · ,n) from N(0,1);
- 2.
for each l, compute ${\mathit{\xdb}}^{\ast}\left(l\right)=\sum _{i=1}^{n}{\mathit{\xdb}}_{i}\left(l\right){z}_{i}$, ${S}^{\ast}\left(l\right)={\mathit{\xdb}}^{\ast \prime}\left(l\right){\widehat{V}}^{-1}\left(l\right){\mathit{\xdb}}^{\ast}\left(l\right)$. Then, compute ${S}^{\ast}=\underset{l\in \mathit{\zeta}}{\mathit{\text{max}}}\left\{{S}^{\ast}\right(l\left)\right\}$;
- 3.
repeat steps 1 and 2 many times, say N times (resampling), to obtain a sequence $({S}_{1}^{\ast},{S}_{2}^{\ast},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{S}_{N}^{\ast})$;
- 4.
the score-based threshold for a given significance α-level is the 100(1 − α) percentile of the ascending ordered values $({S}_{\left(1\right)}^{\ast},{S}_{\left(2\right)}^{\ast},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{S}_{\left(N\right)}^{\ast})$.
If${\mathit{\xdb}}_{i}\left(l\right)$ in${\mathit{\xdb}}^{\ast}\left(l\right)$ and$\widehat{V}\left(l\right)$ are assumed to be fixed and z_{ i } in${\mathit{\xdb}}^{\ast}\left(l\right)$ to be random, then: (I) The conditional distribution of${\mathit{\xdb}}^{\ast}\left(l\right)$ on the observed data is normal with mean zero and limiting covariance as that of$\mathit{\xdb}\left(l\right)$; (II) From I, it follows that the distributions of${n}^{-\frac{1}{2}}{\mathit{\xdb}}^{\ast}\left(l\right)$ and${n}^{-\frac{1}{2}}\mathit{\xdb}\left(l\right)$ are asymptotically equivalent; and, (III) From II, it is possible to approximate the distribution of S(l) by that of s^{∗}(l) under the null hypothesis[17, 37].
Model selection
The search for QTL effects on phenotypic traits consists on identifying those subset of genomic regions for which statistical tests are significant.[38] elaborated the problem of finding such a subset of genomic regions as the one of model selection, for which many tools are available in the vast literature of variable selection. However, in QTL studies the identification of a reasonable model, which maximizes the correct number of QTL while controlling the rate of false discovery is predominant over the identification of models with the smallest prediction errors, which is the major criterion for model selection[38].
The score-based threshold can be used as a criterion to build and refine models with many QTL. Starting with a model with no QTL effect we can select putative QTL and refine the model, by including to or excluding from the MTMIM model any effects, all based on their statistical significance assessed via the score-based method. We propose an algorithm, analogue to the algorithm described in[11], to build an initial MTMIM model and to refine it upon using the score-based threshold criterion.
Forward selection
Assuming that model (1) starts with no QTL, one QTL is added at each step of the forward selection. In the m^{tℎ} step of the forward selection, we assume a putative pleiotropic QTL at every position l ∈ ζ (one at the time), but avoiding positions within 5 cM neighboring regions of the m − 1 QTL already in the model and compute the MLE of all parameters. For each position l, we compute the LRT statistic to test the null hypothesis${\text{H}}_{0}:{({\beta}_{1m},{\beta}_{2m},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\beta}_{\mathrm{Tm}})}^{\prime}={(0,0,\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},0)}^{\prime}$ versus${\text{H}}_{1}:{({\beta}_{1m},{\beta}_{2m},\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},{\beta}_{\mathrm{Tm}})}^{\prime}\ne {(0,0,\xb7\; \xb7\; \xb7\phantom{\rule{0.3em}{0ex}},0)}^{\prime}$. A putative QTL at the position with maximum LRT statistic is added to the model if the LRT statistic is larger than the score-based threshold. Next, the effect of the selected QTL on each trait is tested individually against the null of no effect using the LRT and critical value from a chi-squared probability distribution function with one degree of freedom and pre-specified corrected error rate α_{ c }, i.e., when T traits are analyzed jointly, the corrected significance level (Bonferroni correction) to test each effect of the m^{tℎ} QTL at an error rate α is α_{ c } = α / T. Finally, any nonsignificant effect of the m^{tℎ} QTL is removed from the model, ending the m^{tℎ} step of the forward selection. The forward selection continues until no maximum LRT statistic exceeds the score-based threshold.
Model optimization
In turns, we update the positions of all QTL in the model. We pick a QTL and hold the other QTL fixed at the positions that they were mapped. The effects of the picked QTL are then removed from the model and a new search is done within the region delimited by its two neighboring QTL, avoiding 5 cM from each neighbor (the search is performed until the end of the chromosome if no neighbor QTL is found on either side of the picked QTL). The new position of the picked QTL is set to the position of the maximum LRT statistic within the searched region and all parameters in the model are updated. This procedure is repeated until the positions of all QTL are updated.
Some suitable hypotheses in the MTMIM model
Testing pleiotropic versus closely linked nonpleiotropic QTL
Although testing for pleiotropic versus closely linked nonpleiotropic QTL is a part of model selection, we preferred to separate it from the model selection because in general this test is performed at the end of the model selection procedure, when the final model is almost fitted.
Many hypotheses can be formulated and tested, for example, the hypotheses of model (5) versus (6) can be stated as H_{0} : λ_{3} = λ_{4} versus H_{1} : λ_{3} ≠ λ_{4}, and the hypotheses of model (6) versus (7) can be stated as H_{0} : β_{14} = β_{23} = 0 versus H_{1} : β_{14} ≠ 0 and β_{23} ≠ 0. In general, testing whether QTL r has pleiotropic main effect or not in a subset S (S ∈ T) of traits in the model, means testing H_{0} : β_{ t r } = 0 ∀ T ∈ s versus H_{1} : β_{ t r } ≠ 0 for some T ∈ s. And, testing whether QTL r and l has pleiotropic epistatic effect or not in a subset S (S ∈ T) of traits in the model, means testing H_{0} : w_{ t r l } = 0 ∀ T ∈ s versus H_{1} : w_{ t r l } ≠ 0 for some T ∈ s. Model (6) illustrates a situation in which parameters are constrained to zero and the parameter estimators derived previously in the CM-step with constrained parameters are suitable.
When models are nested, the critical value to assess the strength of the LRT is straightforward, in the sense that under regular conditions the LRT has asymptotic chi-squared distribution with degrees of freedom equal to the difference between the number of parameters in the full and reduced models. However, the pleiotropic and closely linkage models may not be nested (for instance, models (6) and (7)), which then requires some correction for the LRT[39, 40]. The parametric bootstrap method[13] is an alternative for computing the empirical distribution of the LRT statistic in QTL mapping when models are not nested. In recognizing the test of pleiotropic versus closely linked nonpleiotropic QTL as one of model selection, we evaluate the performance of Akaike’s Information Criterion corrected (AICc)[41] and LRT, using simulation.
When a QTL has epistasis, testing this QTL for pleiotropy versus close linkage is not trivial because the test not only depends on the QTL being tested but also on any other QTL in the model that might interact with it. In general, we suggest to search for QTL main effects, and upon finishing this search to test for pleiotropy versus close linkage, and finally to search for epistasis and no longer to test pleiotropy or to test solely those QTL without epistasis.
QTL by environment interaction
The possibility of testing for QTL by environment interaction arises as another advantage of the multiple trait analysis. There are two situations in which we are able to study the differential expression of QTL. First, when the same set of genotypes are evaluated phenotypically in different environments (design I), and second when the phenotypic evaluations are done in different sets of genotypes in different environments (design II)[10]. We regard the model for analysis of data in design II as multiple population model, and thus we shall omit further discussion about it while talking about the multiple trait analysis in this paper.
Let us reiterate that in design I we regard the expression of a trait in different environments as different trait states[42]. Therefore, the index T (T = 1,2,· · · T), which was previously defined to index traits, is regarded as the environment index in what follows. With this in mind, testing whether the main effect of QTL r on a trait is statistically different or not in a subset S (S ∈ T) of environments, means testing H_{0} : β_{ t r } = β_{ r } ∀ T ∈ s versus H_{1} : β_{ t r } ≠ β_{ r } for some T ∈ s. And, testing whether QTL r and l epistatic effect on a trait is statistically different or not in a subset S (S ∈ T) of environments, means testing H_{0} : w_{ t r l } = 0 ∀ T ∈ s versus H_{1} : w_{ t r l } ≠ 0 for some T ∈ s.
The LRT may be used to evaluate the hypotheses above. The cut-off point for the test can be obtained from the chi-squared probability distribution function with degrees of freedom being the difference between the number of parameters in the full (H_{1}) and reduced (H_{0}) models.
Evaluation of the MTMIM model by simulation
We implemented the MTMIM model and score-based threshold method, and evaluated them with several simulated datasets. More specifically, we evaluated type I error, model fitting, and the efficiency of pleiotropic versus closely linked nonpleiotropic QTL testing hypothesis delivered by the MTMIM model.
Genome-wide type I error
Simulated genetic architecture of traits
Effects of each QTL^{ d } | ∑ _{ e } ^{ f } | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Scenario^{ a } | h^{2}^{ b } | u ^{ c } | Q1 | Q2 | Q3 | Q4 | Q5 | T1 | T2 | T3 | |
T1 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 1 | 0.2 | 0 | |
S0 | T2 | 0 | 35 | 0 | 0 | 0 | 0 | 0 | 0.2 | 1 | -0.2 |
T3 | 0 | 30 | 0 | 0 | 0 | 0 | 0 | 0 | -0.2 | 1 | |
T1 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 1 | 0.2 | 0 | |
T2 | 25 | 35 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 0.2 | 1 | -0.2 | |
SI | T3 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 0 | -0.2 | 1 |
Chr. | – | – | 1 | 2 | 3 | 5 | 6 | – | – | – | |
Position^{ e } | – | – | 23 | 15 | 45 | 67 | 53 | – | – | – | |
T1 | 25 | 30 | 0.52 | 0.52 | 0.52 | 0.52 | 0.52 | 1 | 0.2 | 0 | |
T2 | 18 | 35 | 0 | 0.54 | 0.54 | 0.54 | 0 | 0.2 | 1 | -0.2 | |
SII | T3 | 5 | 30 | 0 | 0 | 0.46 | 0 | 0 | 0 | -0.2 | 1 |
Chr. | – | – | 1 | 2 | 3 | 5 | 6 | – | – | – | |
Position | – | – | 23 | 15 | 45 | 67 | 53 | – | – | – | |
T1 | 18 | 30 | 0.54 | 0 | 0.54 | 0 | 0.54 | 1 | 0.2 | – | |
T2 | 18 | 35 | 0 | 0.54 | 0.54 | 0.54 | 0 | 0.2 | 1 | – | |
SIII | Chr. | – | – | 1 | 1 | 3 | 6 | 6 | – | – | – |
Position | – | – | 23 | 33 | 45 | 38 | 53 | – | – | – |
Model fit evaluations
We use simulation to evaluate the overall performance of the MTMIM model and score-based threshold as the criterion to assess the significance level of QTL effects in the genome-wide scan. We examined the performance of the MTMIM in three different scenarios (SI, SII and SIII shown in Table8), each evaluated with r = 500 replicates. Each replicate was simulated with six chromosomes, each with nine markers evenly spaced 10 cM apart from each other, and 300 subjects. The genetic architecture of quantitative traits in each scenario is described with details in Table8. For each replicate we build an MTMIM model using our proposed forward selection and model optimization procedure. The genome was partitioned at 1-cM grid for genome-wide scan. For the sake of comparison, we also build an MIM model for each trait in each replicate using our proposed forward selection and model optimization procedure. For every position in the genome, the score statistic was resampled 800 times for the purpose of genome-wide score-based threshold estimation.
The general goal of each simulated scenario is: (SI) With a basic and favorable situation, we want to evaluate basic properties of the MTMIM model; (SII) With a mixture of QTL affecting one, two and three traits, we want to evaluate how well the MTMIM model handles the estimation of QTL with effects on only a subset of traits; (SIII) With presence of closely linked nonpleiotropic QTL and a pleiotropic QTL, we want to evaluate the MTMIM model under more complex genetic architecture. In SIII, we build an MTMIM model for each replicate using the forward selection without testing for pleiotropic versus closely linked nonpleiotropic QTL. Each MTMIM model built in the forward selection was then refined with a follow-up test of pleiotropic versus closely linked nonpleiotropic QTL. The pleiotropic versus closely linked nonpleiotropic test was carried out for every pleiotropic QTL in the MTMIM model.
We evaluated the MTMIM model under three genome-wide significance levels: 1, 5 and 10%. For each replicate, all QTL selected in the forward selection are defined as mapped QTL. We summarize the performance of the MTMIM model with measures that are function of the logarithm of odds ratio (LOD) support interval of mapped QTL. The LOD-d (d = 1, 1.5, and 2) support interval of a mapped QTL is a continuous genomic region that includes the position of the mapped QTL and all positions on its left and right sides with LOD values greater than or equal to the LOD value at the position of the mapped QTL after subtraction of a positive constant d[1]. Let Q_{ r }, for r ∈ {1, 2, · · · m = 5}, be a simulated QTL. A simulated QTL is defined as being paired with a mapped QTL if the simulated and mapped QTL are nearby. A mapped QTL is defined as being matched to a paired QTL if the LOD-d support interval of the mapped QTL includes the paired QTL. A mapped QTL is defined as mismatched if it is not matched. A simulated QTL Q_{ r } is defined as identified if it has a matched QTL. For each simulated Q_{ r } and for each d, let${\Omega}_{{Q}_{r},d}$ be the set of replicates for which Q_{ r } is identified. We define$\left|{\Omega}_{{Q}_{r},d}\right|$ as the number of elements in${\Omega}_{{Q}_{r},d}$. A criterion to match mapped and simulated QTL which uses both LOD-d support interval and closest distance between mapped and simulated QTL is more appropriate than the usual criterion that uses closest distance alone. Our measures of model fit are: (1) False discovery rate per replicate, FDR_{ b }(d), which is the ratio of number of mismatched QTL in replicate b to total number of mapped QTL in replicate b; (2) FDR over all replicates, FDR(d)=$\sum _{b=1}^{R}\mathrm{FD}{R}_{b}\left(d\right)/R$; (3) Power to identify Q_{ r }, Power(Q_{ r }d)=$\left|{\Omega}_{{Q}_{r},d}\right|/R$; (4) LOD-d support interval coverage of Q_{ r }, c(Q_{ r }d), which is the ratio of$\left|{\Omega}_{{Q}_{r},d}\right|$ to the number of replicates for which Q_{ r } is paired with a mapped QTL; (5) Mean length of LOD-d support interval of Q_{ r }, which is the average length of LOD-d support intervals of Q_{ r } over replicates in${\Omega}_{{Q}_{r},d}$; (6) Mean effect of Q_{ r }, which is the average effects of Q_{ r } over replicates in${\Omega}_{{Q}_{r},d}$; (7) Mean position of Q_{ r }, which is the average positions of Q_{ r } over replicates in${\Omega}_{{Q}_{r},d}$; and (8) Model size, which is the number of mapped QTL. These summary statistics have been proposed by C. Laurie, S. Wang, L. A. Carlini-Garcia and Z-B. Zeng (unpublished observations).
Appendix
Parameter estimation
Expectation-conditional maximization algorithm
The CM-step consists of maximizing the expected complete logarithm likelihood function with respect to the unknown parameters through derivatives (see Section Derivatives).
Newton-Raphson method
Generalized EM-Newton-Raphson method
and I is an identity matrix. From (12), we can see that so long as κ^{(v)} is chosen to make (13) positive definite, the logarithm likelihood function is guaranteed to increase at every iteration.
Derivatives
We provide analytical formulae of the first and second order derivatives of the logarithm of individual and overall likelihood functions of data under the MTMIM model. We borrowed useful ideas from[43, 44]. These papers provide many results regarding matrix derivatives as well as their applications in multivariate analysis.
Auxiliary matrices
We assume b = 1, 2, · · · , s, i = 1, 2, · · · ,n and j = 1, 2, · · · ,2^{ m } .
First order derivatives of the logarithm of the individual likelihood function
Second order derivatives of the logarithm of the overall likelihood function
First and second order derivatives of the expected complete-data logarithm likelihood function
Extension to other crosses
The extension of score statistic to other cross types (for instance, intercross F_{2}, recombinant inbred lines, double haploids) is straightforward, in fact, the auxiliary matrices, expressions of first and second order derivatives of the logarithm of individual and overall likelihood functions can be straightly obtained from the general expressions derived previously. For a specific cross type, the extension consists basically of building an appropriate design matrix Z and matrix of parameters$\mathcal{B}\phantom{\rule{0.5em}{0ex}}$, and substituting 2^{ m } in the summations by the appropriate value according to that cross type (for instance, 3^{ m } for intercross F_{2}).
Declarations
Acknowledgements
The authors wish to thank the editor Rongling Wu and the two anonymous reviewers for their valuable comments that improved the presentation of this paper. This work was carried out while L.D.C. E Silva was a Ph.D. candidate in Statistics at the North Carolina State University, with a joint fellowship from the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES - Brazil) and Fulbright.
Authors’ Affiliations
References
- Lander ES, Botstein D: Mapping mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121: 185-199.PubMed CentralPubMedGoogle Scholar
- Da Costa E Silva, Zeng ZB: Current progress on statistical methods for mapping quantitative trait loci from inbred line crosses. J Biopharmaceutical Stat. 2010, 20 (2): 454-481. 10.1080/10543400903572845.View ArticleGoogle Scholar
- Zeng ZB: Theoretical basis for separation of multiple linked gene effects in mapping quantitative trait loci. Proc Natl Acad Sci USA. 1993, 90 (23): 10972-10976. 10.1073/pnas.90.23.10972.PubMed CentralView ArticlePubMedGoogle Scholar
- Zeng ZB: Precision mapping of quantitative trait loci. Genetics. 1994, 136 (4): 1457-1468.PubMed CentralPubMedGoogle Scholar
- Haley CS, Knott SA: A simple regression method for mapping quantitative trait loci in line crosses using flanking markers. Heredity. 1992, 69 (4): 315-324. 10.1038/hdy.1992.131.View ArticlePubMedGoogle Scholar
- Haley CS, Knott SA, Elsen JM: Mapping quantitative trait loci in crosses between outbred lines using least squares. Genetics. 1994, 136 (3): 1195-1207.PubMed CentralPubMedGoogle Scholar
- Kao CH, Zeng ZB, Teasdale RD: Multiple interval mapping for quantitative trait loci. Genetics. 1999, 152 (3): 1203-1216.PubMed CentralPubMedGoogle Scholar
- Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144 (2): 805-816.PubMed CentralPubMedGoogle Scholar
- Yi N, Shriner D, Banerjee S, Mehta T, Pomp D, Yandell BS: An efficient Bayesian model selection approach for interacting quantitative trait loci models with many effects. Genetics. 2007, 176 (3): 1865-1877. 10.1534/genetics.107.071365.PubMed CentralView ArticlePubMedGoogle Scholar
- Jiang C, Zeng ZB: Multiple trait analysis of genetic mapping for quantitative trait loci. Genetics. 1995, 140 (3): 1111-1127.PubMed CentralPubMedGoogle Scholar
- Zeng ZB, Kao CH, Basten CJ: Estimating the genetic architecture of quantitative traits. Genet Res. 1999, 74 (3): 279-289. 10.1017/S0016672399004255.View ArticlePubMedGoogle Scholar
- Maia JM: Joint analysis of multiple gene expression traits to map expression quantitative trait loci. PhD thesis,. North Carolina State University 2007Google Scholar
- Knott SA, Haley CS: Multitrait least squares for quantitative trait loci detection. Genetics. 2000, 156 (2): 899-911.PubMed CentralPubMedGoogle Scholar
- Liu J, Liu Y, Liu X, Deng HW: Bayesian Mapping of Quantitative Trait Loci for Multiple Complex Traits with the Use of Variance Components. Am J of Human Genet. 2007, 81: 304-320. 10.1086/519495.View ArticleGoogle Scholar
- Banerjee S, Yandell BS, Yi N: Bayesian quantitative trait loci mapping for multiple traits. Genetics. 2008, 179: 2275-2289. 10.1534/genetics.108.088427.PubMed CentralView ArticlePubMedGoogle Scholar
- Churchill GA, Doerge RW: Empirical threshold values for quantitative trait mapping. Genetics. 1994, 138 (3): 963-971.PubMed CentralPubMedGoogle Scholar
- Zou F, Fine JP, Hu J, Lin DY: An efficient resampling method for assessing genome-wide statistical significance in mapping quantitative trait Loci. Genetics. 2004, 168 (4): 2307-2316. 10.1534/genetics.104.031427.PubMed CentralView ArticlePubMedGoogle Scholar
- Liu J, Mercer JM, Stam LF, Gibson GC, Zeng ZB, Laurie CC: Genetic analysis of a morphological shape difference in the male genitalia of Drosophila simulans and D. mauritiana. Genetics. 1996, 142: 1129-1145.PubMed CentralPubMedGoogle Scholar
- Zeng ZB, Liu J, Stam LF, Kao CH, Mercer JM, Laurie CC: Genetic architecture of a morphological shape difference between two Drosophila species. Genetics. 2000, 154: 299-310.PubMed CentralPubMedGoogle Scholar
- Archive for the genotypic and phenotypic data of the motivating example. [ftp://statgen.ncsu.edu/pub/qtlcart/data/zengetal99/],
- Storey JD, Tibshirani R: Statistical significance for genomewide studies. Proc Natl Acad Sci USA. 2003, 100 (16): 9440-9445. 10.1073/pnas.1530509100.PubMed CentralView ArticlePubMedGoogle Scholar
- Beavis WD: QTL analyses: power, precision, and accuracy. Molecular Dissection of Complex Traits. Edited by: Paterson AH. 1998, New York: CRC Press, 145-162.Google Scholar
- Wang S, Basten CJ, Zeng ZB: Windows QTL Cartographer 2.51. Department of Statistics, North Carolina State University, Raleigh, NC, 2011 [http://statgen.ncsu.edu/qtlcart/WQTLCart.htm],
- Kao CH, Zeng ZB: Modeling epistasis of quantitative trait loci using Cockerham’s model. Genetics. 2002, 160 (3): 1243-1261.PubMed CentralPubMedGoogle Scholar
- Zeng ZB, Wang T, Zou W: Modeling quantitative trait loci and interpretation of models. Genetics. 2005, 169 (3): 1711-1725.PubMed CentralView ArticlePubMedGoogle Scholar
- Zellner A: An efficient method of estimating seemingly unrelated regressions and tests for aggregation bias. J Am Stat Assoc. 1962, 57 (298): 348-368. 10.1080/01621459.1962.10480664.View ArticleGoogle Scholar
- Yi N, Yandell BS, Churchill GA, Allison DB, Eisen EJ, Pomp D: Bayesian model selection for genome-wide epistatic quantitative trait loci analysis. Genetics. 2005, 170 (3): 1333-1344. 10.1534/genetics.104.040386.PubMed CentralView ArticlePubMedGoogle Scholar
- Jiang C, Zeng ZB: Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica. 1997, 101: 47-58. 10.1023/A:1018394410659.View ArticlePubMedGoogle Scholar
- Dempster AP, Laird N, Rubin D: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B (Methodological). 1977, 39: 1-38.Google Scholar
- Rai SN, Matthews DE: Improving the EM algorithm. Biometrics. 1993, 49 (2): 587-591. 10.2307/2532570.View ArticleGoogle Scholar
- Aitkin M, Aitkin I: A hybrid EM/Gauss-Newton algorithm for maximum likelihood in mixture distributions. Stat Comput. 1996, 6: 127-130. 10.1007/BF00162523.View ArticleGoogle Scholar
- McLachlan GJ, Krishnan T: The EM Algorithm and Extensions. 1996, New York: Wiley-InterscienceGoogle Scholar
- Meng XL, Rubin DB: Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993, 80 (2): 267-278. 10.1093/biomet/80.2.267.View ArticleGoogle Scholar
- Redner RA, Walker HR: Mixiture densities, maximum likelihood and the EM algorithm. SIAM Rev. 1984, 26 (2): 195-239. 10.1137/1026034.View ArticleGoogle Scholar
- Chang MN, Wu R, Wu SS, Casella G: Score statistics for mapping quantitative trait loci. Stat Appl Genet Mol Biol. 2009, 8: 1-35. [Article 16]Google Scholar
- Cox DR, Hinkley DV: Theoretical Statistics. 1974, London: Chapman and HallView ArticleGoogle Scholar
- Lin D: An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics. 2005, 21 (6): 781-787. 10.1093/bioinformatics/bti053.View ArticlePubMedGoogle Scholar
- Broman KW, Speed TP: A model selection approach for the identification of quantitative trait loci in experimental crosses. J R Stat Soc Series B (Statl Methodology). 2002, 64 (4): 641-656. 10.1111/1467-9868.00354.View ArticleGoogle Scholar
- Vuong QH: Likelihood ratio tests for model selection and non-nested hypotheses. Econometrica. 1989, 57 (2): 307-333. 10.2307/1912557.View ArticleGoogle Scholar
- Kapetanios G, Weeks M: Non-nested models and the likelihood ratio statistic: a comparison of simulation and bootstrap based tests. Technical report: University of London, 2003Google Scholar
- Sugiura N: Further analysts of the data by Akaike’s information criterion and the finite corrections. Commun Stat - Theory and Methods. 1978, 7: 13-26. 10.1080/03610927808827599.View ArticleGoogle Scholar
- Falconer DS: The problem of environment and selection. Am Nat. 1952, 86: 293-298. 10.1086/281736.View ArticleGoogle Scholar
- Dwyer PS, Macphail MS: Symbolic matrix derivatives. Ann Math Stat. 1948, 19 (4): 517-534. 10.1214/aoms/1177730148.View ArticleGoogle Scholar
- Dwyer PS: Some applications of matrix derivatives in multivariate analysis. J Am Stat Assoc. 1967, 62 (318): 607-625. 10.1080/01621459.1967.10482934.View ArticleGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.