- Methodology article
- Open Access
Mapping quantitative trait loci in line cross with repeat records
- Runqing Yang^{1} and
- Ming Fang^{2}
https://doi.org/10.1186/1471-2156-8-47
© Yang and Ming; licensee BioMed Central Ltd. 2007
- Received: 18 August 2006
- Accepted: 12 July 2007
- Published: 12 July 2007
Abstract
Background
Phenotypes with repeat records from one or multiple individuals are often encountered in practice when mapping QTL in line crosses. The current approach to mapping a trait with repeat records simply replaces the phenotype with the average of the repeat records. This simple treatment does not fully use the information from replication and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL.
Results
We propose to map QTL by using the repeatability model to analyze the repeat records directly rather than analyzing only the mean phenotype. This improves the efficiency of QTL detection because it makes fuller use of the data and allows for permanent environmental effects. A maximum likelihood method implemented via the expectation-maximization (EM) algorithm is used to estimate the parameters of the repeatability model. A series of simulations demonstrated the superiority of the mapping method based on the repeatability model over the simple analysis using the mean phenotype.
Conclusion
Our results suggest that the proposed method can serve as a powerful alternative to existing methods. By means of the repeatability model, using the repeat records on individuals may improve the efficiency of QTL detection in line crosses.
Keywords
- Residual Error
- Mapping Quantitative Trait Locus
- Dominance Effect
- Conditional Density
- Likelihood Ratio Statistic
Background
Replication is fundamental to experimental design. Its important advantages are that it allows experimental error to be estimated and increases the reliability of the information obtained at each experimental point [1, 2]. Replication denotes sampling or measuring multiple times under the same experimental condition (within one treatment), where the experimental unit may be either one individual or multiple individuals with an identical genetic background.
Plants or animals are often observed more than once for a particular trait. Examples include fleece weight of sheep in different years, blood pressure and pulse of a human over time, litter size of sows over time, antler size of deer in different seasons, racing results of horses from several races, and exam scores of students during university. Such records are replicates if they are not influenced by the measuring environment, such as year, season, parity or race.
In classical quantitative genetics, a trait with repeat records is generally analysed by means of the repeatability model [3, 4], which includes a permanent environmental effect in addition to an individual's additive genetic value for the trait. The permanent environmental effect, a measure of the differences among experimental units, is a non-genetic effect common to all observations on the same individual [5]. Such environmental effects are usually accounted for in the model to ensure accurate prediction of breeding values [4]. However, the repeatability model has received little attention in QTL mapping with repeat records.
The current approach to mapping a trait with repeat records simply replaces the phenotype with the average of the repeat records [6, 7]. Although it improves the power of QTL detection to some extent, this simple treatment does not fully use the information from replication and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL.
In this study, we apply the repeatability model to mapping quantitative trait loci with repeat records and demonstrate the higher efficiency of this model through simulations.
Theory and methods
Mapping QTL based on the mean phenotype
Take a simple F_{2} population of size n derived from two homozygous lines as an example. There are three possible genotypes at a quantitative trait locus Q, denoted by Q_{1}Q_{1}, Q_{1}Q_{2} and Q_{2}Q_{2}. The phenotypic value of individual i is usually described by the following linear model,
y_{ i }= μ + z_{ i }a + w_{ i }d + e_{ i }, (1)
and the variable with additional subscript j indicates the corresponding variable for the j th record of the i th F_{2} individual. The residual error now follows a N(0, σ^{2}/m_{ i }) distribution, given that e_{ ij }~ N(0, σ^{2}).
Because ${p}_{ik}^{\ast}$ is a function of the unknown parameters, iterations are required for the EM algorithm. The iterations are described as
Step 0: Set initial values θ^{(0)}.
Step 1: Calculate the posterior probabilities ${p}_{ik}^{\ast}$ with equation (7).
Step 4: Go to step 1, which completes one round of iteration.
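The iteration can be illustrated with a minimal Python sketch. Equations (7) to (9) are not reproduced in this text, so the code below uses the standard mixture-model EM for interval mapping rather than the authors' exact formulas. The genotype coding (z = 1, 0, -1 and w = 0, 1, 0 for Q_{1}Q_{1}, Q_{1}Q_{2}, Q_{2}Q_{2}), the function name `em_mean_phenotype` and the stopping rule are assumptions made for illustration.

```python
import numpy as np

# Assumed coding for genotypes Q1Q1, Q1Q2, Q2Q2 (not given in the text above).
Z = np.array([1.0, 0.0, -1.0])   # additive indicator z
W = np.array([0.0, 1.0, 0.0])    # dominance indicator w


def em_mean_phenotype(ybar, m, prior, n_iter=200, tol=1e-8):
    """EM for ybar_i = mu + z_i a + w_i d + e_i with e_i ~ N(0, sigma2 / m_i).

    ybar  : (n,) mean phenotype of each individual
    m     : (n,) number of records behind each mean
    prior : (n, 3) genotype probabilities p_ik conditional on the markers
    Returns (mu, a, d), sigma2 and the log-likelihood of the last E-step.
    """
    ybar = np.asarray(ybar, dtype=float)
    m = np.asarray(m, dtype=float)
    n = len(ybar)
    X = np.column_stack([np.ones(3), Z, W])     # one design row per genotype
    beta = np.array([ybar.mean(), 0.0, 0.0])    # Step 0: initial mu, a, d
    sigma2 = np.var(ybar) * m.mean()            # Step 0: initial sigma2
    loglik_old = -np.inf
    for _ in range(n_iter):
        # E-step: posterior genotype probabilities p*_ik
        var_i = (sigma2 / m)[:, None]
        resid = ybar[:, None] - (X @ beta)[None, :]
        dens = np.exp(-0.5 * resid**2 / var_i) / np.sqrt(2.0 * np.pi * var_i)
        joint = prior * dens
        lik_i = joint.sum(axis=1)
        post = joint / lik_i[:, None]
        loglik = np.log(lik_i).sum()
        # M-step: weighted least squares with weights p*_ik * m_i
        wgt = post * m[:, None]
        A = X.T @ (wgt.sum(axis=0)[:, None] * X)
        b = X.T @ (wgt * ybar[:, None]).sum(axis=0)
        beta = np.linalg.solve(A, b)
        resid = ybar[:, None] - (X @ beta)[None, :]
        sigma2 = (wgt * resid**2).sum() / n
        if abs(loglik - loglik_old) < tol:
            break
        loglik_old = loglik
    return beta, sigma2, loglik
```

Running the same function with a = d = 0 held fixed would give the null likelihood needed for the likelihood ratio test; that variant is omitted here for brevity.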
Mapping QTL based on the repeatability model
Partitioning residual error e_{ i }in model (1) into an individual-specific permanent environmental effect ζ_{ i }and random environmental effect ε_{ ij }, the j th phenotypic value of an individual i is represented as
y_{ ij }= μ + z_{ i }a + w_{ i }d + ζ_{ i }+ ε_{ ij } (10)
This is a mixed effects model, also called the repeatability model, with a and d treated as fixed effects and ζ_{ i }as the random effect. It is assumed that ζ_{ i }follows an i.i.d. N(0, ${\sigma}_{\zeta}^{2}$) distribution and ε_{ ij }an i.i.d. N(0, ${\sigma}_{\epsilon}^{2}$) distribution.
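The variance structure implied by this model can be written out explicitly. The lines below are a derivation from the stated distributional assumptions, with the additional assumption that the permanent and random environmental effects are independent of each other; they are not equations taken from the original text. The symbols r (repeatability) and $\bar{y}_{i\cdot}$ (mean of the m_{ i }records of individual i) are notation introduced here. All moments are conditional on the QTL genotype.

$$
\mathrm{Var}(y_{ij}) = \sigma_{\zeta}^{2} + \sigma_{\epsilon}^{2}, \qquad
\mathrm{Cov}(y_{ij}, y_{ij'}) = \sigma_{\zeta}^{2} \;\; (j \neq j'), \qquad
r = \frac{\sigma_{\zeta}^{2}}{\sigma_{\zeta}^{2} + \sigma_{\epsilon}^{2}}
$$

$$
\mathrm{Var}(\bar{y}_{i\cdot}) = \sigma_{\zeta}^{2} + \frac{\sigma_{\epsilon}^{2}}{m_{i}}
$$

Averaging therefore shrinks only the random environmental part of the variance. The permanent environmental part stays at ${\sigma}_{\zeta}^{2}$ however many records are taken, which is why a residual of the form N(0, σ^{2}/m_{ i }) for the mean phenotype does not describe data generated with a permanent environmental effect.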
We use an m_{ i }× 1 vector y_{ i }= [y_{i 1} y_{i 2} … y_{ im_{ i } }]^{ T }, for i = 1, 2, …, n, to denote the array of phenotypic values for the i th individual and define ϕ_{ i }= [1 1 … 1]^{ T }as a vector of dimension m_{ i }. In matrix notation, model (10) can be written as
y_{ i }= ϕ_{ i }μ + z_{ i }ϕ_{ i }a + w_{ i }ϕ_{ i }d + ϕ_{ i }ζ_{ i }+ ε_{ i } (11)
where ε_{ i }= [ε_{i 1} ε_{i 2} … ε_{ im_{ i } }]^{ T }is an m_{ i }× 1 vector of random environmental effects which follows N(0, I_{ i }${\sigma}_{\epsilon}^{2}$), with I_{ i }being an m_{ i }× m_{ i }identity matrix. The conditional expectation of model (11) given the fixed effects is
E(y_{ i }) = M_{ i }= ϕ_{ i }μ + z_{ i }ϕ_{ i }a + w_{ i }ϕ_{ i }d (12)
which applies to all i = 1, 2, …, n.
so we can simply use the existing mixed model EM algorithm to find the MLE of the parameters [9]. The EM steps for the mixed model analysis are as follows.
Step 0: Initialize all parameters with values in their legal domain, denoted by θ^{(0)}.
Step 2: Compute all the expectations involved in the following maximization steps (the same as in equation (8)).
Step 4: Update the population mean, additive effect and dominance effect by equation (16). The resulting equations are equivalent to equations (9) with m_{ i }replaced by ${\phi}_{i}^{T}{V}_{i}^{-1}{\phi}_{i}$.
Step 7: Repeat steps 1 to 6 until a convergence criterion is reached.
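The quantity that replaces m_{ i }in the update of the fixed effects can be checked numerically. The definition of V_{ i }does not survive in this text, so the sketch below assumes it is the covariance matrix of y_{ i }implied by model (11), that is, ϕ_{ i }ϕ_{ i }^{ T }${\sigma}_{\zeta}^{2}$ + I_{ i }${\sigma}_{\epsilon}^{2}$. The function names are hypothetical.

```python
import numpy as np


def covariance_of_records(m, var_zeta, var_eps):
    """Assumed V_i: phi phi^T * var_zeta + I * var_eps for m records."""
    phi = np.ones((m, 1))
    return phi @ phi.T * var_zeta + np.eye(m) * var_eps


def effective_weight(m, var_zeta, var_eps):
    """phi^T V_i^{-1} phi, computed by solving V_i x = phi."""
    phi = np.ones(m)
    V = covariance_of_records(m, var_zeta, var_eps)
    return float(phi @ np.linalg.solve(V, phi))


def effective_weight_closed_form(m, var_zeta, var_eps):
    """Closed form m / (var_eps + m * var_zeta), valid for this V_i."""
    return m / (var_eps + m * var_zeta)


if __name__ == "__main__":
    # Replicate numbers used in the simulations, with a 2.5:2.5 variance ratio.
    for m in (1, 3, 5, 10, 15):
        print(m, effective_weight(m, 2.5, 2.5),
              effective_weight_closed_form(m, 2.5, 2.5))
```

Under this assumption the weight approaches 1/${\sigma}_{\zeta}^{2}$ as m_{ i }grows instead of increasing without bound, so additional records on the same individual carry progressively less information about the QTL effects.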
The MLEs of the parameters in both model (2) and model (10) are solved iteratively at each specific location on the chromosome using the EM algorithm, and the QTL position and effects are determined by means of likelihood ratio statistics in a chromosome or genome scan.
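A small Python sketch of the scanning step follows. It assumes that the maximized log-likelihoods under the full model and under the null model (a = d = 0) are already available at each tested position; the function names are hypothetical. The conversion LOD = LR / (2 ln 10) is the standard relation between the two statistics.

```python
import math


def likelihood_ratio(loglik_full, loglik_null):
    """LR = 2 * (ln L_full - ln L_null) at one tested position."""
    return 2.0 * (loglik_full - loglik_null)


def lr_to_lod(lr):
    """Convert a likelihood ratio statistic to a LOD score."""
    return lr / (2.0 * math.log(10.0))


def scan_peak(positions, lr_profile):
    """Return the position with the largest LR and that LR value."""
    best = max(range(len(positions)), key=lambda i: lr_profile[i])
    return positions[best], lr_profile[best]
```

The position returned by `scan_peak` is taken as the estimated QTL location, and the parameter estimates at that position as the estimated QTL effects.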
Simulation studies
A series of simulation experiments was used to compare the efficiency and behaviour of the two mapping methods for a trait with repeat records: the analysis based on the repeatability model and the simple analysis using the mean phenotype. We simulated a single chromosome 100 cM long with 11 evenly spaced codominant markers for an F_{2} population of sample size n = 100, and a single QTL was placed at position 25 cM (between markers 3 and 4). Under the null model, the QTL was assigned a value of zero for both the additive and dominance effects. The empirical critical values of the likelihood ratio statistic for testing the presence of the QTL were obtained by simulating 1000 replicates. Under the alternative model, nonzero and equal additive and dominance effects were simulated. These simulations were replicated 100 times. Empirical power was calculated by counting the number of runs in which the test statistic exceeded the critical value.
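The threshold and power calculations described above can be sketched in Python as follows. The significance level is not stated in the text, so alpha = 0.05 is an assumption, as is the use of the maximum test statistic along the chromosome in each replicate; the function names are hypothetical.

```python
import numpy as np


def empirical_critical_value(null_stats, alpha=0.05):
    """(1 - alpha) quantile of the test statistics simulated under the null.

    null_stats: one statistic per null replicate (assumed here to be the
    maximum likelihood ratio along the chromosome).
    """
    return float(np.quantile(np.asarray(null_stats, dtype=float), 1.0 - alpha))


def empirical_power(alt_stats, critical_value):
    """Fraction of alternative-model replicates exceeding the threshold."""
    alt = np.asarray(alt_stats, dtype=float)
    return float(np.mean(alt > critical_value))
```

With 100 replicates under the alternative model, multiplying the returned fraction by 100 gives the count of significant runs.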
Factors considered include the QTL size, measured as the proportion of the phenotypic variance explained by the QTL (also called the QTL heritability), the number of replicates, and ${\sigma}_{\zeta}^{2}$:${\sigma}_{\epsilon}^{2}$, i.e., the ratio of the permanent environmental variance to the random environmental variance. The QTL size was set at three levels: a = d = 0.265, 0.577, 0.943, corresponding to h^{2} = 0.05, 0.10, 0.20, respectively. The number of replicates was examined at five levels, m = 1, 3, 5, 10, 15, and the variance ratio at five levels, ${\sigma}_{\zeta}^{2}$:${\sigma}_{\epsilon}^{2}$ = 1:4, 2:3, 2.5:2.5, 3:2, 4:1, keeping ${\sigma}_{\zeta}^{2}$ + ${\sigma}_{\epsilon}^{2}$ = 5.0.
The j th phenotypic value of individual i was simulated by using the repeatability model:
y_{ ij }= μ + z_{ i }a + w_{ i }d + ξ_{ i }σ_{ ζ }+ η_{ ij }σ_{ ε } (25)
where both ξ_{ i }and η_{ ij }are random numbers drawn from the standard normal distribution.
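Equation (25) translates directly into a few lines of Python. The sketch below makes several assumptions that are not in the text: the genotype coding (z = 1, 0, -1 and w = 0, 1, 0 for Q_{1}Q_{1}, Q_{1}Q_{2}, Q_{2}Q_{2}), a population mean of zero, and QTL genotypes drawn directly in F_{2} proportions rather than generated together with the linked markers. The function name is hypothetical.

```python
import numpy as np


def simulate_repeat_records(genotype, m, mu, a, d, var_zeta, var_eps, rng):
    """Simulate an (n, m) array of records following equation (25).

    genotype: integer codes 0, 1, 2 for Q1Q1, Q1Q2, Q2Q2 (assumed coding).
    """
    genotype = np.asarray(genotype)
    z = np.array([1.0, 0.0, -1.0])[genotype]   # additive indicator
    w = np.array([0.0, 1.0, 0.0])[genotype]    # dominance indicator
    n = len(genotype)
    xi = rng.standard_normal(n)                # one draw per individual
    eta = rng.standard_normal((n, m))          # one draw per record
    permanent = mu + z * a + w * d + xi * np.sqrt(var_zeta)
    return permanent[:, None] + eta * np.sqrt(var_eps)


rng = np.random.default_rng(1)
genotype = rng.choice(3, size=100, p=[0.25, 0.5, 0.25])   # F2 proportions
y = simulate_repeat_records(genotype, m=5, mu=0.0, a=0.6086, d=0.6086,
                            var_zeta=2.5, var_eps=2.5, rng=rng)
ybar = y.mean(axis=1)   # mean phenotype used by the simple analysis
```

The array `y` is the input for the repeatability model, whereas `ybar` is what the simple analysis would use; both come from the same simulated data set, which makes the comparison between the two methods direct.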
The results of all simulations consistently show that, under the same experimental conditions, (1) using the repeatability model increases the statistical power of QTL detection compared with the simple analysis using the mean phenotype, and (2) the position and effects of the QTL, especially the proportion of phenotypic variance contributed by the QTL, are estimated more accurately by the repeatability model than by the genetic mapping model without permanent environmental effects applied to the mean phenotype. The superiority of the repeatability model over the simple analysis using the mean phenotype is most evident when the QTL heritability is low.
Effects of the number of replications on the mapping analysis based on the repeatability model
Simulated a and d | Simulated h^{2} | Replicates | Power | Position | a (estimate) | d (estimate) | h^{2} (estimate) | ${\sigma}_{p}^{2}$ (estimate) | ${\sigma}_{\epsilon}^{2}$ (estimate) | LOD |
---|---|---|---|---|---|---|---|---|---|---|
0.4189 | 0.05 | 3 | 37 | 28.65(1.946) | 0.615(0.061) | 0.594(0.028) | 0.116(0.007) | 2.183(0.069) | 2.544(0.046) | 15.31(0.810) |
0.4189 | 0.05 | 5 | 46 | 27.63(1.506) | 0.626(0.291) | 0.545(0.226) | 0.108(0.041) | 2.224(0.050) | 2.555(0.018) | 15.39(0.493) |
0.4189 | 0.05 | 10 | 63 | 26.55(0.923) | 0.698(0.224) | 0.661(0.178) | 0.134(0.042) | 2.375(0.037) | 2.548(0.014) | 18.85(0.535) |
0.4189 | 0.05 | 15 | 86 | 26.71(1.441) | 0.496(0.028) | 0.544(0.190) | 0.093(0.003) | 2.294(0.035) | 2.532(0.010) | 14.85(0.386) |
0.6086 | 0.10 | 3 | 80 | 25.98(0.635) | 0.724(0.036) | 0.670(0.025) | 0.142(0.006) | 2.239(0.049) | 2.605(0.030) | 18.18(0.635) |
0.6086 | 0.10 | 5 | 83 | 26.55(0.922) | 0.699(0.022) | 0.661(0.018) | 0.134(0.042) | 2.375(0.034) | 2.548(0.014) | 18.85(0.535) |
0.6086 | 0.10 | 10 | 87 | 26.43(0.640) | 0.650(0.024) | 0.668(0.017) | 0.130(0.004) | 2.310(0.029) | 2.547(0.010) | 20.09(0.538) |
0.6086 | 0.10 | 15 | 93 | 25.78(0.678) | 0.637(0.232) | 0.632(0.138) | 0.119(0.037) | 2.412(0.028) | 2.537(0.078) | 18.77(0.515) |
0.9129 | 0.20 | 3 | 99 | 24.04(0.507) | 0.937(0.038) | 0.960(0.023) | 0.223(0.007) | 2.322(0.051) | 2.643(0.028) | 29.25(0.928) |
0.9129 | 0.20 | 5 | 99 | 24.89(0.319) | 0.881(0.027) | 0.922(0.015) | 0.207(0.005) | 2.374(0.024) | 2.636(0.013) | 29.12(0.640) |
0.9129 | 0.20 | 10 | 100 | 24.90(0.330) | 0.949(0.025) | 0.917(0.157) | 0.213(0.005) | 2.397(0.027) | 2.573(0.010) | 32.86(0.655) |
0.9129 | 0.20 | 15 | 100 | 25.16(0.359) | 0.914(0.241) | 0.884(0.014) | 0.120(0.005) | 2.473(0.030) | 2.569(0.008) | 31.26(0.651) |
Effects of the number of replications on the simple analysis using the mean phenotype
Simulated a and d | Simulated h^{2} | Replicates | Power | Position | a (estimate) | d (estimate) | h^{2} (estimate) | σ^{2} (estimate) | LOD |
---|---|---|---|---|---|---|---|---|---|
0.4189 | 0.05 | 1 | 21 | 38.36(5.188) | 0.673(0.101) | 0.505(0.097) | 0.096(0.004) | 4.266(0.139) | 17.00(0.812) |
0.4189 | 0.05 | 3 | 34 | 26.65(0.888) | 0.559(0.317) | 0.560(0.207) | 0.197(0.007) | 2.189(0.034) | 15.39(0.493) |
0.4189 | 0.05 | 5 | 42 | 29.50(1.026) | 0.653(0.294) | 0.541(0.296) | 0.178(0.059) | 2.743(0.512) | 17.34(0.529) |
0.4189 | 0.05 | 10 | 56 | 27.04(0.954) | 0.719(0.241) | 0.694(0.180) | 0.218(0.061) | 2.911(0.337) | 20.92(0.559) |
0.4189 | 0.05 | 15 | 81 | 26.16(1.521) | 0.496(0.029) | 0.560(0.021) | 0.173(0.005) | 2.469(0.038) | 16.87(0.416) |
0.6086 | 0.10 | 1 | 57 | 23.89(1.774) | 0.767(0.050) | 0.777(0.040) | 0.120(0.039) | 4.785(0.082) | 17.22(0.606) |
0.6086 | 0.10 | 3 | 78 | 25.39(0.660) | 0.661(0.024) | 0.639(0.016) | 0.234(0.063) | 2.256(0.027) | 23.31(0.531) |
0.6086 | 0.10 | 5 | 81 | 27.04(0.954) | 0.719(0.241) | 0.694(0.018) | 0.219(0.061) | 2.911(0.034) | 20.92(0.559) |
0.6086 | 0.10 | 10 | 84 | 26.23(0.602) | 0.667(0.245) | 0.683(0.167) | 0.223(0.065) | 2.600(0.279) | 21.65(0.564) |
0.6086 | 0.10 | 15 | 87 | 25.79(0.672) | 0.652(0.233) | 0.647(0.147) | 0.211(0.060) | 2.586(0.026) | 20.77(0.529) |
0.9129 | 0.20 | 1 | 97 | 25.21(0.563) | 1.003(0.043) | 0.970(0.030) | 0.208(0.005) | 4.800(0.082) | 23.44(0.725) |
0.9129 | 0.20 | 3 | 100 | 25.10(0.302) | 0.909(0.233) | 0.916(0.015) | 0.357(0.007) | 2.311(0.025) | 38.04(0.773) |
0.9129 | 0.20 | 5 | 99 | 25.00(0.305) | 0.886(0.027) | 0.930(0.016) | 0.306(0.007) | 2.974(0.033) | 30.93(0.653) |
0.9129 | 0.20 | 10 | 100 | 25.08(0.307) | 0.952(0.026) | 0.932(0.016) | 0.335(0.007) | 2.689(0.027) | 34.94(0.673) |
0.9129 | 0.20 | 15 | 99 | 25.07(0.288) | 0.929(0.025) | 0.914(0.015) | 0.330(0.007) | 2.659(0.026) | 33.65(0.678) |
Comparison of the mapping analysis based on the repeatability model with the simple analysis using the mean phenotype under different variance ratios of permanent environmental effects to random environmental effects
Simulated h^{2} | ${\sigma}_{p}^{2}$:${\sigma}_{\epsilon}^{2}$ | Method | Power | Position | a (estimate) | d (estimate) | h^{2} (estimate) | ${\sigma}_{p}^{2}$ (estimate) | ${\sigma}_{\epsilon}^{2}$ or σ^{2} (estimate) | LOD |
---|---|---|---|---|---|---|---|---|---|---|
0.05 | 1:4 | Repeat | 72 | 25.59(0.919) | 0.471(0.021) | 0.494(0.013) | 0.076(0.003) | 0.889(0.022) | 4.080(0.025) | 17.27(0.484) |
0.05 | 1:4 | Mean | 60 | 25.52(0.870) | 0.491(0.024) | 0.507(0.015) | 0.199(0.006) | | 1.711(0.234) | 19.68(0.525) |
0.05 | 2:3 | Repeat | 44 | 27.59(1.662) | 0.544(0.032) | 0.540(0.022) | 0.098(0.006) | 1.776(0.039) | 3.023(0.024) | 15.79(0.495) |
0.05 | 2:3 | Mean | 42 | 26.47(1.415) | 0.549(0.030) | 0.527(0.027) | 0.177(0.006) | | 2.418(0.039) | 17.39(0.500) |
0.05 | 3:2 | Repeat | 38 | 24.97(1.441) | 0.607(0.034) | 0.576(0.023) | 0.110(0.004) | 2.745(0.060) | 2.048(0.018) | 14.54(0.389) |
0.05 | 3:2 | Mean | 37 | 25.76(1.456) | 0.601(0.034) | 0.585(0.026) | 0.162(0.006) | | 3.158(0.054) | 16.16(0.452) |
0.05 | 4:1 | Repeat | 33 | 30.57(2.141) | 0.668(0.050) | 0.558(0.035) | 0.128(0.063) | 3.604(0.067) | 1.007(0.010) | 14.04(0.437) |
0.05 | 4:1 | Mean | 26 | 30.62(2.321) | 0.717(0.051) | 0.598(0.043) | 0.171(0.007) | | 3.694(0.071) | 16.74(0.570) |
0.10 | 1:4 | Repeat | 97 | 25.01(0.408) | 0.643(0.019) | 0.622(0.013) | 0.115(0.036) | 0.917(0.023) | 4.042(0.019) | 25.05(0.586) |
0.10 | 1:4 | Mean | 94 | 24.93(0.411) | 0.648(0.020) | 0.628(0.013) | 0.267(0.007) | | 1.765(0.023) | 26.56(0.600) |
0.10 | 2:3 | Repeat | 86 | 26.32(0.815) | 0.667(0.022) | 0.635(0.014) | 0.122(0.033) | 1.890(0.030) | 3.085(0.015) | 19.28(0.440) |
0.10 | 2:3 | Mean | 84 | 26.34(0.827) | 0.669(0.023) | 0.643(0.014) | 0.215(0.054) | | 2.518(0.029) | 20.80(0.459) |
0.10 | 3:2 | Repeat | 83 | 25.53(0.612) | 0.655(0.028) | 0.679(0.017) | 0.137(0.004) | 2.718(0.035) | 2.079(0.013) | 17.76(0.422) |
0.10 | 3:2 | Mean | 83 | 25.73(0.750) | 0.659(0.029) | 0.689(0.018) | 0.199(0.006) | | 3.145(0.033) | 19.37(0.451) |
0.10 | 4:1 | Repeat | 64 | 25.14(1.043) | 0.703(0.029) | 0.686(0.018) | 0.143(0.004) | 3.812(0.051) | 1.007(0.007) | 16.27(0.430) |
0.10 | 4:1 | Mean | 61 | 25.44(0.997) | 0.725(0.032) | 0.751(0.022) | 0.192(0.006) | | 3.898(0.051) | 18.32(0.437) |
Discussion
For a trait with repeat records, we propose using the repeatability model to map QTL. This differs from the simple analysis using the mean phenotype not only in the data analyzed but, more fundamentally, in the model adopted. The simple analysis using the mean phenotype is based on the regular genetic model for mapping QTL in line crosses, which excludes the permanent environmental effects. The excluded permanent environmental effects are absorbed into the residual error, which decreases the accuracy of the QTL parameter estimates, as is rigorously shown in textbooks on statistical models [e.g., [10, 11]]. In addition, the loss of information in the data also impairs the performance of QTL mapping based on the mean phenotype.
Replication requires either that the experimental conditions be the same when multiple records are observed on one individual, or that the genetic backgrounds be identical when the records come from multiple individuals. If the former is not satisfied, the "repeat" records become longitudinal data, such as test-day records of milk production and body weight in cattle, which are genetically analysed using the random regression model; this is essentially the repeatability model with nested submodels of time [12–14]. The latter is hard to satisfy except for cloned individuals and the progeny of each plant in RIL. For example, the genetic backgrounds are not completely the same among individuals within a family or among F_{3} progeny of one F_{2} individual. To improve the efficiency of detecting QTL using such data, the genetic backgrounds should at least be taken into account in the analysis [7]; furthermore, the repeatability model may be a good choice for directly analyzing such "repeat" records.
Although we demonstrate the statistical method of QTL mapping using an F_{2} population as an example, it can also be extended to other simpler or more complex designs, such as backcross populations and full-sib families. Only one QTL is assumed in the model considered here so that the efficiency of the presented method can be conveniently investigated on the basis of the various estimates. If a trait is controlled by multiple loci, composite interval mapping [15, 16] or Bayesian mapping [e.g., [17, 18]] can be used to map those QTLs by incorporating marker cofactors outside the scanning interval, or all the QTLs, into model (10).
References
- Fisher RA: The design of experiments. 9th edition. 1971, New York, Hafner Publishing Company.
- Steel RGD, Torrie JH: Principles and procedures of statistics: a biometrical approach. 2nd edition. 1980, Tokyo, McGraw-Hill Kogakusha.
- Henderson CR: Applications of Linear Models in Animal Breeding. 1984, Guelph, ON, Univ of Guelph.
- Mrode RA: Linear Models for the Prediction of Animal Breeding Values. 1996, UK, CAB International.
- Falconer DS: Introduction to Quantitative Genetics. 1960, London, Oliver & Boyd.
- Zhang TY, Yuan J, Yu W, Guo Z, Kohel RJ: Molecular tagging of a major QTL for fiber strength in upland cotton and its marker-assisted selection. Theor Appl Genet. 2003, 106: 262-268.
- Zhang Y, Xu S: Mapping Quantitative Trait Loci in F2 Incorporating Phenotypes of F3 Progeny. Genetics. 2004, 166: 1981-1993. 10.1534/genetics.166.4.1981.
- Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977, 39: 1-38.
- Henderson CR: Recent developments in variance and covariance estimation. J Anim Sci. 1986, 63: 208-216.
- Zar JH: Biostatistical Analysis. 3rd edition. 1996, Prentice Hall.
- Neter J, Kutner MH, Nachtsheim CJ, Wasserman W: Applied Linear Statistical Models. 4th edition. 1996, RD Irwin, Homewood, IL.
- Henderson CR: Analysis of covariance in the mixed model: higher-level, nonhomogeneous, and random regressions. Biometrics. 1982, 38: 623-640. 10.2307/2530044.
- Schaeffer LR: Application of random regression models in animal breeding. Livest Prod Sci. 2004, 86: 35-45. 10.1016/S0301-6226(03)00151-9.
- Macgregor S, Knott SA, White I, Visscher PM: Quantitative trait locus analysis of longitudinal quantitative trait data in complex pedigrees. Genetics. 2005, 171: 1365-1376. 10.1534/genetics.105.043828.
- Jansen RC: Controlling the type I and type II errors in mapping quantitative trait loci. Genetics. 1994, 138: 871-881.
- Zeng ZB: Precision mapping of quantitative trait loci. Genetics. 1994, 136: 1457-1468.
- Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144: 805-816.
- Yi N, Xu S: Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics. 2001, 157: 1759-1771.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.