Mapping quantitative trait loci in line cross with repeat records

Yang, Runqing; Fang, Ming

doi:10.1186/1471-2156-8-47

Methodology article
Open access
Published: 12 July 2007


BMC Genetics volume 8, Article number: 47 (2007)


Abstract

Background

Phenotypes with repeat records, from one individual or from multiple individuals, are often encountered when mapping QTL in line crosses. The current genetic mapping method for a trait with repeat records simply replaces the phenotype with the average of the repeat records. This simple treatment does not fully utilize the information from the replication and ignores the impact of the permanent environmental effects on the accuracy of the estimated QTL.

Results

We propose to map QTL by using the repeatability model to analyze the repeat records directly rather than the mean phenotype. This improves the efficiency of QTL detection because it makes full use of the information in the data and accounts for the permanent environmental effects. A maximum likelihood method implemented via the expectation-maximization (EM) algorithm is applied to estimate the parameters of the repeatability model. A series of simulations demonstrates the superiority of the mapping method based on the repeatability model over the simple analysis using the mean phenotype.

Conclusion

Our results suggest that the proposed method can serve as a powerful alternative to existing methods. By means of the repeatability model, using the repeat records on each individual may improve the efficiency of QTL detection in line crosses.

Background

Replication is a fundamental principle of experimental design. Its important advantages are that it allows an estimate of experimental error and increases the reliability of the information obtained at each experimental point [1, 2]. Replication means sampling or measuring multiple times under the same experimental condition (within one treatment), where the experimental unit may be either one individual or multiple individuals with an identical genetic background.

Plants or animals are often observed more than once for a particular trait: for example, fleece weight of sheep in different years, blood pressure and pulse of a person over time, litter size of sows over time, antler size of deer in different seasons, racing results of horses from several races, and exam scores of students during university. Such records can be treated as replicates if they are not influenced by the measuring environment, such as year, season, parity or race.

In classical quantitative genetics, a trait with repeat records is generally analysed with the repeatability model [3, 4], in which a permanent environmental effect is included in addition to the individual's additive genetic value for the trait. The permanent environmental effect, a measure of the differences among experimental units, is a non-genetic effect common to all observations on the same individual [5]. Such environmental effects are usually included in the model to ensure accurate prediction of breeding values [4]. However, the repeatability model has not received adequate attention in QTL mapping with repeat records.

The current genetic mapping method for a trait with repeat records simply replaces the phenotype with the average of the repeat records [6, 7]. Although this improves the power of QTL detection to a certain extent, it does not fully utilize the information from the replication and ignores the impact of the permanent environmental effects on the accuracy of the estimated QTL.

In this study, we apply the repeatability model to mapping quantitative trait loci with repeat records and demonstrate its higher efficiency through simulations.

Theory and methods

Mapping QTL based on the mean phenotype

Take as an example a simple F₂ population of size n derived from two homozygous lines. At a quantitative trait locus Q there are three possible genotypes, denoted Q₁Q₁, Q₁Q₂ and Q₂Q₂. The phenotypic value of individual i is usually described by the following linear model,

y_{i} = μ + z_{i} a + w_{i} d + e_{i},

(1)

where μ is the population mean, a and d are the additive and dominance effects of the QTL, e_i is the residual error with a N(0, σ²) distribution, and

z_{i} = \begin{cases} +1 & \text{for } Q_{1} Q_{1} \\ 0 & \text{for } Q_{1} Q_{2} \\ -1 & \text{for } Q_{2} Q_{2} \end{cases} \quad \text{and} \quad w_{i} = \begin{cases} -1 & \text{for } Q_{1} Q_{1} \\ +1 & \text{for } Q_{1} Q_{2} \\ -1 & \text{for } Q_{2} Q_{2} \end{cases} .

If m_i records are sampled repeatedly from each individual and the phenotypic value of individual i is measured by the average of its m_i records, the model becomes

{\bar{y}}_{i} = μ + z_{i} a + w_{i} d + {\bar{e}}_{i},

(2)

where

{\bar{y}}_{i} = \frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} y_{i j}, {\bar{e}}_{i} = \frac{1}{m_{i}} \sum_{j = 1}^{m_{i}} e_{i j},

and a variable with the additional subscript j denotes the corresponding variable for the j th record of the i th F₂ individual. Given that e_ij ~ N(0, σ²), the residual error of the mean now follows a N(0, σ²/m_i) distribution.

Let

f ({\bar{y}}_{i} | θ, z_{i}, w_{i}) = \sqrt{\frac{m_{i}}{2 π σ^{2}}} \exp [- \frac{m_{i}}{2 σ^{2}} {({\bar{y}}_{i} - μ - z_{i} a - w_{i} d)}^{2}]

(3)
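As a minimal sketch, the log of density (3) can be evaluated directly for one individual once its genotype codes z_i and w_i are fixed. The function name and argument order are hypothetical; the only modelling assumption is the one stated above, that the mean of m_i records has variance σ²/m_i.

```python
import math


def log_density_mean(ybar, m, mu, a, d, z, w, sigma2):
    """Log of density (3): ybar ~ N(mu + z*a + w*d, sigma2 / m)."""
    var = sigma2 / m  # variance of the mean of m records
    resid = ybar - mu - z * a - w * d
    return -0.5 * math.log(2.0 * math.pi * var) - resid ** 2 / (2.0 * var)
```

With m = 1 this reduces to the ordinary normal log density, which is a quick sanity check when coding the mean-phenotype model.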

be the conditional density of ${\bar{y}}_{i}$, where θ = [μ a d σ²]^T are the parameters. The log likelihood function, defined with z_i and w_i treated as missing variables, is

L (θ) = \sum_{i = 1}^{n} \ln {E [f ({\bar{y}}_{i} | θ, z_{i}, w_{i})]}

(4)

The expectation-maximization (EM) algorithm [8] can be used to obtain the MLE, as shown below,

[\begin{matrix} \sum_{i = 1}^{n} m_{i} & \sum_{i = 1}^{n} m_{i} E (z_{i}) & \sum_{i = 1}^{n} m_{i} E (w_{i}) \\ \sum_{i = 1}^{n} m_{i} E (z_{i}) & \sum_{i = 1}^{n} m_{i} E (z_{i}^{2}) & \sum_{i = 1}^{n} m_{i} E (z_{i} w_{i}) \\ \sum_{i = 1}^{n} m_{i} E (w_{i}) & \sum_{i = 1}^{n} m_{i} E (z_{i} w_{i}) & \sum_{i = 1}^{n} m_{i} E (w_{i}^{2}) \end{matrix}] [\begin{matrix} μ \\ a \\ d \end{matrix}] = [\begin{matrix} \sum_{i = 1}^{n} m_{i} {\bar{y}}_{i} \\ \sum_{i = 1}^{n} m_{i} E (z_{i}) {\bar{y}}_{i} \\ \sum_{i = 1}^{n} m_{i} E (w_{i}) {\bar{y}}_{i} \end{matrix}]

(5)

and

σ^{2} = {[\sum_{i = 1}^{n} m_{i}]}^{- 1} \sum_{i = 1}^{n} m_{i} E [{({\bar{y}}_{i} - μ - z_{i} a - w_{i} d)}^{2}]

(6)

The expectation shown in Equation 6 can be further expressed as

\begin{array}{l} E [{({\bar{y}}_{i} - μ - z_{i} a - w_{i} d)}^{2}] \\ = {({\bar{y}}_{i} - μ)}^{2} + a^{2} E (z_{i}^{2}) + d^{2} E (w_{i}^{2}) - 2 ({\bar{y}}_{i} - μ) [a E (z_{i}) + d E (w_{i})] + 2 a d E (z_{i} w_{i}) \end{array}

Define the posterior probabilities of the three QTL genotypes for the i th individual as

\begin{matrix} p_{i k}^{*} = \frac{p_{i k} f ({\bar{y}}_{i} | θ, z_{i}, w_{i})}{\sum_{k = 1}^{3} p_{i k} f ({\bar{y}}_{i} | θ, z_{i}, w_{i})} & (k = 1, 2, 3), \end{matrix}

(7)

where p_ik are the conditional probabilities of the genotypes inferred from marker information; then

\begin{array}{l} E (z_{i}) = p_{i 1}^{*} - p_{i 3}^{*}, E (z_{i}^{2}) = p_{i 1}^{*} + p_{i 3}^{*}, \\ E (w_{i}) = p_{i 2}^{*} - p_{i 1}^{*} - p_{i 3}^{*}, E (w_{i}^{2}) = 1 and E (z_{i} w_{i}) = p_{i 3}^{*} - p_{i 1}^{*} . \end{array}

(8)
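Equations (7) and (8) can be sketched in a few lines of Python. This assumes the marker-based probabilities p_ik for the current position are already available and orders the genotypes as Q₁Q₁, Q₁Q₂, Q₂Q₂ with the (z, w) codes defined for model (1); the function names are illustrative only.

```python
import math

# (z, w) codes for Q1Q1, Q1Q2, Q2Q2, as defined for model (1)
CODES = [(1, -1), (0, 1), (-1, -1)]


def posterior_probs(prior, ybar, m, mu, a, d, sigma2):
    """Equation (7): combine marker-based priors p_ik with density (3)."""
    var = sigma2 / m
    weights = []
    for p, (z, w) in zip(prior, CODES):
        resid = ybar - mu - z * a - w * d
        # the normalizing constant of (3) is the same for all genotypes, so it cancels
        weights.append(p * math.exp(-resid ** 2 / (2.0 * var)))
    total = sum(weights)
    return [wt / total for wt in weights]


def expectations(post):
    """Equation (8): posterior moments of the genotype codes."""
    p1, p2, p3 = post
    return {"z": p1 - p3, "z2": p1 + p3, "w": p2 - p1 - p3, "w2": 1.0, "zw": p3 - p1}
```

Because the codes take only three values, every expectation in (8) is a signed sum of the posterior probabilities, which is what keeps the maximization step closed-form.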

Because $p_{i k}^{*}$ is a function of the unknown parameters, the EM algorithm requires iteration. The iterations proceed as follows:

Step 0: Set the initial values θ⁽⁰⁾.

Step 1: Calculate the posterior probabilities $p_{i k}^{*}$ with equation (7).

Step 2: Substituting (8) into equation (5), estimate

\begin{array}{l} μ^{(1)} = {[\sum_{i = 1}^{n} m_{i}]}^{- 1} [\sum_{i = 1}^{n} m_{i} {\bar{y}}_{i} - a^{(0)} \sum_{i = 1}^{n} m_{i} (p_{i 1}^{*} - p_{i 3}^{*}) - d^{(0)} \sum_{i = 1}^{n} m_{i} (p_{i 2}^{*} - p_{i 1}^{*} - p_{i 3}^{*})] \\ a^{(1)} = {[\sum_{i = 1}^{n} m_{i} (p_{i 1}^{*} + p_{i 3}^{*})]}^{- 1} [\sum_{i = 1}^{n} m_{i} (p_{i 1}^{*} - p_{i 3}^{*}) {\bar{y}}_{i} - μ^{(1)} \sum_{i = 1}^{n} m_{i} (p_{i 1}^{*} - p_{i 3}^{*}) - d^{(0)} \sum_{i = 1}^{n} m_{i} (p_{i 3}^{*} - p_{i 1}^{*})] \\ d^{(1)} = {[\sum_{i = 1}^{n} m_{i}]}^{- 1} [\sum_{i = 1}^{n} m_{i} (p_{i 2}^{*} - p_{i 1}^{*} - p_{i 3}^{*}) {\bar{y}}_{i} - μ^{(1)} \sum_{i = 1}^{n} m_{i} (p_{i 2}^{*} - p_{i 1}^{*} - p_{i 3}^{*}) - a^{(1)} \sum_{i = 1}^{n} m_{i} (p_{i 3}^{*} - p_{i 1}^{*})] \end{array}

(9)

Step 3: Substituting (8) into equation (6), estimate

σ^{2 (1)} = {[\sum_{i = 1}^{n} m_{i}]}^{- 1} {\sum_{i = 1}^{n} m_{i} [p_{i 1}^{*} {({\bar{y}}_{i} - μ^{(1)} - a^{(1)} + d^{(1)})}^{2} + p_{i 2}^{*} {({\bar{y}}_{i} - μ^{(1)} - d^{(1)})}^{2} + p_{i 3}^{*} {({\bar{y}}_{i} - μ^{(1)} + a^{(1)} + d^{(1)})}^{2}]}

Step 4: Return to Step 1; this completes one round of iteration. Repeat until the estimates converge.
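A pure-Python sketch of Steps 0 to 4 for model (2) at a single putative QTL position follows. It assumes the marker-based genotype probabilities p_ik are already available, runs a fixed number of iterations instead of testing convergence, and uses hypothetical function names; it is meant to make the bookkeeping of equations (7) to (9) concrete, not to reproduce the authors' program.

```python
import math

# (z, w) codes for Q1Q1, Q1Q2, Q2Q2, as defined for model (1)
CODES = [(1, -1), (0, 1), (-1, -1)]


def wsum(m, values, times=None):
    """Return sum_i m_i * values_i, optionally multiplied by times_i."""
    if times is None:
        times = [1.0] * len(values)
    return sum(mi * v * t for mi, v, t in zip(m, values, times))


def em_mean_phenotype(ybar, m, prior, theta, n_iter=200):
    """Sketch of Steps 0-4 for model (2).

    ybar  : individual means of the repeat records
    m     : numbers of records m_i
    prior : marker-based probabilities [p_i1, p_i2, p_i3] per individual
    theta : starting values (mu, a, d, sigma2), i.e. Step 0
    """
    mu, a, d, s2 = theta
    total_m = sum(m)
    for _ in range(n_iter):
        # Step 1: posterior genotype probabilities, equation (7)
        post = []
        for yb, mi, pr in zip(ybar, m, prior):
            wts = [p * math.exp(-mi * (yb - mu - z * a - w * d) ** 2 / (2.0 * s2))
                   for p, (z, w) in zip(pr, CODES)]
            tot = sum(wts)
            post.append([x / tot for x in wts])
        # posterior moments, equation (8)
        ez = [p1 - p3 for p1, _, p3 in post]
        ez2 = [p1 + p3 for p1, _, p3 in post]
        ew = [p2 - p1 - p3 for p1, p2, p3 in post]
        ezw = [p3 - p1 for p1, _, p3 in post]
        # Step 2: sequential updates of mu, a and d, equation (9)
        mu = (wsum(m, ybar) - a * wsum(m, ez) - d * wsum(m, ew)) / total_m
        a = (wsum(m, ez, ybar) - mu * wsum(m, ez) - d * wsum(m, ezw)) / wsum(m, ez2)
        d = (wsum(m, ew, ybar) - mu * wsum(m, ew) - a * wsum(m, ezw)) / total_m
        # Step 3: residual variance, as in the Step 3 formula
        s2 = sum(mi * sum(p * (yb - mu - z * a - w * d) ** 2
                          for p, (z, w) in zip(pr, CODES))
                 for yb, mi, pr in zip(ybar, m, post)) / total_m
    return mu, a, d, s2
```

When the genotypes are known (one-hot priors), the posterior probabilities never change and each pass of Step 2 is one Gauss-Seidel sweep over the normal equations (5), so the loop converges to the weighted least-squares estimates.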

Mapping QTL based on the repeatability model

Partitioning the residual error e_i in model (1) into an individual-specific permanent environmental effect ζ_i and a random environmental effect ε_ij, the j th phenotypic value of individual i is represented as

y_{ij} = μ + z_{i} a + w_{i} d + ζ_{i} + ε_{ij}

(10)

This is a mixed-effects model, also called the repeatability model, with μ, a and d treated as fixed effects and ζ_i as a random effect. We assume that ζ_i follows an i.i.d. N(0, $σ_{ζ}^{2}$) distribution and ε_ij an i.i.d. N(0, $σ_{ε}^{2}$) distribution.

We use an m_i × 1 vector y_i = [y_{i1} y_{i2} … y_{im_i}]^T, for i = 1, 2, …, n, to denote the phenotypic values of the i th individual, and define φ_i = [1 1 … 1]^T as a vector of ones of dimension m_i. In matrix notation, model (10) can be written as

y_{i} = φ_{i} μ + z_{i} φ_{i} a + w_{i} φ_{i} d + φ_{i} ζ_{i} + ε_{i}

(11)

where ε_i = [ε_{i1} ε_{i2} … ε_{im_i}]^T is an m_i × 1 vector of random environmental effects that follows N(0, I_i $σ_{ε}^{2}$), with I_i being an m_i × m_i identity matrix. The conditional expectation of model (11) given the fixed effects is

E(y_{i}) = M_{i} = φ_{i} μ + z_{i} φ_{i} a + w_{i} φ_{i} d

(12)

and the variance-covariance matrix is

\mathrm{Var} (y_{i}) = V_{i} = φ_{i} φ_{i}^{T} σ_{ζ}^{2} + I_{i} σ_{ε}^{2}

(13)

which applies to all i = 1, 2, …, n.
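Because every V_i has the same compound-symmetric form, its inverse and determinant have closed forms. The identities below are a supplementary sketch obtained from the Sherman-Morrison formula and the matrix determinant lemma; they are not part of the original derivation, but they show that only scalar quantities per individual are needed.

```latex
V_{i}^{- 1} = \frac{1}{σ_{ε}^{2}} [I_{i} - \frac{σ_{ζ}^{2}}{σ_{ε}^{2} + m_{i} σ_{ζ}^{2}} φ_{i} φ_{i}^{T}]
| V_{i} | = {(σ_{ε}^{2})}^{m_{i} - 1} (σ_{ε}^{2} + m_{i} σ_{ζ}^{2})
φ_{i}^{T} V_{i}^{- 1} φ_{i} = \frac{m_{i}}{σ_{ε}^{2} + m_{i} σ_{ζ}^{2}}, \quad φ_{i}^{T} V_{i}^{- 1} y_{i} = \frac{m_{i} {\bar{y}}_{i}}{σ_{ε}^{2} + m_{i} σ_{ζ}^{2}}
```

The last identity shows that φ_i^T V_i^{-1} y_i equals φ_i^T V_i^{-1} φ_i times ȳ_i, which is why the estimating equations for μ, a and d keep the form of equations (9) with m_i replaced by φ_i^T V_i^{-1} φ_i.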

The conditional density of y_i based on M_i and V_i is

f (y_{i} | θ, z_{i}, w_{i}) = \frac{1}{{(2 π)}^{m_{i} / 2} {| V_{i} |}^{1 / 2}} \exp [- \frac{1}{2} {(y_{i} - M_{i})}^{T} V_{i}^{- 1} (y_{i} - M_{i})]

(14)
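Using the closed forms of V_i^{-1} and |V_i| implied by the compound-symmetric structure (13) (Sherman-Morrison formula and matrix determinant lemma), density (14) can be evaluated without forming an m_i × m_i matrix. This is a sketch, not the authors' program; the function name and the choice to pass the scalar genotypic mean μ + z_i a + w_i d are assumptions made for illustration.

```python
import math


def log_density_repeat(y, mu_i, s2_zeta, s2_eps):
    """Log of density (14) for one individual.

    y    : list of the m_i records y_ij
    mu_i : scalar genotypic mean mu + z_i*a + w_i*d, so that M_i = phi_i * mu_i
    """
    m = len(y)
    denom = s2_eps + m * s2_zeta       # s2_eps + m_i * s2_zeta
    r = [yj - mu_i for yj in y]
    rr = sum(x * x for x in r)         # r^T r
    rsum = sum(r)                      # phi_i^T r
    # r^T V_i^{-1} r with V_i^{-1} = (I - s2_zeta * phi phi^T / denom) / s2_eps
    quad = (rr - s2_zeta * rsum ** 2 / denom) / s2_eps
    # log|V_i| = (m_i - 1) * log(s2_eps) + log(s2_eps + m_i * s2_zeta)
    logdet = (m - 1) * math.log(s2_eps) + math.log(denom)
    return -0.5 * (m * math.log(2.0 * math.pi) + logdet + quad)
```

Two limiting cases make useful checks: with one record the result is a univariate normal log density with variance σ_ζ² + σ_ε², and with σ_ζ² = 0 it is the sum of independent normal log densities.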

where θ = [μ a d $σ_{ζ}^{2}$ $σ_{ε}^{2}$]. The corresponding log-likelihood function is

L (θ) = \sum_{i = 1}^{n} \ln {E [f (y_{i} | θ, z_{i}, w_{i})]}

(15)

Taking derivatives with respect to μ, a and d, we obtain

[\begin{matrix} \sum_{i = 1}^{n} φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (z_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (w_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} \\ \sum_{i = 1}^{n} E (z_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (z_{i}^{2}) φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (z_{i} w_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} \\ \sum_{i = 1}^{n} E (w_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (z_{i} w_{i}) φ_{i}^{T} V_{i}^{- 1} φ_{i} & \sum_{i = 1}^{n} E (w_{i}^{2}) φ_{i}^{T} V_{i}^{- 1} φ_{i} \end{matrix}] [\begin{matrix} μ \\ a \\ d \end{matrix}] = [\begin{matrix} \sum_{i = 1}^{n} φ_{i}^{T} V_{i}^{- 1} y_{i} \\ \sum_{i = 1}^{n} E (z_{i}) φ_{i}^{T} V_{i}^{- 1} y_{i} \\ \sum_{i = 1}^{n} E (w_{i}) φ_{i}^{T} V_{i}^{- 1} y_{i} \end{matrix}],

(16)

but explicit equations for $σ_{ζ}^{2}$ and $σ_{ε}^{2}$ cannot be derived in the same way. Instead of the above likelihood function, we construct the following likelihood function from the joint density of $y_{i}$ and $ζ_{i}$,

L (θ) = \sum_{i = 1}^{n} \ln {E [f (y_{i} | θ_{1}, z_{i}, w_{i}) g (ζ_{i} | σ_{ζ}^{2})]}

(17)

where θ₁ = [μ a d ζ_i $σ_{ε}^{2}$] and

\begin{matrix} f (y_{i} | θ_{1}, z_{i}, w_{i}) = \frac{1}{{(2 π σ_{ε}^{2})}^{m_{i} / 2}} \exp [- \frac{1}{2 σ_{ε}^{2}} {(y_{i} - M_{i} - φ_{i} ζ_{i})}^{T} (y_{i} - M_{i} - φ_{i} ζ_{i})] \\ g (ζ_{i} | σ_{ζ}^{2}) = \frac{1}{\sqrt{2 π} σ_{ζ}} \exp (- \frac{ζ_{i}^{2}}{2 σ_{ζ}^{2}}) \end{matrix}

Taking derivatives with respect to θ₁, we obtain

[\begin{matrix} \sum_{i = 1}^{n} m_{i} & \sum_{i = 1}^{n} m_{i} E (z_{i}) & \sum_{i = 1}^{n} m_{i} E (w_{i}) & m_{1} & \dots & m_{n} \\ \sum_{i = 1}^{n} m_{i} E (z_{i}) & \sum_{i = 1}^{n} m_{i} E (z_{i}^{2}) & \sum_{i = 1}^{n} m_{i} E (z_{i} w_{i}) & m_{1} E (z_{1}) & \dots & m_{n} E (z_{n}) \\ \sum_{i = 1}^{n} m_{i} E (w_{i}) & \sum_{i = 1}^{n} m_{i} E (z_{i} w_{i}) & \sum_{i = 1}^{n} m_{i} E (w_{i}^{2}) & m_{1} E (w_{1}) & \dots & m_{n} E (w_{n}) \\ m_{1} & m_{1} E (z_{1}) & m_{1} E (w_{1}) & m_{1} + \frac{σ_{ε}^{2}}{σ_{ζ}^{2}} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ m_{n} & m_{n} E (z_{n}) & m_{n} E (w_{n}) & 0 & \dots & m_{n} + \frac{σ_{ε}^{2}}{σ_{ζ}^{2}} \end{matrix}] [\begin{matrix} μ \\ a \\ d \\ ζ_{1} \\ \vdots \\ ζ_{n} \end{matrix}] = [\begin{matrix} \sum_{i = 1}^{n} m_{i} {\bar{y}}_{i} \\ \sum_{i = 1}^{n} m_{i} E (z_{i}) {\bar{y}}_{i} \\ \sum_{i = 1}^{n} m_{i} E (w_{i}) {\bar{y}}_{i} \\ m_{1} {\bar{y}}_{1} \\ \vdots \\ m_{n} {\bar{y}}_{n} \end{matrix}]

(18)

and

σ_{ε}^{2} = \frac{1}{\sum_{i = 1}^{n} m_{i}} \sum_{i = 1}^{n} E [{(y_{i} - φ_{i} μ - z_{i} φ_{i} a - w_{i} φ_{i} d - φ_{i} ζ_{i})}^{T} (y_{i} - φ_{i} μ - z_{i} φ_{i} a - w_{i} φ_{i} d - φ_{i} ζ_{i})]

(19)

Where

\begin{array}{l} E [{(y_{i} - φ_{i} μ - z_{i} φ_{i} a - w_{i} φ_{i} d - φ_{i} ζ_{i})}^{T} (y_{i} - φ_{i} μ - z_{i} φ_{i} a - w_{i} φ_{i} d - φ_{i} ζ_{i})] \\ = {(y_{i} - φ_{i} μ)}^{T} (y_{i} - φ_{i} μ) + m_{i} a^{2} E (z_{i}^{2}) + m_{i} d^{2} E (w_{i}^{2}) - 2 {(y_{i} - φ_{i} μ)}^{T} [φ_{i} a E (z_{i}) + φ_{i} d E (w_{i})] + 2 m_{i} a d E (z_{i} w_{i}) + m_{i} E (ζ_{i}^{2}) \end{array}

(20)

Hence we can use the existing EM algorithm for mixed models to find the MLE of the parameters [9]. The EM steps for the mixed-model analysis are as follows.

Step 0: Initialize all parameters with values in their legal domain, denoted by θ⁽⁰⁾.

Step 1: Compute the posterior probabilities of the three genotypes for each individual

\begin{matrix} p_{i k}^{*} = \frac{p_{i k} f (y_{i} | θ_{1}, z_{i}, w_{i})}{\sum_{k = 1}^{3} p_{i k} f (y_{i} | θ_{1}, z_{i}, w_{i})} & for k = 1, 2, 3 \end{matrix}

(21)

Step 2: Compute all the expectations involved in the following maximization steps (the same as in equation (8)).

Step 3: Find the posterior distribution of the random effect ζ_i from equation (18). This posterior distribution turns out to be a mixture of three normal distributions with mean

E (ζ_{i}) = σ_{ζ}^{2} φ_{i}^{T} V_{i}^{- 1} [y_{i} - φ_{i} μ - (p_{i 1}^{*} - p_{i 3}^{*}) φ_{i} a - (p_{i 2}^{*} - p_{i 1}^{*} - p_{i 3}^{*}) φ_{i} d]

(22)

and a variance

\mathrm{Var} (ζ_{i}) = σ_{ζ}^{2} - φ_{i}^{T} V_{i}^{- 1} φ_{i} σ_{ζ}^{4}

(23)

Step 4: Update the population mean, additive effect and dominance effect by equation (16). The resulting equations are equivalent to equations (9) with m_i replaced by $φ_{i}^{T} V_{i}^{- 1} φ_{i}$.

Step 5: Update the variance of the permanent environmental effect

σ_{ζ}^{2 (1)} = \frac{1}{n} \sum_{i = 1}^{n} E (ζ_{i}^{2}) = \frac{1}{n} \sum_{i = 1}^{n} [\mathrm{Var} (ζ_{i}) + E^{2} (ζ_{i})]

(24)
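Because V_i has the compound-symmetric form (13), φ_i^T V_i^{-1} φ_i reduces to m_i / (σ_ε² + m_i σ_ζ²) by the Sherman-Morrison formula, and equations (22) to (24) become scalar updates. The sketch below uses hypothetical function names and takes E(z_i) and E(w_i) from Step 2. Like equation (23), the variance it returns is the variance given the genotype; the spread between the three mixture components is not added.

```python
def zeta_posterior(ybar_i, m_i, mu, a, d, ez, ew, s2_zeta, s2_eps):
    """Equations (22)-(23) in scalar form for one individual."""
    weight = m_i / (s2_eps + m_i * s2_zeta)    # phi_i^T V_i^{-1} phi_i
    resid_mean = ybar_i - mu - ez * a - ew * d
    mean = s2_zeta * weight * resid_mean       # equation (22)
    var = s2_zeta - weight * s2_zeta ** 2      # equation (23)
    return mean, var, weight


def update_s2_zeta(means, variances):
    """Equation (24): average of E(zeta_i^2) = Var(zeta_i) + E(zeta_i)^2."""
    return sum(v + e * e for e, v in zip(means, variances)) / len(means)
```

The returned weight is the quantity that replaces m_i in Step 4, and the mean agrees with solving the ζ_i row of equation (18), ζ_i = m_i (ȳ_i - μ - E(z_i) a - E(w_i) d) / (m_i + σ_ε²/σ_ζ²).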

Step 6: Update the residual variance by equation (19)

\begin{matrix} σ_{ε}^{2} = {[\sum_{i = 1}^{n} m_{i}]}^{- 1} \sum_{i = 1}^{n} [p_{i 1}^{*} {(y_{i} - φ_{i} μ - φ_{i} a + φ_{i} d - φ_{i} ζ_{i})}^{T} (y_{i} - φ_{i} μ - φ_{i} a + φ_{i} d - φ_{i} ζ_{i}) \\ + p_{i 2}^{*} {(y_{i} - φ_{i} μ - φ_{i} d - φ_{i} ζ_{i})}^{T} (y_{i} - φ_{i} μ - φ_{i} d - φ_{i} ζ_{i}) \\ + p_{i 3}^{*} {(y_{i} - φ_{i} μ + φ_{i} a + φ_{i} d - φ_{i} ζ_{i})}^{T} (y_{i} - φ_{i} μ + φ_{i} a + φ_{i} d - φ_{i} ζ_{i})] \end{matrix}

Step 7: Repeat steps 1 to 6 until a convergence criterion is reached.

The MLEs of the parameters in both model (2) and model (10) are solved iteratively by the EM algorithm at each specific location on the chromosomes, and the QTL position and effects are determined from the likelihood ratio statistics obtained by chromosome or genome scanning.
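The scanning step can be sketched as follows, assuming a hypothetical callback `loglik_at` that runs the EM algorithm at a given position and returns the maximized log likelihood, and a log likelihood `loglik_null` fitted under a = d = 0. The LOD conversion is included only because it is a common alternative scale; the text itself works with the likelihood ratio statistic.

```python
import math


def likelihood_ratio(loglik_null, loglik_full):
    """LR = -2 ln(L0 / L1) = 2 (ln L1 - ln L0) at one scanned position."""
    return 2.0 * (loglik_full - loglik_null)


def lod_from_lr(lr):
    """Convert a likelihood ratio statistic to the LOD (log10) scale."""
    return lr / (2.0 * math.log(10.0))


def scan(positions, loglik_null, loglik_at):
    """Return the position with the largest LR and the whole LR profile.

    loglik_at is a hypothetical callback: position -> maximized log likelihood.
    """
    profile = [likelihood_ratio(loglik_null, loglik_at(pos)) for pos in positions]
    best = max(range(len(positions)), key=lambda k: profile[k])
    return positions[best], profile
```

The estimated QTL position is the peak of the profile, and the QTL effects are the EM estimates obtained at that position.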

Simulation studies

A series of simulation experiments was used to compare the efficiency and behaviour of two mapping methods for a trait with repeat records: the method based on the repeatability model and the simple analysis using the mean phenotype. We simulated a single chromosome 100 cM long with 11 evenly spaced codominant markers in an F2 population of size n = 100, and placed a single QTL at 25 cM (between markers 3 and 4). Under the null model, both the additive and dominance effects of the QTL were set to zero, and the empirical critical values of the likelihood ratio statistic for testing the presence of the QTL were obtained from 1000 simulated replicates. Under the alternative model, equal nonzero additive and dominance effects were simulated, and each simulation was replicated 100 times. Empirical power was calculated as the proportion of runs in which the test statistic exceeded the critical value.
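The critical value and power calculations can be sketched as below. The text does not state the significance level or the quantile convention used, so α = 0.05 and the order-statistic rule in the code are assumptions; the function names are hypothetical.

```python
import math


def empirical_critical_value(null_stats, alpha=0.05):
    """(1 - alpha) empirical quantile of LR statistics simulated under the null.

    Assumed convention: the ceil((1 - alpha) * N)-th smallest statistic.
    """
    s = sorted(null_stats)
    k = math.ceil((1.0 - alpha) * len(s)) - 1   # zero-based index of that order statistic
    return s[max(k, 0)]


def empirical_power(alt_stats, critical):
    """Proportion of alternative-model runs whose statistic exceeds the critical value."""
    return sum(1 for t in alt_stats if t > critical) / len(alt_stats)
```

With 1000 null replicates this picks the 950th smallest statistic, and with 100 alternative replicates the power is simply the count of exceedances divided by 100.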

Factors considered include the QTL size, measured as the proportion of the phenotypic variance explained by the QTL (also called the QTL heritability); the number of replicates; and $σ_{ζ}^{2}$ : $σ_{ε}^{2}$, i.e. the variance ratio of the permanent environmental effect to the random environmental effect. The QTL size was set at three levels, a = d = 0.265, 0.577 and 0.943, corresponding to h² = 0.05, 0.10 and 0.20, respectively. The number of replicates was examined at five levels, m = 1, 3, 5, 10 and 15, and the variance ratio at $σ_{ζ}^{2}$ : $σ_{ε}^{2}$ = 1:4, 2:3, 2.5:2.5, 3:2 and 4:1, keeping $σ_{ζ}^{2}$ + $σ_{ε}^{2}$ = 5.0.

The j th phenotypic value of individual i was simulated by using the repeatability model:

y_{ij} = μ + z_{i} a + w_{i} d + ξ_{i} σ_{ζ} + η_{ij} σ_{ε}

(25)

where ξ_i and η_ij are random numbers drawn from the standard normal distribution.
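A minimal sketch of drawing records from model (25) at the QTL itself follows. It assumes Mendelian F₂ segregation (1:2:1) and does not simulate the 11 markers or the QTL position, so it only illustrates how the permanent term ξ_i σ_ζ is shared by all m records of an individual while η_ij σ_ε is redrawn for each record. The function name and the use of Python's `random` module are choices made for this illustration.

```python
import random

# (z, w) codes for Q1Q1, Q1Q2, Q2Q2, as defined for model (1)
CODES = [(1, -1), (0, 1), (-1, -1)]


def simulate_f2_repeat(n, m, mu, a, d, s2_zeta, s2_eps, seed=None):
    """Simulate m repeat records per individual from model (25).

    Genotypes are drawn with F2 frequencies 1:2:1; markers are not simulated.
    Returns a list of (genotype_index, [y_i1, ..., y_im]).
    """
    rng = random.Random(seed)
    sd_zeta, sd_eps = s2_zeta ** 0.5, s2_eps ** 0.5
    data = []
    for _ in range(n):
        k = rng.choices(range(3), weights=[1, 2, 1])[0]
        z, w = CODES[k]
        xi = rng.gauss(0.0, 1.0)          # xi_i, shared by all records of individual i
        records = [mu + z * a + w * d + xi * sd_zeta + rng.gauss(0.0, 1.0) * sd_eps
                   for _ in range(m)]     # a fresh eta_ij for every record
        data.append((k, records))
    return data
```

For example, `simulate_f2_repeat(100, 5, 0.0, 0.577, 0.577, 2.5, 2.5, seed=1)` produces one data set at the middle QTL size with five replicates and a 1:1 variance ratio.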

The results of all simulations consistently show that, under the same experimental conditions, (1) the repeatability model significantly increases the statistical power of QTL detection compared with the simple analysis using the mean phenotype, and (2) the position and effects of the QTL, especially the proportion of phenotypic variance explained by the QTL, are estimated more accurately with the repeatability model than with the mapping model that ignores permanent environmental effects and analyzes the mean phenotype. The advantage of the repeatability model over the mean-phenotype analysis is especially evident when the QTL heritability is low.

The effects of the number of replications on the efficiency and behaviour of the two methods were investigated only at a 1:1 variance ratio of the permanent environmental effect to the random environmental effect. The simulation results for the two mapping methods are listed in Tables 1 and 2, respectively. Note that results at m = 1 (no replication) are reported only for the method based on the mean phenotype, because the repeatability model has no solution without replication (with a single record per individual, $σ_{ζ}^{2}$ and $σ_{ε}^{2}$ cannot be separated). As expected, the statistical power of QTL detection with replication is higher than without replication, whether based on the mean phenotype or on the repeatability model. The estimates of the QTL parameters generally improve as the number of replications increases.

Table 1 Effects of the number of replications on the mapping analysis based on the repeatability model


Table 2 Effects of the number of replications on the simple analysis using the mean phenotype


We also investigated the impact of the variance ratio of the permanent environmental effect to the random environmental effect on the difference in mapping performance between the two methods. The simulation results with five replications are listed in Table 3. With the total variance of the random effects fixed, the greater the difference between the permanent and random environmental variances, the clearer the advantage of the repeatability model over the mean phenotype in the statistical power of QTL detection. Possible reasons are that a large random environmental variance makes an individual's mean phenotype less reliable, or that the residual variance in model (2) increases as the permanent environmental variance increases.
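Under the repeatability model (10), averaging the m records of individual i shrinks only the random environmental part, so the residual of the mean given the genotype has variance σ_ζ² + σ_ε²/m. The short calculation below is a supplementary illustration, not a result reported in the tables; it evaluates this variance for the five simulated ratios with m = 5, and the function name is hypothetical.

```python
def mean_model_residual_variance(s2_zeta, s2_eps, m):
    """Var(ybar_i | genotype) = s2_zeta + s2_eps / m under model (10)."""
    return s2_zeta + s2_eps / m


# the five (s2_zeta, s2_eps) settings used in the simulations, each summing to 5.0
RATIOS = [(1.0, 4.0), (2.0, 3.0), (2.5, 2.5), (3.0, 2.0), (4.0, 1.0)]
variances = [mean_model_residual_variance(sz, se, m=5) for sz, se in RATIOS]
```

The variance of a single record is 5.0 in every setting, so averaging five records reduces it to 1.8 when the random environmental variance dominates but only to 4.2 when the permanent environmental variance dominates.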

Table 3 Comparison of the mapping analysis based on the repeatability model with the simple analysis using the mean phenotype under different variance ratios of permanent to random environmental effects


Discussion

For a trait with repeat records, we proposed using the repeatability model to map QTL. This approach differs from the simple analysis using the mean phenotype not only in the data analyzed but, more essentially, in the model adopted. The simple analysis using the mean phenotype is based on the regular genetic model for mapping QTL in line crosses, which excludes the permanent environmental effects. The excluded permanent environmental effects are absorbed into the residual error, decreasing the accuracy of the QTL parameter estimates, as shown in standard textbooks on statistical models [e.g., [10, 11]]. In addition, the loss of information caused by averaging also affects the performance of QTL mapping based on the mean phenotype.

Replication requires either that the experimental conditions be the same, when multiple records are observed on one individual, or that the genetic backgrounds be identical, when the records come from multiple individuals. If the former condition is not satisfied, the "repeat" records become longitudinal data, such as test-day records of milk production and body weight in cattle, which are genetically analysed using the random regression model, essentially a repeatability model with nested submodels of time [12–14]. Apart from cloned individuals and progenies from each plant in RILs, the latter condition is hard to satisfy. For example, the genetic backgrounds are not identical among individuals within a family, or among F3 progenies derived from one F2 individual. To improve the efficiency of QTL detection with such data, the genetic backgrounds should at least be taken into account in the analysis [7]; furthermore, the repeatability model may be a good choice for analyzing such "repeat" records directly.

Although we demonstrate the statistical method of QTL mapping using an F₂ population as an example, it can be extended to other simpler or more complex designs, such as backcross populations and full-sib families. Only one QTL is assumed in the model here, for convenience in investigating the efficiency of the presented method through various estimates. If a trait is controlled by multiple loci, composite interval mapping [15, 16] or Bayesian mapping [e.g., [17, 18]] could be used to map those QTL by incorporating into model (10) marker cofactors outside the scanning interval or all the QTL, respectively.

References

1. Fisher RA: The design of experiments. 1971, New York, Hafner Publishing Company, 9
2. Steel RGD, Torrie JH: Principles and procedures of statistics: a biometrical approach. 1980, Tokyo, McGraw-Hill Kogakusha, 2
3. Henderson CR: Applications of Linear Models in Animal Breeding. 1984, Guelph ON, Univ of Guelph
4. Mrode RA: Linear Models for the Prediction of Animal Breeding Values. 1996, UK, CAB International
5. Falconer DS: Introduction to Quantitative Genetics. 1960, London, Oliver & Boyd
6. Zhang TY, Yuan J, Yu W, Guo Z, Kohel RJ: Molecular tagging of a major QTL for fiber strength in upland cotton and its marker-assisted selection. Theor Appl Genet. 2003, 106: 262-268.
7. Zhang Y, Xu S: Mapping Quantitative Trait Loci in F2 Incorporating Phenotypes of F3 Progeny. Genetics. 2004, 166: 1981-1993. 10.1534/genetics.166.4.1981.
8. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B. 1977, 39: 1-38.
9. Henderson CR: Recent developments in variance and covariance estimation. J Anim Sci. 1986, 63: 208-216.
10. Zar JH: Biostatistical Analysis. 1996, Prentice Hall, 3
11. Neter J, Kutner MH, Nachtsheim CJ, Wasserman W: Applied Linear Statistical Models. 1996, RD Irwin, Homewood, IL, 4
12. Henderson CR: Analysis of covariance in the mixed model: Higher level, nonhomogeneous, and random regressions. Biometrics. 1982, 38: 623-640. 10.2307/2530044.
13. Schaeffer LR: Application of random regression models in animal breeding. Livest Prod Sci. 2004, 86: 35-45. 10.1016/S0301-6226(03)00151-9.
14. Macgregor S, Knott SA, White I, Visscher PM: Quantitative trait locus analysis of longitudinal quantitative trait data in complex pedigrees. Genetics. 2005, 171: 1365-1376. 10.1534/genetics.105.043828.
15. Jansen RC: Controlling the type I and type II errors in mapping quantitative trait loci. Genetics. 1994, 138: 871-881.
16. Zeng ZB: Precision mapping of quantitative trait loci. Genetics. 1994, 136: 1457-1468.
17. Satagopan JM, Yandell BS, Newton MA, Osborn TC: A Bayesian approach to detect quantitative trait loci using Markov chain Monte Carlo. Genetics. 1996, 144: 805-816.
18. Yi N, Xu S: Bayesian mapping of quantitative trait loci under complicated mating designs. Genetics. 2001, 157: 1759-1771.

Author information

Authors and Affiliations

School of Agriculture and Biology, Shanghai Jiaotong University, Shanghai, 201101, P.R. China
Runqing Yang
Life Science College, Heilongjiang August First Land Reclamation University, Daqing, 163319, P.R. China
Ming Fang


Corresponding author

Correspondence to Runqing Yang.

Additional information

Authors' contributions

RQY coordinated the study, developed the foundational principle of the method and wrote the computing program and the paper. FM was responsible for the simulation experiment and carried out the analysis of results.

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


About this article

Cite this article

Yang, R., Fang, M. Mapping quantitative trait loci in line cross with repeat records. BMC Genet 8, 47 (2007). https://doi.org/10.1186/1471-2156-8-47


Received: 18 August 2006
Accepted: 12 July 2007
Published: 12 July 2007
DOI: https://doi.org/10.1186/1471-2156-8-47
