Mapping quantitative trait loci in line cross with repeat records

Background: Phenotypes with repeat records, taken from one individual or from multiple individuals, are often encountered when mapping QTL in line crosses. The current approach to genetic mapping of a trait with repeat records simply replaces the phenotype with the average of the repeat records. This simple treatment does not make full use of the information in the replicates and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL. Results: We propose to map QTL by using the repeatability model to analyze the repeat records directly rather than analyzing the mean phenotype. This improves the efficiency of QTL detection because it uses the information in the data more fully and allows for permanent environmental effects. A maximum likelihood method implemented via the expectation-maximization (EM) algorithm is applied to estimate the parameters of the repeatability model. The superiority of the mapping method based on the repeatability model over the simple analysis using the mean phenotype was demonstrated by a series of simulations. Conclusion: Our results suggest that the proposed method can serve as a powerful alternative to existing methods. By means of the repeatability model, utilizing the repeat records on individuals may improve the efficiency of QTL detection in line crosses.


Background
Replication is fundamental to experimental design. Its main advantages are that it allows experimental error to be estimated and that it increases the reliability of the information obtained at each experimental point [1,2]. Replication denotes sampling or measuring multiple times under the same experimental condition (within one treatment), where the experimental unit may be either one individual or multiple individuals with identical genetic background.
Plants or animals are often observed more than once for a particular trait. Examples include the fleece weight of sheep in different years, the blood pressure and pulse of a human over time, the litter size of sows over time, the antler size of deer in different seasons, the results of horses over several races, and the exam scores of students during university. Such records are replicates if they are not influenced by the measuring environment, such as the year, season, parity or race.
In classical quantitative genetics, a trait with repeat records is generally analysed by means of the repeatability model [3,4], which includes a permanent environmental effect in addition to the individual's additive genetic value for the trait. The permanent environmental effect, a measure of the differences among experimental units, is a non-genetic effect common to all observations on the same individual [5]. Such environmental effects are usually accounted for in the model to ensure accurate prediction of breeding values [4]. However, the repeatability model has received little attention in QTL mapping with repeat records.
The current approach to genetic mapping of a trait with repeat records simply replaces the phenotype with the average of the repeat records [6,7]. Although this improves the power of QTL detection to some extent, it does not make full use of the information in the replicates and ignores the impact of permanent environmental effects on the accuracy of the estimated QTL.
In this study, we apply the repeatability model to the mapping of quantitative trait loci with repeat records and demonstrate the higher efficiency of this model by simulation.

Theory and methods
Mapping QTL based on the mean phenotype
Take a simple F2 population of size n derived from two homozygous lines as an example. There are three possible genotypes at a quantitative trait locus Q, denoted by Q1Q1, Q1Q2 and Q2Q2, respectively. The phenotypic value of individual i is usually described by a linear model comprising the population mean, the additive effect a and dominance effect d of the QTL, and a residual error e_i. Because the posterior probability of each QTL genotype is a function of the unknown parameters, iterations are required for the EM algorithm. The iterations begin as follows.
Step 0: Set up initial values θ^(0).
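The EM iterations for the mean-phenotype model can be sketched in code. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the usual F2 coding (additive indicator 1, 0, -1 and dominance indicator 0, 1, 0 for Q1Q1, Q1Q2, Q2Q2), treats the genotype probabilities conditional on the flanking markers (`prior`) as already computed, and uses hypothetical names (`D`, `e_step`, `em_mean_phenotype`).

```python
import numpy as np

# Assumed design rows for genotypes Q1Q1, Q1Q2, Q2Q2: [mean, additive, dominance].
D = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

def e_step(ybar, prior, beta, s2):
    """Posterior genotype probabilities and log-likelihood at the current parameters."""
    resid = ybar[:, None] - (D @ beta)[None, :]              # shape (n, 3)
    dens = np.exp(-0.5 * resid**2 / s2) / np.sqrt(2.0 * np.pi * s2)
    w = prior * dens
    lik = w.sum(axis=1)
    return w / lik[:, None], np.log(lik).sum()

def em_mean_phenotype(ybar, prior, n_iter=500, tol=1e-9):
    """ybar: (n,) mean phenotypes; prior: (n, 3) genotype probabilities given markers.

    Returns (beta, s2, loglik) with beta = [mu, a, d].
    """
    beta = np.array([ybar.mean(), 0.0, 0.0])   # Step 0: starting values
    s2 = ybar.var()
    post, loglik = e_step(ybar, prior, beta, s2)
    for _ in range(n_iter):
        # M-step: weighted least squares over the three genotype classes.
        cnt = post.sum(axis=0)
        beta = np.linalg.solve(D.T @ (cnt[:, None] * D), D.T @ (post.T @ ybar))
        resid = ybar[:, None] - (D @ beta)[None, :]
        s2 = (post * resid**2).sum() / ybar.size
        # E-step at the updated parameters.
        post, new_loglik = e_step(ybar, prior, beta, s2)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return beta, s2, loglik
```

At a given test position, the returned log-likelihood can be compared with the log-likelihood of a model without QTL effects to form a likelihood ratio statistic.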

Mapping QTL based on the repeatability model
Partitioning the residual error e_i in model (1) into an individual-specific permanent environmental effect ζ_i and a random environmental effect ε_ij, the jth phenotypic value of individual i is represented as the sum of the population mean, the QTL effects, ζ_i and ε_ij. This is a mixed effects model, also called the repeatability model, with a and d treated as fixed effects and ζ_i treated as a random effect. We use an m_i × 1 vector y_i = [y_i1 y_i2 … y_im_i]^T, for i = 1, 2, …, n, to denote the array of phenotypic values for the ith individual and define ϕ_i = [1 1 … 1]^T as a vector of dimension m_i. In matrix notation, model (9) is written in terms of y_i, ϕ_i and ε_i, where ε_i = [ε_i1 ε_i2 … ε_im_i]^T is an m_i × 1 vector of random environmental effects that follows N(0, I_i σ_ε^2), with I_i being an m_i × m_i identity matrix. Given the fixed effects, the conditional expectation and the variance-covariance matrix of model (11) take the same form for all i = 1, 2, …, n.
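For orientation, the repeatability model and its first two moments can be written out explicitly. The following is a sketch under assumptions rather than a recovery of the original numbered equations: μ denotes the population mean, x_i and z_i are the usual F2 additive and dominance indicators (assumed symbols), and σ_p^2 and σ_ε^2 denote the variances of ζ_i and ε_ij, which are taken to be independent.

```latex
\begin{aligned}
y_{ij} &= \mu + x_i a + z_i d + \zeta_i + \varepsilon_{ij},
  \qquad \zeta_i \sim N(0, \sigma_p^2),\quad \varepsilon_{ij} \sim N(0, \sigma_\varepsilon^2), \\
\mathbf{y}_i &= \boldsymbol{\phi}_i\,(\mu + x_i a + z_i d + \zeta_i) + \boldsymbol{\varepsilon}_i, \\
E(\mathbf{y}_i \mid x_i, z_i) &= \boldsymbol{\phi}_i\,(\mu + x_i a + z_i d), \\
\operatorname{Var}(\mathbf{y}_i \mid x_i, z_i) &= \boldsymbol{\phi}_i \boldsymbol{\phi}_i^{T} \sigma_p^2 + \mathbf{I}_i\, \sigma_\varepsilon^2 .
\end{aligned}
```

Under these assumptions, any two records on the same individual have covariance σ_p^2, so their correlation is σ_p^2 / (σ_p^2 + σ_ε^2) once the QTL genotype is given.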
Hence, we can simply use the existing mixed-model EM algorithm to find the MLE of the parameters [9]. The EM steps for the mixed model analysis are as follows.
Step 0: Initialize all parameters with values in their legal domain, denoted by θ^(0).
Step 1: Compute the posterior probabilities of the three genotypes for each individual.
Step 2: Compute all the expectations involved in the following maximization steps (as in equation (8)).
Step 3: Find the posterior distribution of the random effect ζ_i from equation (18). This posterior distribution turns out to be a mixture of three normal distributions, each characterized by a mean and a variance.
Step 4: Update the population mean, additive effect and dominance effect by equation (16). The resulting equations have the same form as equations (9) after a substitution for m_i.
Step 5: Update the covariance matrix of the random effect.
Step 6: Update the residual variance by equation (19).
Step 7: Repeat steps 1 to 6 until a chosen convergence criterion is reached.
Models (2) and (10) are solved iteratively at each tested location on the chromosomes using the EM algorithm, and the QTL position and effects are determined by means of likelihood ratio statistics in a chromosome or genome scan.
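Steps 0 to 7 can be illustrated with a compact sketch. It is one standard way to write EM updates for a three-component mixture with a random individual effect and is not claimed to reproduce the authors' equations (16), (18) and (19). It assumes a balanced design (the same number m ≥ 2 of records for every individual), the usual F2 coding, and genotype probabilities given the markers (`prior`) that are already available; all names are hypothetical.

```python
import numpy as np

# Assumed design rows for genotypes Q1Q1, Q1Q2, Q2Q2: [mean, additive, dominance].
D = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [1.0, -1.0, 0.0]])

def e_step(ybar, ssw, m, prior, beta, s2p, s2e):
    """Steps 1-3: genotype posteriors, moments of the permanent effect, log-likelihood."""
    n = ybar.size
    v = s2p + s2e / m                          # variance of an individual's mean
    resid = ybar[:, None] - (D @ beta)[None, :]
    w = prior * np.exp(-0.5 * resid**2 / v) / np.sqrt(2.0 * np.pi * v)
    lik = w.sum(axis=1)
    post = w / lik[:, None]                    # posterior genotype probabilities
    k = s2p / v                                # shrinkage factor
    zeta = k * resid                           # E(zeta_i | y_i, genotype), shape (n, 3)
    c = s2p * (1.0 - k)                        # Var(zeta_i | y_i, genotype)
    loglik = (np.log(lik).sum() - 0.5 * ssw / s2e
              - 0.5 * n * (m - 1) * np.log(2.0 * np.pi * s2e)
              - 0.5 * n * np.log(m))
    return post, zeta, c, loglik

def em_repeatability(Y, prior, n_iter=1000, tol=1e-9):
    """Y: (n, m) repeat records; prior: (n, 3) genotype probabilities given markers.

    Returns (beta, s2p, s2e, loglik) with beta = [mu, a, d].
    """
    n, m = Y.shape
    ybar = Y.mean(axis=1)
    ssw = ((Y - ybar[:, None])**2).sum()       # within-individual sum of squares
    beta = np.array([ybar.mean(), 0.0, 0.0])   # Step 0: starting values
    s2p = s2e = 0.5 * Y.var()
    post, zeta, c, loglik = e_step(ybar, ssw, m, prior, beta, s2p, s2e)
    for _ in range(n_iter):
        # Step 4: update mu, a, d by weighted least squares.
        cnt = post.sum(axis=0)
        rhs = (post * (ybar[:, None] - zeta)).sum(axis=0)
        beta = np.linalg.solve(D.T @ (cnt[:, None] * D), D.T @ rhs)
        # Step 5: update the permanent environmental variance.
        s2p = (post * (zeta**2 + c)).sum() / n
        # Step 6: update the residual variance.
        resid = ybar[:, None] - (D @ beta)[None, :] - zeta
        s2e = (ssw + m * (post * (resid**2 + c)).sum()) / (n * m)
        # Step 7: recompute the E-step and check convergence.
        post, zeta, c, new_loglik = e_step(ybar, ssw, m, prior, beta, s2p, s2e)
        converged = abs(new_loglik - loglik) < tol
        loglik = new_loglik
        if converged:
            break
    return beta, s2p, s2e, loglik
```

In the balanced case the genotype posteriors depend on the records only through each individual's mean, which is why the sketch works with `ybar` and a single within-individual sum of squares; unequal m_i would require per-individual shrinkage factors.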

Simulation studies
A series of simulation experiments was used to compare the efficiency and behaviour of the mapping method based on the repeatability model with those of the simple analysis using the mean phenotype for a trait with repeat records. We simulated a single chromosome 100 cM long with 11 evenly spaced codominant markers for an F2 population with sample size n = 100, and a single QTL was placed at position 25 cM (between markers 3 and 4). The jth phenotypic value of individual i was simulated using the repeatability model, in which ξ_i and η_ij are both random numbers drawn from the standard normal distribution. The results of all simulations consistently show that, under the same experimental conditions, (1) using the repeatability model significantly increases the statistical power of QTL detection compared with the simple analysis using the mean phenotype, and (2) the position and effects of the QTL, especially the proportion of phenotypic variance contributed by the QTL, were more accurately estimated using the repeatability model than using the genetic mapping model without permanent environmental effects to analyze the mean phenotype. The superiority of the repeatability model over the simple analysis using the mean phenotype is most evident when the QTL heritability is low.
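The phenotype simulation described above can be sketched as follows. The genotype coding, the function name `simulate_phenotypes` and the way the standard normal draws are scaled by the two standard deviations are assumptions made for illustration; marker genotypes are not simulated here.

```python
import numpy as np

def simulate_phenotypes(n, m, mu, a, d, var_p, var_e, rng):
    """Simulate m repeat records for each of n F2 individuals.

    Returns (geno, Y): geno codes 0=Q1Q1, 1=Q1Q2, 2=Q2Q2; Y has shape (n, m).
    """
    # QTL genotypes in the 1:2:1 ratio expected in an F2 population.
    geno = rng.choice(3, size=n, p=[0.25, 0.5, 0.25])
    x = np.array([1.0, 0.0, -1.0])[geno]       # assumed additive indicator
    z = np.array([0.0, 1.0, 0.0])[geno]        # assumed dominance indicator
    xi = rng.standard_normal(n)                # one draw per individual
    eta = rng.standard_normal((n, m))          # one draw per record
    # Genotypic value plus permanent effect is shared by all records of an individual.
    shared = mu + x * a + z * d + np.sqrt(var_p) * xi
    Y = shared[:, None] + np.sqrt(var_e) * eta
    return geno, Y
```

For example, `simulate_phenotypes(100, 5, 10.0, 1.0, 0.5, 1.0, 1.0, np.random.default_rng(0))` draws one data set with n = 100 and five records per individual; the effect sizes and variances in this call are placeholders, not the values used in the study.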
The effects of the number of replications on the efficiency and behaviour of the two methods were investigated only at a variance ratio of permanent environmental effect to random environmental effect of 1:1. The simulation results are listed in Tables 1 and 2, one table for each mapping method. Note that the results at m = 1 (no replication) are given only for the mapping method based on the mean phenotype, because the repeatability model has no solution without replication. As expected, the statistical power of QTL detection is higher with replication than without, for both the mean phenotype and the repeatability model. The estimates of the QTL parameters tend to improve as the number of replications increases.
We also investigated the impact of the variance ratio of permanent environmental effect to random environmental effect on the differences in mapping performance between the two methods. The simulation results with the number of replications fixed at five are listed in Table 3. With the total variance of the random effects fixed, the greater the difference in variance between the permanent environmental effect and the random environmental effect, the clearer the superiority of the mapping method based on the repeatability model over the mean phenotype in the statistical power of QTL detection. Two possible reasons are that a large variance of the random environmental effect makes the individual's mean phenotype less reliable, and that the variance of the residual error in model (2) increases as the variance of the permanent environmental effect increases.
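The second explanation can be made explicit with a short calculation. It is a sketch that assumes m records per individual, independent permanent and random environmental effects, and the notation σ_p^2 and σ_ε^2 for their variances; r denotes the share of the total variance σ^2 due to the permanent effect.

```latex
\begin{aligned}
\bar{e}_i &= \zeta_i + \frac{1}{m} \sum_{j=1}^{m} \varepsilon_{ij}, \\
\operatorname{Var}(\bar{e}_i) &= \sigma_p^2 + \frac{\sigma_\varepsilon^2}{m}
  = \sigma^2 \left[ r + \frac{1 - r}{m} \right],
  \qquad \sigma^2 = \sigma_p^2 + \sigma_\varepsilon^2, \quad r = \frac{\sigma_p^2}{\sigma^2}.
\end{aligned}
```

With m = 5, the illustrative value r = 0.5 gives a residual variance of 0.6σ^2 for the mean phenotype, whereas r = 0.8 gives 0.84σ^2. Averaging reduces only the random environmental part, so the residual variance of the mean-phenotype model grows with the permanent environmental variance when the total is held fixed. These values of r are chosen for illustration and are not taken from Table 3.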

Discussion
For a trait with repeat records, we propose using the repeatability model to map QTL. This differs from the simple analysis using the mean phenotype not only in the data analyzed but, more fundamentally, in the model adopted.
The simple analysis using the mean phenotype is based on the regular genetic model for mapping QTL in line crosses, which excludes the permanent environmental effects. The excluded permanent environmental effects are absorbed into the residual error, which decreases the accuracy of the estimates of the QTL parameters, as is rigorously shown in textbooks on statistical models [e.g., 10,11]. The loss of information in the data also affects the performance of QTL mapping based on the mean phenotype.
Replication requires either that the experimental conditions be the same when multiple records are taken from one individual, or that the genetic backgrounds be identical when the records come from multiple individuals. If the former is not satisfied, the "repeat" records become longitudinal data, such as test-day records of milk production and body weight in cattle, which are genetically analysed using the random regression model, essentially a repeatability model with nested submodels of time [12-14]. The latter is hard to satisfy except for cloned individuals and the progenies of each plant in RIL. For example, individuals within a family, or F3 progenies from one F2 individual, do not share exactly the same genetic background. To improve the efficiency of detecting QTL using such data, the genetic backgrounds should at least be taken into account in the analysis [7]; furthermore, the repeatability model may be a good choice for directly analyzing such "repeat" records.
Although we demonstrate the statistical method of QTL mapping using an F2 population as an example, the method can also be extended to simpler or more complex designs, such as backcross populations and full-sib families. Only one QTL is assumed in the model considered here so that the efficiency of the presented method can be conveniently investigated on the basis of various estimates. If a trait is controlled by multiple loci, composite interval mapping [15,16] or Bayesian mapping [e.g., 17,18] can be used to map those QTLs by incorporating into model (9) either marker cofactors outside the scanning interval or all the QTLs.