Simulations of an Ornstein-Uhlenbeck process
The asymptotic distribution of the LRT process at marker positions has been shown to be the square of an OU process [16]. Let $X_t$ denote the value of this OU process at position t. $X_t$ can be described by a stochastic differential equation driven by $W_t$, where $W_t$ denotes Brownian motion.
In a backcross-type population, the mean of this process is 0 and its autocovariance is $\mathrm{cov}(X_t, X_{t'}) = e^{-2|t-t'|}$, with t and t' in Haldane distance units.
To simulate this process, we considered a linkage group with mk markers. We generated mk independent random numbers $z_0, z_1, \ldots$ from a normal distribution with mean 0 and variance 1, using the function rnorm in R. We defined $X_0 = z_0$. Then, a discrete analog of the OU process [23] was given by:
$$X_s = e^{-2\tau} X_{s-1} + \sqrt{1 - e^{-4\tau}}\; z_s$$
with s = 1, ..., mk, where τ denotes the spacing between two adjacent markers, in Morgans. This sequence is a first-order autoregressive sequence.
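As a minimal sketch of the simulation described above, the following Python code (standing in for the R function rnorm used in the study) generates a first-order autoregressive sequence with unit variance and lag-one correlation $e^{-2\tau}$, which matches the stated autocovariance. The function names, the seed handling and the choice to return one value per marker are assumptions made for illustration, not the authors' code.

```python
import math
import random


def simulate_ou_markers(n_markers, tau, rng=None):
    """Simulate a discretised OU process at n_markers equally spaced markers.

    tau is the spacing between adjacent markers, in Morgans.
    Assumption: unit stationary variance and correlation exp(-2 * tau)
    between adjacent markers, i.e. cov(X_t, X_t') = exp(-2 |t - t'|).
    """
    if rng is None:
        rng = random.Random()
    rho = math.exp(-2.0 * tau)              # lag-one autocorrelation
    innov_sd = math.sqrt(1.0 - rho * rho)   # keeps Var(X_s) equal to 1
    path = [rng.gauss(0.0, 1.0)]            # X_0 = z_0
    for _ in range(1, n_markers):
        z = rng.gauss(0.0, 1.0)
        path.append(rho * path[-1] + innov_sd * z)
    return path


def squared_path(path):
    """Square each value, mirroring the 'square of an OU process' limit."""
    return [x * x for x in path]
```

For example, `squared_path(simulate_ou_markers(3, 0.2, random.Random(1)))` would give one simulated null profile at 3 markers spaced 0.2 M apart.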
Simulations of outbred type population
The QTLMap software [21] was used to simulate and analyse the data sets. QTLMap allowed the simulation of complete experimental designs with pedigree, genetic map, genotypes and phenotypes (http://www.inra.fr/qtlmap). The population structure was a mixture of full-sib and half-sib families for given numbers of sires (s), of dams per sire (d) and of progeny per dam (p). Most often, 3 markers were equally spaced on a 0.4 M linkage group. Each marker had 6 alleles with equal frequencies in the parental population. The QTL was simulated at 0.1 M and all sires and dams were heterozygous for the QTL. The phenotypes of the progeny were simulated as follows:
$$y_{ijk} = u_i + u_{ij} + a\,g_{ijk}(t_0) + e_{ijk}$$
where $y_{ijk}$ is the phenotype of progeny ijk of sire i and dam ij. $u_i$ and $u_{ij}$ denote the polygenic effects of sire i and of dam ij, respectively, which follow a normal distribution with mean 0 and a common variance. a denotes the QTL allelic substitution effect and $g_{ijk}(t_0)$ is the genotypic value of ijk at the QTL location $t_0$; it takes the value 1, 0 or -1 depending on the QTL genotype, QQ, Qq or qq, respectively. $e_{ijk}$ is a random normal variable with mean 0. The variance within QTL genotype is denoted $\sigma^2$, and a is expressed in units of σ. The heritability coefficient was fixed at 0.25.
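The phenotype model can be illustrated with a short Python sketch. It assumes an additive model (sire polygenic effect plus dam polygenic effect plus a times the QTL genotypic value plus a residual) and Mendelian transmission from Qq x Qq parents. The function name and the parameters `sd_u` and `sd_e` (standard deviations of the polygenic effects and of the residual) are hypothetical and must be supplied by the caller; the values used in the study are not reproduced here.

```python
import random


def simulate_phenotypes(n_sires, n_dams, n_progeny, a, sd_u, sd_e, rng=None):
    """Simulate (sire, dam, progeny, genotype value, phenotype) records.

    Assumptions: additive model y = u_sire + u_dam + a * g + e, with all
    sires and dams heterozygous (Qq) at the QTL. sd_u and sd_e are
    illustrative inputs, not values taken from the study.
    """
    if rng is None:
        rng = random.Random()
    records = []
    for i in range(n_sires):
        u_sire = rng.gauss(0.0, sd_u)
        for j in range(n_dams):
            u_dam = rng.gauss(0.0, sd_u)
            for k in range(n_progeny):
                # Each Qq parent transmits Q with probability 1/2.
                n_q = (rng.random() < 0.5) + (rng.random() < 0.5)
                g = n_q - 1                  # QQ -> 1, Qq -> 0, qq -> -1
                e = rng.gauss(0.0, sd_e)
                records.append((i, j, k, g, u_sire + u_dam + a * g + e))
    return records
```

A design such as 5 sires, 2 dams per sire and 30 progeny per dam would be simulated with `simulate_phenotypes(5, 2, 30, a, sd_u, sd_e)` and yields 300 records.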
For each of the cases studied, the results were based on 5 000 simulations, either under the null hypothesis ($H_0$: there is no QTL segregating on the linkage group, i.e. a = 0) or under the alternative hypothesis ($H_1$: there is one QTL segregating on the linkage group, i.e. most often a = 1σ). For each simulated dataset, the estimated QTL position was the location on the linkage group at which the LRT was maximal.
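The position estimate described above is an arg-max over the LRT profile. A minimal Python sketch, assuming the LRT has already been evaluated on a grid of test positions; the function name is hypothetical and the function is not part of QTLMap:

```python
def estimate_qtl_position(positions, lrt_values):
    """Return the position (in Morgans) at which the LRT profile is maximal.

    positions and lrt_values are parallel sequences; if several positions
    share the maximum, the first one is returned.
    """
    if len(positions) == 0 or len(positions) != len(lrt_values):
        raise ValueError("positions and lrt_values must be non-empty and of equal length")
    best = max(range(len(positions)), key=lambda i: lrt_values[i])
    return positions[best]
```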
Under the null hypothesis, simulations were carried out to compare six population sizes: 60 (3s,1d,20p), 80 (4s,1d,20p), 100 (5s,1d,20p), 300 (5s,2d,30p), 400 (5s,2d,40p) and 800 (5s,4d,40p) progeny. Under $H_1$, only three of these population sizes were considered: 100, 300 and 800 progeny.
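Each population size is the product of the number of sires, dams per sire and progeny per dam. The following Python sketch records the six designs and checks this arithmetic; the dictionary layout and names are illustrative only.

```python
# Total progeny -> (sires, dams per sire, progeny per dam), as listed above.
DESIGNS = {
    60: (3, 1, 20),
    80: (4, 1, 20),
    100: (5, 1, 20),
    300: (5, 2, 30),
    400: (5, 2, 40),
    800: (5, 4, 40),
}


def total_progeny(sires, dams_per_sire, progeny_per_dam):
    """Number of progeny in a design with the given family structure."""
    return sires * dams_per_sire * progeny_per_dam


# Every listed size should match its (s, d, p) design.
for size, design in DESIGNS.items():
    assert total_progeny(*design) == size
```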
To understand how the QTL effect affects the estimation of the QTL location, a population of 100 progeny was simulated with a QTL effect ranging from 0.5 σ to 4 σ.
Other simulations were performed in a population of 300 progeny. Firstly, to assess how the extent of the bias depends on marker density, samples with 2, 3, 5, 7 or 11 equidistant markers on a linkage group of 0.6 M were simulated, under the null and under the alternative hypotheses. Under $H_1$, one QTL was simulated at 0.25 M (a = 1σ). Secondly, to test how the true QTL location may affect the bias, we performed simulations under $H_1$ with a QTL (a = 1σ) lying at 0 M, 0.05 M, 0.10 M, 0.15 M or 0.2 M on a linkage group of 0.4 M with two flanking markers at 0 M and 0.4 M.
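To make the marker densities concrete, the Python sketch below computes the positions of equidistant markers. It assumes that the first and last markers sit at the two ends of the linkage group, which the text states for the 0.4 M group with two flanking markers but not explicitly for the 0.6 M group; the function name is hypothetical.

```python
def marker_positions(n_markers, group_length):
    """Positions (in Morgans) of n_markers equidistant markers.

    Assumption: markers span the whole linkage group, from 0 to group_length.
    """
    if n_markers < 2:
        raise ValueError("at least two markers are needed to span the group")
    spacing = group_length / (n_markers - 1)
    return [i * spacing for i in range(n_markers)]


# Spacing between adjacent markers for each density on a 0.6 M group.
SPACINGS = {n: 0.6 / (n - 1) for n in (2, 3, 5, 7, 11)}
```

Under this assumption the spacings are 0.6, 0.3, 0.15, 0.1 and 0.06 M for 2, 3, 5, 7 and 11 markers, respectively.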
Appropriate statistical tests are needed to evaluate which parameters affect the bias of the estimated QTL position. ANOVA was not adequate to test the equality of the average QTL position in two different conditions (e.g. two population sizes) because the QTL position estimator is not normally distributed. Therefore, two nonparametric tests were combined to test which parameters affect the bias and how they influence the variation of the estimated QTL location. This was performed in two steps: (1) the parameters that influence the accuracy of the estimated QTL location were identified with a Kolmogorov-Smirnov test; (2) for the parameters identified in the first step, their effect on the accuracy of the estimated QTL position was described with a Mann-Whitney-Wilcoxon test.
1. Kolmogorov-Smirnov test (KS): this test was applied to check whether a parameter affected the estimation of the QTL position. For each value of the parameter, an empirical distribution of the estimated QTL location was obtained from 5 000 simulations. The two hypotheses compared by the KS test were $H_0: F_a = F_b$ versus $H_1: F_a \neq F_b$, where $F_a$ and $F_b$ denote the distributions of the estimated QTL position under conditions a and b, respectively. For a given parameter, all the distributions were compared pairwise with the function ks.test in R. If no pairwise comparison rejected the null hypothesis, the value of this parameter was considered not to influence the estimation of the QTL position.
2. Mann-Whitney-Wilcoxon test (MWW): when the null hypothesis in the first step was rejected, the MWW test was used, with the function wilcox.test in R, to understand how the parameter affected the estimation. The hypotheses compared concerned $D_a$ and $D_b$, which denote the absolute values of the deviations between the estimated QTL position and the assumed, i.e. true, position under conditions a and b, respectively. A smaller median D corresponds to a more accurate position estimation.
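The two steps can be sketched in Python as follows. This is an illustration only: it computes the two-sample KS statistic and the Mann-Whitney U statistic from scratch, and does not reproduce the p-values returned by ks.test and wilcox.test in R. All function names are hypothetical.

```python
import bisect


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the supremum is attained at an observed value
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def abs_deviations(estimates, true_position):
    """Absolute deviations between estimated and true QTL positions."""
    return [abs(e - true_position) for e in estimates]


def mann_whitney_u(sample_a, sample_b):
    """U statistic of sample_a: pairs (x in a, y in b) with x > y, ties count 1/2."""
    b = sorted(sample_b)
    u = 0.0
    for x in sample_a:
        below = bisect.bisect_left(b, x)
        ties = bisect.bisect_right(b, x) - below
        u += below + 0.5 * ties
    return u
```

In step 1, `ks_statistic` would be applied to the estimated positions from two conditions; in step 2, `mann_whitney_u` would be applied to the outputs of `abs_deviations` for the same two conditions, a small U for condition a indicating that its deviations tend to be smaller than those of condition b.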