Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses

Background Quantitative trait loci (QTL) detection on a huge amount of phenotypes, like eQTL detection on transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One of these major outcomes is the high number of QTL detected at marker locations. The present study aims at identifying and specifying the sources of this bias, in particular in the case of analysis of data issued from outbred populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to propose an algorithm to control it. The outbred population context was studied through simulated data sets in a wide range of situations. The likelihood ratio test was firstly analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib families were then simulated and analyzed using the QTL Map software. On the basis of the theoretical results in backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location of the QTL, were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two non parametric tests - the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test - was used in order to identify the parameters that affected the bias and to specify how much they influenced the estimation of QTL location. Results A theoretical expression of the bias of the estimated QTL location was obtained for a backcross type population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size or low density of markers, i.e. designs with low power. Practical recommendations for experimental designs for QTL detection in outbred populations are given on the basis of this bias quantification. Furthermore, an original algorithm is proposed to adjust the location of a QTL, obtained with interval mapping, which co located with a marker. Conclusions Therefore, one should be attentive when one QTL is mapped at the location of one marker, especially under low power conditions.


Background
For the last decade, several studies have shown that a large proportion of QTL are mapped at the markers locations whenever linkage analysis is applied. As to what regards dataset analyses, this bias first raised doubts in Spelman et al. [1], who observed a large proportion of significant test statistics at marker location when looking for QTL in five milk production traits. Walling et al. [2] have described the influence of markers in constructing the confidence intervals of QTL location and questioned whether QTL location was biased towards the location of markers instead of its true position. By applying the regression coefficients on the markers as suggested by Whittaker et al. [3], Walling et al. [4] calculated the proportion of putative QTL located at marker positions in a backcross population. They have reported a systematic bias for the estimated QTL position under the null hypothesis of the test, i.e. the hypothesis of no QTL on the linkage group. Moreover, results from linear regression methods for QTL detection have been reported to behave the same way the results from maximum likelihood methods in interval mapping approaches do [5]. The simulation studies by Walling et al. [4] have confirmed that these two approaches have similar biases on the estimated QTL position in a backcross population.
These previous works have shown that a bias on the QTL location occurs when genetic linkage analysis for QTL mapping is used in a backcross population. However, little research has been devoted to establishing which parameters give rise to that bias. What is more, no study has investigated how it affects linkage analysis applied to outbred populations. In order to address these shortcomings, the present study aims at identifying the sources of that bias, in particular regarding the analysis of data issued from outbred populations.
This question is of critical importance in expression quantitative trait loci (eQTL) mapping. Indeed, a main objective is often to search for eQTL which co localize with QTL which influences agronomical performances. The accuracy of the eQTL locations is thus a fundamental element for experimental design optimization, especially since experimental designs for gene expression analyses are generally of moderate size due to the cost of phenotyping. The pioneer work of eQTL detection can be traced back to the emergence of the concept of genetical genomics [6]. During the past decade, QTL mapping was widely applied to the detection of eQTL, for example in yeast [7], mice [8], human [9,10], maize [11] and pig [12]. Generally, mapping procedures were used to map eQTL considering each transcript expression level as one quantitative trait in a trait by trait analysis.
Recently, we have carried out linkage analyses by interval mapping on high throughput transcriptomic data from several familial QTL detection designs, in pig [13], in poultry [14] and in trout [15]. Because of the high dimensionality of the phenotypes, these eQTL analyses have highlighted to the bias of interval mapping estimation of the QTL location. We observed that the number of eQTL detected at marker locations was consistently higher than between marker locations: for instance, with the analysis of 6 665 gene expressions in a population of 325 pigs, we found 756 eQTL on the chromosome 18 distributed as shown in Figure 1. It appeared that the eQTL were significantly more often mapped on marker locations rather than between marker locations.
Hence, in order to qualify this possible bias on the estimated QTL location in outbred populations and to specify which parameters influence it, this paper presents a study of the QTL location accuracy. Firstly, in order to make things more concrete, we explored the empirical distribution of the LRT along the linkage group under the null hypothesis of "no QTL" on a real dataset. Secondly, analytical developments were carried out so as to identify the parameters which influence the QTL location accuracy. Since they are impossible to realize for outbred populations, because of the test statistic complexity, a more simple case of a backcross type population, i.e. a backcross between inbred lines, was considered at that stage. Thirdly, designs of outbred sib families were simulated and analyzed in order to characterize the bias variability under the null and the alternative ("one QTL segregating on the linkage group") hypotheses. Such parameters as population size and marker density under the null hypothesis, as well as QTL effect and simulated QTL location under the alternative hypothesis were taken into account. Finally, an approach as to how to adjust the QTL location estimation, for a QTL located at the position of one marker, was suggested.

Results
The LRT distribution along the linkage group under the null hypothesis By using the real pedigree and genotypes structure from an experimental design in pig, 2 000 simulations of phenotypes under the null hypothesis of "no QTL on the linkage group" were performed. The distribution of the estimated QTL location, i.e. the location of the maximum LRT, was obtained ( Figure 2) on the chromosome SSC1 which carried 16 microsatellite markers (black points in the X axis). It clearly shows that a large proportion of QTL was found at a marker location. The histograms in Figure 3 show the empirical distributions of the 2000 LRT at some locations on the chromosome, both markers and non-markers. The LRT at each location followed a x 2 distribution, with degrees of freedom ranging from 4.05 to 4.55, according to a Kolmogorov-Smirnov (KS) test at a = 0.01. This was generalized to all tested positions on the linkage group in Figure 4, where the distributions of the LRT characterized by the number of degrees of freedom under the  It should be noted ( Figure 2) that the proportion of QTL locations estimated at the two extreme marker positions of the linkage group was higher than for the other markers under the "no QTL" hypothesis. The asymptotic distribution of the LRT process at marker positions is known to be the square of an Ornstein-Uhlenbeck (OU) process as pointed out by Lander and Botstein [16] and proved by Cierco [17]. As indicated in Rabier et al. [18], when test statistics are performed only on markers, the OU process follows an autoregressive process of order 1. After 1 million of simulations for this process, we found that the probability for the maximum of the OU process to be on the bounds is higher than within the interval ( Figure 5). This property of the OU process was consistent with the fact that we observed a large proportion of QTL localised at the extreme marker positions in comparison with the other markers.

The QTL location bias expression in a backcross population
In order to investigate the bias on the QTL location under the hypothesis of "one QTL", we considered a linkage group limited to an interval [0,T] between two markers M 1 (alleles M 1 and m 1 ) at 0 and M 2 (alleles M 2 and m 2 ) at T flanking a QTL (alleles Q and q) in a backcross population obtained from the cross Let y k be the phenotypic value for the individual k = 1,...,n. Assuming a QTL located at the location t 0 , the genetic model for y k is: where μ denotes the overall mean, a denotes the QTL allelic substitution effect, g k (t 0 ) is the genotypic value of k at the QTL position, which takes 1 or -1 value depending on the QTL allele, Q or q respectively, received by k from its heterozygous parent and e k is a random normal variable with mean 0 and variance σ 2 . In order to simplify calculations, we set μ = 0 and σ 2 = 1 and used a linearized likelihood function instead of a mixture of two normal distributions (e.g. [19]). In this case, the interval mapping method is the same as the regression method for QTL detection, and the model (1) is modified as follows:  where x k (t 0 ) = E[g k (t 0 )|M k ] and M k denotes the genotype of the individual k at markers M 1 and M 2 . Let θ t-t' be the recombination rate in the distance |tt'|, then for all k (see Appendix I): and the LRT at the position t is calculated as . ( Note that given a position t, {x k (t)} k = 1, ..., n is a sequence of i.i.d. random variables with the same expected value 0 and the same variance (see Appendix III), noted Var(x(t)). Hence, if we replace y k by the model (2) in equation (3), then the LRT(t) consists of three terms as (see Appendix II): is a function of variable t, the noise ε 1 (t) is the LRT at t under the no QTL hypothesis and the noise ε 2 (t) follows approximately a normal distribution with mean 0 and variance na 2 Var(x(t))ρ 2 tt 0 . Note that: arg max i.e. the bias on LRT(t) behaves similarly to the bias on LRT(t)/n. So this property allows to analyze the source of the bias in LRT(t)/n instead of LRT(t). Using the law of large numbers, we have this following decomposition of LRT (t)/n: It can be seen from formula (4) that the first term reaches its maximum at t 0 since ρ tt 0 → 1 when t t 0 . As seen in the previous section, the second term, which is proportional to the LRT at t under the "no QTL" hypothesis, reaches its maximum at the position of markers more often than at the positions between markers. As a result, the estimated QTL location will be biased towards the position of markers. However, when n or a 2 increase, or when T decreases, or when t 0 approaches one marker location (see Appendix III), the deviation between the two first terms in formula (4) increases and the influence of the second term is reduced. Therefore, in our simple backcross population model, under the hypothesis of one QTL, when the population size, marker density, QTL effect increase or when the true QTL location approaches the position of a marker, the bias of the estimated QTL location is expected to be reduced.

Simulations under H0
According to the preceding results, the estimated QTL location cannot be expected to be uniformly distributed on the chromosome under the null hypothesis of no QTL. Familial designs were simulated to test the influence of the population size and the marker density on this bias in outbred populations.

Impact of the population size
Six different population sizes, from 60 to 800 progeny, were simulated with three markers located at 0 M, 0.2 M and 0.4 M on a 0.4 M linkage group. The empirical distributions of the estimated QTL location, obtained from 5 000 simulations for each population size, showed that the probability of mapping a QTL at a marker location was always higher than that of mapping it between markers ( Figure 6). As shown in Table 1, the proportion of QTL which co-localized with the markers looked independent from the population size. When comparing the estimated QTL location distributions, which were obtained with the different population sizes, using the Kolmogorov-Smirnov test, large p-values were obtained (> 0.97). This indicates that the population size did not influence the distribution of the estimated QTL location in a significative way under the null hypothesis.

Impact of the marker density
The empirical distributions of estimated QTL locations were compared for five marker, with 2, 3, 5, 7 and 11 markers equally distributed on a 0.6 M linkage group ( Figure 7). The bias towards the marker positions was systematic whichever the marker density. Except for 11 markers, the number of markers had little influence on the proportion of QTL located at a marker position (Table 2), i.e. the bias seemed to divide up between markers. However, this criteria was difficult to interpret because the number of markers was different in each of the cases studied. Moreover, in all cases, the two extreme markers concentrated more false locations than intermediate markers did. Since the number of markers was different, the distributions based on the marker density were expected to be different as well. The KS test was thus not suitable to test the influence of the marker density.

Simulations under H1
According to the analytical results obtained in a backcross type population, the population size and the marker density, as well as the QTL effect and location, were parameters which were very likely to influence the bias on the estimated QTL location under the alternative hypothesis.

Impact of the population size
Three sizes of population were simulated: 100, 300 or 800 progeny. The empirical distributions of the estimated QTL location are given in Figure 8. One QTL was simulated at the 0.1 M location (Δ on the X axis) on a 0.4 M linkage group with three markers located at 0 M, 0.2 M and 0.4 M. The Figure 8 clearly shows a variability of the bias of the QTL location depending on the population size. It can be seen from Table 3 that the proportion of putative QTL that co-localized with a marker and the root mean square error (RMSE) of the QTL location became smaller when the population size increased. Table 3 also shows that the bias was correlated to the power of the analysis. However, considering only very significant LRT (a = 0.01), significant (a = 0.05) or all LRT (a = -), led to very similar values of RMSE or of proportion of QTL mapped on a marker location. The KS test indicated that the distributions of the estimated QTL position were significantly different depending on the size of the population studied (pvalues = 0). Moreover, the Mann-Whitney-Wilcoxon (MWW) test showed that, when the population size increased, the median of the error of the estimated QTL position significantly decreased (p -values < 2.2e -16).  The population structure was a mixture of full and half sib families for given numbers of sires (s), dams per sire (d), and progeny per dam (p).  Impact of the marker density Figure 9 reports the QTL location distribution for a marker density from 2 to 11 markers equally distributed on a 0.6 M interval. One QTL was simulated at 0.25 M. It shows the advantage of using a high marker density in the QTL detection: when the marker spacing was minimal (i.e. 6 cM), the location of the QTL was very accurate. Table 4 shows that, when the density increased, the RMSE of the estimated QTL location decreased. So the QTL location was more accurately estimated. The dependency between the power of the analysis and the bias extent was confirmed. On the contrary, there were low variations of RMSE or proportion of QTL mapped at marker location according to α. Finally, as under H0, these tendencies were confirmed for the proportion of QTL located at a marker location, except for 11 markers.

Impact of the QTL effect
The empirical distributions of the estimated QTL location when the QTL effect increased from 0.5 phenotypic standard deviation (s) to 4s is shown in Figure 10. One QTL was simulated at 0.1 M on a 0.4 M linkage group with three markers equally spaced. The power of the QTL detection, the RMSE of the estimated QTL location and the proportion of estimated QTL locations at a marker are given in Table 5. Results indicated that, whenever the QTL effect increased, the bias decreased.
As seen previously, the bias decreased when the power increased but the RMSE or the proportion of QTL mapped on a marker position were only slightly dependent on the test level. The KS test indicated that the distribution of the estimated QTL position was significantly different when the QTL effect changed (p -values < 2.2e -16). The MWW test showed that, when the QTL effect increased, the median of the error of the estimated QTL position decreased (p -values < 2.2e -16).
Impact of the true QTL location Figure 11 shows the variation in the distribution of the estimated QTL location when the simulated QTL position (Δ on the X axis) moved towards the middle of two flanking markers on a 0.4 M linkage group. Table 6 shows the power of the QTL detection, the RMSE of the estimated QTL location and the proportion of estimated QTL positions at a marker position when the true QTL location changed from 0 M to 0.2 M. When the true QTL location tended towards the flanking markers, the power went up and the proportion of QTL locations at a marker location increased. On the contrary, the RMSE went down. The KS test confirmed the difference between the distributions of the estimated QTL location when the true QTL location varied (pvalues < 2.2e -16). The MWW test, which compared the medians of error of the estimated QTL location, confirmed the increase in accuracy when the true QTL location tended towards a marker location.

An algorithm to adjust the location of QTL mapped on markers
Analytical developments in backcross type population and simulation study in outbred type population demonstrated that the estimated position of the QTL is  The markers were evenly distributed on a linkage group of 0.6 M. 2 Proportion of QTL located at a marker location for a population of 300 individuals.
biased towards marker location under some circumstances. On the other hand, the decomposition of the LRT according to the formula (4) allowed to identify a putative cause of this bias: the residual error ε 1 in the LRT both under "no QTL" and the "one QTL" hypotheses. Indeed, according to the decomposition of the LRT in the formula (4), if the QTL is not estimated at its true location, two residual errors may have generated the bias: ε 1 and ε 2 . When the estimated QTL position is at a marker location, argmax t ε 2 (t) has a uniform distribution but argmax t ε 1 (t) is more often estimated at a marker location than between markers. In such a situation, ε 1 is very likely to play a dominant role in the bias. On the contrary, when the estimated QTL location is not at a marker location, argmax t ε 1 (t) and argmax t ε 2 (t) are unknown for a given argmax t [ε 1 (t) + ε 2 (t)] error. Under these circumstances, it is impossible to predict the relative influence of ε 1 and ε 2 on the bias. On the basis of this observation, we propose an approach to describe the ε 1 (t) process and, consequently, adjust the estimated QTL position when a QTL co localizes with one marker, i.e. an approach to correct the "marker effect" on the bias of the estimated QTL location. 1. Obtain the vector which contains the LRT profile along the linkage group, calculated on the phenotypic data, say L 0 . L 0 is maximum at the location of the marker M.
2. Under the "no QTL" hypothesis, simulate phenotypes and obtain LRT profiles until to have 1000 profiles  The population structure was a mixture of full and half sib families for given numbers of sires (s), dams per sire (d), and progeny per dam (p). 2 Significance level for the LRT. 3 Power of the QTL detection at significance level a. In order to verify the validity of this proposal, simulations were carried out in R for a backcross type design of 100 progeny. There were six markers equally distributed on a 1 M linkage group. There was one QTL of 0.5σ of effect for which two true locations were envisaged: 0.1 M, i.e. between markers, and 0.2 M, i.e. on a marker. Table 7 shows the comparison of RMSE between "before" and "after" adjusting the estimated QTL location when the estimated QTL position was on one of the markers in the first place. The distributions of estimated QTL location before and after the adjustment of the estimated QTL location are presented in Figure 12. The RMSE was always smaller after the position had been adjusted, even when the true QTL location was in 0.2 M, i.e. on a marker location. In this example, the proportion of false QTL locations on the markers was effectively decreased by the proposed algorithm.

Discussion
In order to study the elements that give rise to the bias on the estimated QTL position, we checked whether the distribution of the test statistic changed along the locations on the linkage group. More precisely, we checked if the significance threshold remained the same at a marker and at a non-marker location. Under the null hypothesis of "no QTL on the linkage group", the asymptotic distribution of the LRT at a given point is well known and identical for all locations. It will be getting closer to the central c 2 distribution with a degree of freedom depending on the number of parameters fixed under the null hypothesis [20], i.e. here the number of sires or dams for which a QTL effect was estimated. Nevertheless, the population size is most often not large enough to make the LRT reach its asymptotic distribution for all the locations on the linkage group. The variability of the marker informativity along the linkage group may actually influence this convergence to asymptotic conditions, resulting in variability of the LRT distributions depending on the tested locations. Here, the differences between the empirical distributions of the LRT at each position along the linkage group were explored using a real example of an outbred type population. It appeared that the variability of  The markers were evenly located in a linkage group of 0.6 M. 2 Significance level for the LRT. 3 Power of the QTL detection at significance level a in a population of 300 progeny.
informativity along the linkage group did not lead to a significative variability of the empirical distributions of the nominal test statistics. This observation is not contradictive to the bias on the estimated QTL location towards the locations of markers but it means that the bias is due to the process which defines the sup of LRT on the linkage group. Some analytical results concerning the bias of the estimated QTL location were obtained in a backcross type population, i.e. a backcross between inbred lines. We identified a common source of bias under the "no QTL" and the "one QTL" hypotheses, and also showed the possible influence of several parameters under "one QTL" hypothesis, such as the population size, the marker density, the QTL effect and the true QTL location. Using simulations, we verified the existence of a bias on the estimation of the QTL location using the interval mapping method, under the null and the alternative hypotheses, when family structure are more complex than the backcross design considered by Walling et al. [4]. Simulations of outbred populations confirmed that this bias is influenced by the size of the population and the density of the genetic map, as well as by the QTL effect under the alternative hypothesis. We also demonstrated that the true QTL location, relatively to the flanking markers, had a significant impact on the accuracy of the estimated QTL location. Moreover, we quantified the bias of the estimated QTL location for various values of these parameters and validated the results by applying appropriate test statistics.
We showed that the population size does not affect the estimation of the QTL location under the null hypothesis. Under the alternative hypothesis, very similar values of RMSE or of proportion of QTL detected at marker locations were observed whatever a. On the other hand, a slight reduction in the bias seemed to be obtained when applying a < 0.01. However, the choice of a high significance level also implies a decrease of power and the detection of only few QTL. As a consequence, it cannot be considered an efficient way to correct the bias problem.
Considering these results leads to a first recommendation which would be that the number of animals and/or markers must be adjusted to the desired test power and location accuracy. Figure 13 summarizes the variability of the bias in accordance with the population size and the QTL effect. The proportion of QTL mapped at a marker location and the RMSE of the QTL location present the same tendencies according to the variations in QTL effect and in population size. This confirms that the bias on the estimated QTL location is essentially due to the location of QTL on markers.  Significance level for the LRT. 2 Power of the QTL detection for significance level a. 3 RMSE of the estimated QTL location in a population of 100 progeny with a true QTL located at 0.1 M.
Secondly, concerning the particular case of eQTL detection, when the marker information is relatively sparse, for example when microsatellite markers are used for genotyping, it is necessary to measure several hundred of animals for transcriptomic data to obtain an accurate eQTL location. Finally, a population size of 300 progeny seems to be a good compromise in the detection of eQTL, even if only those which have relatively large effects will be detected.
Thirdly, it is clear that significant QTL detection located at a marker position should be considered with caution, especially when the population size, the marker density or the QTL effect are low. Hence, the approach proposed above is efficient to remedy the bias on the estimated QTL location in such situation.

Conclusions
When we apply the interval mapping method on an outbred design to map QTL, the QTL is often incorrectly mapped at the position of a marker. In this work, this bias was studied by using analytical developments in backcross type population and simulated data in outbred populations. In the absence of QTL, adjusting the thresholds at the location of markers cannot reduce the bias, and the population size does not affect the bias. Under the hypothesis of having one QTL, the impact of some parameters on the bias was confirmed: when the population size and/or the QTL effect and/or the marker density are large enough, the bias is reduced. Moreover, the closer the QTL is to a marker location, the more accurate the estimation is. Therefore, caution should be taken when the QTL is mapped at a position of a marker, in particular for low power designs. In such cases, a method is proposed to correct the bias on the estimated QTL location. Simulations carried out in a backcross type population demonstrated that this method is valid to limit the bias.

Analyses on a real data set in pig
A real data set was used to illustrate some aspects of the present work. It is a porcine outbred population of 325 progeny issued from 4 sires. One example of eQTL analysis using the QTLMap software [21], i.e. the analysis of the chromosome SSC18 for 6 665 gene expression traits, was given. The linkage analysis method was applied according to Le Roy et al. [22] with a gene by gene procedure. For each gene, when the LRT was significant at the 5% 0 level at the chromosome level, the estimated eQTL location was the location where the LRT was maximum on the linkage group.
The same familial structure was used to study the empirical distribution of the LRT along the linkage  group. Two thousand simulations were performed under the null hypothesis of "no QTL" on the chromosome SSC1 which carried 16 microsatellite markers. A polygenic heritability coefficient of 0.5 was assumed for the trait (see http://www.inra.fr/qtlmap).

Simulations of an Ornstein-Uhlenbeck process
The asymptotic distribution of the LRT process at marker positions was shown as being the square of an OU process [16]. Let's X t denote the value of this OU process at the t location. X t could be described as: where W t denotes the Brownien movement. In a backcross type population, the mean of this process is 0 and the autocovariance is: cov(X t , X t' ) = e -2|t-t'| with t and t' in the Haldane distance unit [16].
To simulate this process, we considered a linkage group with mk markers. We generated mk independent random numbers z 0 , z 1 , ..., z mk from a normal distribution with mean 0 and variance 1 with the function rnormin R. We defined X 0 = z 0 . Then, a discrete analog of the OU process [23] was given by: .., mk, where τ denotes the spacing of two adjacent markers in Morgan. This sequence is a firstorder autoregressive sequence.

Simulations of outbred type population
The QTLMap software [21] was used to simulate and analyse the data sets. QTLMap allowed the simulation of complete experimental designs with pedigree, genetic map, genotypes and phenotypes http://www.inra.fr/ qtlmap. The population structure was a mixture of full and half sib families for given numbers of sires (s), of dams per sire (d) and of progeny per dam (p). Most often, 3 markers were equally distributed on a 0.4 M linkage group. Each marker had 6 alleles with equal frequencies in the parental population. The QTL was simulated at 0.1 M and all sires and dams were heterozygous for the QTL. The phenotypes of the progeny were simulated as follows: (5) y ijk is the phenotype of the progeny ijk of the sire i and of the dam ij. u i and u ijk denote the polygenic effects, of the sire i and of the dam ij respectively, which follow a normal distribution with mean 0 and variance  . a denotes the QTL allelic substitution effect and g ijk (t 0 ) is the genotypic value of ijk at the QTL location t 0 . g ijk takes value 1, 0 or -1 depending onthe QTL genotype, QQ, Qq or qq, respectively. e ijk is a random normal variable with mean 0 and variance σ 2 e . The variance within QTL genotype is σ 2 = 2σ 2 u + σ 2 e and a is expressed in σ unit. The heritability coefficient, equal to 4σ 2 u /σ 2 , was fixed at 0.25. For each of the cases studied, the results were based on 5 000 simulations, either under the null hypothesis (H 0 : there is no QTL segregating on the linkage group, i.e. a = 0) or under the alternative hypothesis (H 0 : there is one QTL segregating on the linkage group, i.e. most often a = 1σ). For each simulated dataset, the estimated QTL position was the location of the linkage group where the LRT was maximum.
To understand how the QTL effect affects the estimation of the QTL location, a population of 100 progeny was simulated with a QTL effect ranging from 0.5 σ to 4 σ.
Other simulations were performed in a population of 300 progeny. Firstly, to check the bias extent depending on the marker density, samples with 2, 3, 5, 7 or 11 markers equidistant in a linkage group of 0.

Criteria
In each of the cases studied, the empirical distribution of the estimated QTL location was obtained from the 5000 locations of maximum of LRT of the simulations. The proportion of simulations for which the QTL location was estimated at one marker position was retained to quantify the variability of the bias depending on the parameters. Under the alternative hypothesis, beside the proportion of the QTL which co-localized with a marker, the root mean squared error (RMSE) of the QTL location was chosen to describe the bias variability which depends on the parameters. What is more, the power of the QTL detection was calculated for a first type error a = 0.01 and 0.05. For all simulations and for simulations with a significant QTL at the level a = 0.01 or 0.05, the proportion of QTL that were estimated at a marker location and the RMSE of the estimated QTL location were computed. The RMSE computation was given by the following formula where L is the number of simulations (all, significant at the level 0.05 or at the level 0.01),t l is the l th estimated QTL position and t 0 is the true QTL position.

Hypothesis test
Appropriate statistical tests are needed to evaluate which parameters affect the bias of the estimated QTL position. ANOVA was not adequate to test the equality of the average QTL position in two different conditions (e.g. 2 population sizes) because of the non normality of the QTL position estimator. Therefore, two nonparametric tests were combined in order to test which parameters affect the bias, and how they influence the variation of the QTL location estimation. This was performed in two steps: (1) the parameters which influence the accuracy of the estimated QTL location were identified. This step was carried out with a Kolmogorov-Smirnov test; (2) for the parameters identified in the first step, a description of their effect on the accuracy of the estimated QTL position was made. This step was performed with a Mann-Whitney-Wilcoxon test [24].
1. Kolmogorov-Smirnov test (KS): this test was applied in order to check whether a parameter affected the estimation of the QTL position. For each value of the parameter, an empirical distribution of the estimated QTL location was obtained using 5 000 simulations. The two hypotheses compared by the KS test were: where F a , F b denote the distribution of the estimated QTL position under the conditions a and b, respectively. For a given parameter, all the distributions were compared by pairs with the function ks.test in R. If all pair comparisons concluded to accept the null hypothesis, it means that the value of this parameter did not influence the estimation of the QTL position.
Mann-Whitney-Wilcoxon test (MWW): when the null hypothesis in the first step was rejected, the MWW test was used to understand how the parameter affected the estimation with the function wilcox.test in R. The hypotheses compared were: where D a , D b denote the absolute values of the deviations between the estimated QTL position and the assumed, i.e. the true position, under the condition a and b, respectively. A smaller median D corresponds to a more accurate position estimation. of the first term, is reduced. In the same way, as a 2