Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses

Wang, Xiaoqiang; Gilbert, Hélène; Moreno, Carole; Filangi, Olivier; Elsen, Jean-Michel; Le Roy, Pascale

doi:10.1186/1471-2156-13-29

Methodology article
Open access
Published: 20 April 2012

Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses

Xiaoqiang Wang^1,2,
Hélène Gilbert^3,4,
Carole Moreno⁵,
Olivier Filangi^1,2,
Jean-Michel Elsen⁵ &
…
Pascale Le Roy^1,2

BMC Genetics volume 13, Article number: 29 (2012) Cite this article

5815 Accesses
2 Citations
1 Altmetric
Metrics details

Abstract

Background

Quantitative trait loci (QTL) detection on a huge amount of phenotypes, like eQTL detection on transcriptomic data, can be dramatically impaired by the statistical properties of interval mapping methods. One of these major outcomes is the high number of QTL detected at marker locations. The present study aims at identifying and specifying the sources of this bias, in particular in the case of analysis of data issued from outbred populations. Analytical developments were carried out in a backcross situation in order to specify the bias and to propose an algorithm to control it. The outbred population context was studied through simulated data sets in a wide range of situations.

The likelihood ratio test was firstly analyzed under the "one QTL" hypothesis in a backcross population. Designs of sib families were then simulated and analyzed using the QTL Map software. On the basis of the theoretical results in backcross, parameters such as the population size, the density of the genetic map, the QTL effect and the true location of the QTL, were taken into account under the "no QTL" and the "one QTL" hypotheses. A combination of two non parametric tests - the Kolmogorov-Smirnov test and the Mann-Whitney-Wilcoxon test - was used in order to identify the parameters that affected the bias and to specify how much they influenced the estimation of QTL location.

Results

A theoretical expression of the bias of the estimated QTL location was obtained for a backcross type population. We demonstrated a common source of bias under the "no QTL" and the "one QTL" hypotheses and qualified the possible influence of several parameters. Simulation studies confirmed that the bias exists in outbred populations under both the hypotheses of "no QTL" and "one QTL" on a linkage group. The QTL location was systematically closer to marker locations than expected, particularly in the case of low QTL effect, small population size or low density of markers, i.e. designs with low power. Practical recommendations for experimental designs for QTL detection in outbred populations are given on the basis of this bias quantification. Furthermore, an original algorithm is proposed to adjust the location of a QTL, obtained with interval mapping, which co located with a marker.

Conclusions

Therefore, one should be attentive when one QTL is mapped at the location of one marker, especially under low power conditions.

Background

For the last decade, several studies have shown that a large proportion of QTL are mapped at the markers locations whenever linkage analysis is applied. As to what regards dataset analyses, this bias first raised doubts in Spelman et al. [1], who observed a large proportion of significant test statistics at marker location when looking for QTL in five milk production traits. Walling et al. [2] have described the influence of markers in constructing the confidence intervals of QTL location and questioned whether QTL location was biased towards the location of markers instead of its true position. By applying the regression coefficients on the markers as suggested by Whittaker et al. [3], Walling et al. [4] calculated the proportion of putative QTL located at marker positions in a backcross population. They have reported a systematic bias for the estimated QTL position under the null hypothesis of the test, i.e. the hypothesis of no QTL on the linkage group. Moreover, results from linear regression methods for QTL detection have been reported to behave the same way the results from maximum likelihood methods in interval mapping approaches do [5]. The simulation studies by Walling et al. [4] have confirmed that these two approaches have similar biases on the estimated QTL position in a backcross population.

These previous works have shown that a bias on the QTL location occurs when genetic linkage analysis for QTL mapping is used in a backcross population. However, little research has been devoted to establishing which parameters give rise to that bias. What is more, no study has investigated how it affects linkage analysis applied to outbred populations. In order to address these shortcomings, the present study aims at identifying the sources of that bias, in particular regarding the analysis of data issued from outbred populations.

This question is of critical importance in expression quantitative trait loci (eQTL) mapping. Indeed, a main objective is often to search for eQTL which co localize with QTL which influences agronomical performances. The accuracy of the eQTL locations is thus a fundamental element for experimental design optimization, especially since experimental designs for gene expression analyses are generally of moderate size due to the cost of phenotyping. The pioneer work of eQTL detection can be traced back to the emergence of the concept of genetical genomics [6]. During the past decade, QTL mapping was widely applied to the detection of eQTL, for example in yeast [7], mice [8], human [9, 10], maize [11] and pig [12]. Generally, mapping procedures were used to map eQTL considering each transcript expression level as one quantitative trait in a trait by trait analysis.

Recently, we have carried out linkage analyses by interval mapping on high throughput transcriptomic data from several familial QTL detection designs, in pig [13], in poultry [14] and in trout [15]. Because of the high dimensionality of the phenotypes, these eQTL analyses have highlighted to the bias of interval mapping estimation of the QTL location. We observed that the number of eQTL detected at marker locations was consistently higher than between marker locations: for instance, with the analysis of 6 665 gene expressions in a population of 325 pigs, we found 756 eQTL on the chromosome 18 distributed as shown in Figure 1. It appeared that the eQTL were significantly more often mapped on marker locations rather than between marker locations.

Hence, in order to qualify this possible bias on the estimated QTL location in outbred populations and to specify which parameters influence it, this paper presents a study of the QTL location accuracy. Firstly, in order to make things more concrete, we explored the empirical distribution of the LRT along the linkage group under the null hypothesis of "no QTL" on a real dataset. Secondly, analytical developments were carried out so as to identify the parameters which influence the QTL location accuracy. Since they are impossible to realize for outbred populations, because of the test statistic complexity, a more simple case of a backcross type population, i.e. a backcross between inbred lines, was considered at that stage. Thirdly, designs of outbred sib families were simulated and analyzed in order to characterize the bias variability under the null and the alternative ("one QTL segregating on the linkage group") hypotheses. Such parameters as population size and marker density under the null hypothesis, as well as QTL effect and simulated QTL location under the alternative hypothesis were taken into account. Finally, an approach as to how to adjust the QTL location estimation, for a QTL located at the position of one marker, was suggested.

Results

The LRT distribution along the linkage group under the null hypothesis

By using the real pedigree and genotypes structure from an experimental design in pig, 2 000 simulations of phenotypes under the null hypothesis of "no QTL on the linkage group" were performed. The distribution of the estimated QTL location, i.e. the location of the maximum LRT, was obtained (Figure 2) on the chromosome SSC1 which carried 16 microsatellite markers (black points in the X axis). It clearly shows that a large proportion of QTL was found at a marker location. The histograms in Figure 3 show the empirical distributions of the 2000 LRT at some locations on the chromosome, both markers and non-markers. The LRT at each location followed a x² distribution, with degrees of freedom ranging from 4.05 to 4.55, according to a Kolmogorov-Smirnov (KS) test at α = 0.01. This was generalized to all tested positions on the linkage group in Figure 4, where the distributions of the LRT characterized by the number of degrees of freedom under the null hypothesis at markers was not very different from the distributions obtained for positions between markers.

It should be noted (Figure 2) that the proportion of QTL locations estimated at the two extreme marker positions of the linkage group was higher than for the other markers under the "no QTL" hypothesis. The asymptotic distribution of the LRT process at marker positions is known to be the square of an Ornstein-Uhlenbeck (OU) process as pointed out by Lander and Botstein [16] and proved by Cierco [17]. As indicated in Rabier et al. [18], when test statistics are performed only on markers, the OU process follows an autoregressive process of order 1. After 1 million of simulations for this process, we found that the probability for the maximum of the OU process to be on the bounds is higher than within the interval (Figure 5). This property of the OU process was consistent with the fact that we observed a large proportion of QTL localised at the extreme marker positions in comparison with the other markers.

The QTL location bias expression in a backcross population

In order to investigate the bias on the QTL location under the hypothesis of "one QTL", we considered a linkage group limited to an interval [0,T] between two markers M₁ (alleles M₁ and m₁) at 0 and M₂ (alleles M₂ and m₂) at T flanking a QTL (alleles Q and q) in a backcross population obtained from the cross M₁M₁QQM₂M₂ × M₁m₁QqM₂m₂.

Let y_k be the phenotypic value for the individual k = 1,...,n. Assuming a QTL located at the location t₀ , the genetic model for y_k is:

y_{k} = μ + \frac{a}{2} g_{k} (t_{0}) + e_{k},

(1)

where μ denotes the overall mean, a denotes the QTL allelic substitution effect, g_k(t₀) is the genotypic value of k at the QTL position, which takes 1 or -1 value depending on the QTL allele, Q or q respectively, received by k from its heterozygous parent and e_k is a random normal variable with mean 0 and variance σ². In order to simplify calculations, we set μ = 0 and σ² = 1 and used a linearized likelihood function instead of a mixture of two normal distributions (e.g. [19]). In this case, the interval mapping method is the same as the regression method for QTL detection, and the model (1) is modified as follows:

y_{k} = \frac{a}{2} x_{k} (t_{0}) + e_{k},

(2)

where x_k(t₀) = E[g_k(t₀)|M^k] and M^k denotes the genotype of the individual k at markers M₁ and M₂. Let θ_t-t' be the recombination rate in the distance |t - t'|, then for all k (see Appendix I):

\begin{matrix} x_{k} (t) = & \{\begin{matrix} (1 - θ_{t} - θ_{T - t}) / (1 - θ_{T}) & if M^{k} = M_{1} M_{1} M_{2} M_{2} \\ (θ_{T - t} - θ_{t}) / θ_{T} & if M^{k} = M_{1} M_{1} M_{2} m_{2} \\ (θ_{t} + θ_{T - t} - 1) / (1 - θ_{T}) & if M^{k} = M_{1} m_{1} M_{2} m_{2} \\ (θ_{T - t} - θ_{t}) / θ_{T} & if M^{k} = M_{1} m_{1} M_{2} M_{2} \end{matrix} \end{matrix}

and the LRT at the position t is calculated as

L R T (t) = \frac{{[\sum y_{k} x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)} .

(3)

Note that given a position t, {x_k(t)}_{k = 1, ..., n} is a sequence of i.i.d. random variables with the same expected value 0 and the same variance (see Appendix III), noted Var(x(t)). Hence, if we replace y_k by the model (2) in equation (3), then the LRT(t) consists of three terms as (see Appendix II):

L R T (t) = f (t) + ε_{1} (t) + ε_{2} (t),

where $f (t) = \frac{1}{4} a^{2} {[\sum x_{k} (t_{0}) x_{k} (t)]}^{2} / \sum x_{k}^{2} (t)$ is a function of variable t, the noise ε₁(t) is the LRT at t under the no QTL hypothesis and the noise ε₂(t) follows approximately a normal distribution with mean 0 and variance $n a^{2} V a r (x (t)) ρ_{t t_{0}}^{2} .$ Note that:

arg max_{t \in [0, T]} L R T (t) = arg max_{t \in [0, T]} \frac{1}{n} L R T (t),

i.e. the bias on LRT(t) behaves similarly to the bias on LRT(t)/n. So this property allows to analyze the source of the bias in LRT(t)/n instead of LRT(t). Using the law of large numbers, we have this following decomposition of LRT (t)/n:

\frac{1}{n} L R T (t) \approx \frac{1}{4} a^{2} V a r (x (t_{0})) ρ_{t t_{0}}^{2} + \frac{1}{n} ε_{1} (t) + \frac{1}{n} ε_{2} (t),

(4)

It can be seen from formula (4) that the first term reaches its maximum at t₀ since $ρ_{t t_{0}} \to 1$ when t→ t₀. As seen in the previous section, the second term, which is proportional to the LRT at t under the "no QTL" hypothesis, reaches its maximum at the position of markers more often than at the positions between markers. As a result, the estimated QTL location will be biased towards the position of markers. However, when n or a² increase, or when T decreases, or when t₀ approaches one marker location (see Appendix III), the deviation between the two first terms in formula (4) increases and the influence of the second term is reduced. Therefore, in our simple backcross population model, under the hypothesis of one QTL, when the population size, marker density, QTL effect increase or when the true QTL location approaches the position of a marker, the bias of the estimated QTL location is expected to be reduced.

Simulations under H0

According to the preceding results, the estimated QTL location cannot be expected to be uniformly distributed on the chromosome under the null hypothesis of no QTL. Familial designs were simulated to test the influence of the population size and the marker density on this bias in outbred populations.

Impact of the population size

Six different population sizes, from 60 to 800 progeny, were simulated with three markers located at 0 M, 0.2 M and 0.4 M on a 0.4 M linkage group. The empirical distributions of the estimated QTL location, obtained from 5 000 simulations for each population size, showed that the probability of mapping a QTL at a marker location was always higher than that of mapping it between markers (Figure 6). As shown in Table 1, the proportion of QTL which co-localized with the markers looked independent from the population size. When comparing the estimated QTL location distributions, which were obtained with the different population sizes, using the Kolmogorov-Smirnov test, large p-values were obtained (> 0.97). This indicates that the population size did not influence the distribution of the estimated QTL location in a significative way under the null hypothesis.

Table 1 Proportion of estimated QTL locations at marker locations according to the population size under H0

Full size table

Impact of the marker density

The empirical distributions of estimated QTL locations were compared for five marker, with 2, 3, 5, 7 and 11 markers equally distributed on a 0.6 M linkage group (Figure 7). The bias towards the marker positions was systematic whichever the marker density. Except for 11 markers, the number of markers had little influence on the proportion of QTL located at a marker position (Table 2), i.e. the bias seemed to divide up between markers. However, this criteria was difficult to interpret because the number of markers was different in each of the cases studied. Moreover, in all cases, the two extreme markers concentrated more false locations than intermediate markers did. Since the number of markers was different, the distributions based on the marker density were expected to be different as well. The KS test was thus not suitable to test the influence of the marker density.

Table 2 Proportion of estimated QTL locations at marker locations according to the marker density under H0

Full size table

Simulations under H1

According to the analytical results obtained in a backcross type population, the population size and the marker density, as well as the QTL effect and location, were parameters which were very likely to influence the bias on the estimated QTL location under the alternative hypothesis.

Impact of the population size

Three sizes of population were simulated: 100, 300 or 800 progeny. The empirical distributions of the estimated QTL location are given in Figure 8. One QTL was simulated at the 0.1 M location (Δ on the X axis) on a 0.4 M linkage group with three markers located at 0 M, 0.2 M and 0.4 M. The Figure 8 clearly shows a variability of the bias of the QTL location depending on the population size. It can be seen from Table 3 that the proportion of putative QTL that co-localized with a marker and the root mean square error (RMSE) of the QTL location became smaller when the population size increased. Table 3 also shows that the bias was correlated to the power of the analysis. However, considering only very significant LRT (α = 0.01), significant (α = 0.05) or all LRT (α = -), led to very similar values of RMSE or of proportion of QTL mapped on a marker location. The KS test indicated that the distributions of the estimated QTL position were significantly different depending on the size of the population studied (p - values = 0). Moreover, the Mann-Whitney-Wilcoxon (MWW) test showed that, when the population size increased, the median of the error of the estimated QTL position significantly decreased (p - values < 2.2e - 16).

Table 3 Power, RMSE of the QTL location and proportion of estimated QTL locations at marker locations according to the population size under H1

Full size table

Impact of the marker density

Figure 9 reports the QTL location distribution for a marker density from 2 to 11 markers equally distributed on a 0.6 M interval. One QTL was simulated at 0.25 M. It shows the advantage of using a high marker density in the QTL detection: when the marker spacing was minimal (i.e. 6 cM), the location of the QTL was very accurate. Table 4 shows that, when the density increased, the RMSE of the estimated QTL location decreased. So the QTL location was more accurately estimated. The dependency between the power of the analysis and the bias extent was confirmed. On the contrary, there were low variations of RMSE or proportion of QTL mapped at marker location according to α. Finally, as under H0, these tendencies were confirmed for the proportion of QTL located at a marker location, except for 11 markers.

Table 4 Power, RMSE of the QTL location and proportion of estimated QTL locations at marker locations according to the marker density under H1

Full size table

Impact of the QTL effect

The empirical distributions of the estimated QTL location when the QTL effect increased from 0.5 phenotypic standard deviation (σ) to 4σ is shown in Figure 10. One QTL was simulated at 0.1 M on a 0.4 M linkage group with three markers equally spaced. The power of the QTL detection, the RMSE of the estimated QTL location and the proportion of estimated QTL locations at a marker are given in Table 5. Results indicated that, whenever the QTL effect increased, the bias decreased. As seen previously, the bias decreased when the power increased but the RMSE or the proportion of QTL mapped on a marker position were only slightly dependent on the test level. The KS test indicated that the distribution of the estimated QTL position was significantly different when the QTL effect changed (p - values < 2.2e - 16). The MWW test showed that, when the QTL effect increased, the median of the error of the estimated QTL position decreased (p - values < 2.2e - 16).

Table 5 Power, RMSE of the QTL location and proportion of estimated QTL locations at marker locations according to the QTL effect under H1

Full size table

Impact of the true QTL location

Figure 11 shows the variation in the distribution of the estimated QTL location when the simulated QTL position (Δ on the X axis) moved towards the middle of two flanking markers on a 0.4 M linkage group. Table 6 shows the power of the QTL detection, the RMSE of the estimated QTL location and the proportion of estimated QTL positions at a marker position when the true QTL location changed from 0 M to 0.2 M. When the true QTL location tended towards the flanking markers, the power went up and the proportion of QTL locations at a marker location increased. On the contrary, the RMSE went down. The KS test confirmed the difference between the distributions of the estimated QTL location when the true QTL location varied (p - values < 2.2e - 16). The MWW test, which compared the medians of error of the estimated QTL location, confirmed the increase in accuracy when the true QTL location tended towards a marker location.

Table 6 Power, RMSE of the QTL location and proportion of estimated QTL locations at marker locations according to the true QTL location under H1

Full size table

An algorithm to adjust the location of QTL mapped on markers

Analytical developments in backcross type population and simulation study in outbred type population demonstrated that the estimated position of the QTL is biased towards marker location under some circumstances. On the other hand, the decomposition of the LRT according to the formula (4) allowed to identify a putative cause of this bias: the residual error ε₁ in the LRT both under "no QTL" and the "one QTL" hypotheses. Indeed, according to the decomposition of the LRT in the formula (4), if the QTL is not estimated at its true location, two residual errors may have generated the bias: ε₁ and ε₂. When the estimated QTL position is at a marker location, argmax_tε₂(t) has a uniform distribution but argmax_tε₁(t) is more often estimated at a marker location than between markers. In such a situation, ε₁ is very likely to play a dominant role in the bias. On the contrary, when the estimated QTL location is not at a marker location, argmax_tε₁(t) and argmax_tε₂(t) are unknown for a given argmax_t[ε₁(t) + ε₂(t)] error. Under these circumstances, it is impossible to predict the relative influence of ε₁ and ε₂ on the bias. On the basis of this observation, we propose an approach to describe the ε₁(t) process and, consequently, adjust the estimated QTL position when a QTL co localizes with one marker, i.e. an approach to correct the "marker effect" on the bias of the estimated QTL location.

1. Obtain the vector which contains the LRT profile along the linkage group, calculated on the phenotypic data, say L₀. L₀ is maximum at the location of the marker M.

2. Under the "no QTL" hypothesis, simulate phenotypes and obtain LRT profiles until to have 1000 profiles which have their maximum at the position of the marker M, say { L_i}_{i = 1,...,1000}

3. Calculate the 1 000 vectors V_i= L₀-L_i,i ϵ 1,...,1000.

4. Retain the 1000 locations where { V_i}_{i= 1, ..., 1000}is maximum.

5. Obtain the adjusted position of the QTL as the mean of these positions:

\hat{P} = Mean {p_{i} | p_{i} = argmax V_{i}, i = 1, \dots, 1000} .

In order to verify the validity of this proposal, simulations were carried out in R for a backcross type design of 100 progeny. There were six markers equally distributed on a 1 M linkage group. There was one QTL of 0.5σ of effect for which two true locations were envisaged: 0.1 M, i.e. between markers, and 0.2 M, i.e. on a marker. Table 7 shows the comparison of RMSE between "before" and "after" adjusting the estimated QTL location when the estimated QTL position was on one of the markers in the first place. The distributions of estimated QTL location before and after the adjustment of the estimated QTL location are presented in Figure 12. The RMSE was always smaller after the position had been adjusted, even when the true QTL location was in 0.2 M, i.e. on a marker location. In this example, the proportion of false QTL locations on the markers was effectively decreased by the proposed algorithm.

Table 7 RMSE of the QTL location before or after adjustment

Full size table

Discussion

In order to study the elements that give rise to the bias on the estimated QTL position, we checked whether the distribution of the test statistic changed along the locations on the linkage group. More precisely, we checked if the significance threshold remained the same at a marker and at a non-marker location. Under the null hypothesis of "no QTL on the linkage group", the asymptotic distribution of the LRT at a given point is well known and identical for all locations. It will be getting closer to the central χ² distribution with a degree of freedom depending on the number of parameters fixed under the null hypothesis [20], i.e. here the number of sires or dams for which a QTL effect was estimated. Nevertheless, the population size is most often not large enough to make the LRT reach its asymptotic distribution for all the locations on the linkage group. The variability of the marker informativity along the linkage group may actually influence this convergence to asymptotic conditions, resulting in variability of the LRT distributions depending on the tested locations. Here, the differences between the empirical distributions of the LRT at each position along the linkage group were explored using a real example of an outbred type population. It appeared that the variability of informativity along the linkage group did not lead to a significative variability of the empirical distributions of the nominal test statistics. This observation is not contradictive to the bias on the estimated QTL location towards the locations of markers but it means that the bias is due to the process which defines the sup of LRT on the linkage group.

Some analytical results concerning the bias of the estimated QTL location were obtained in a backcross type population, i.e. a backcross between inbred lines. We identified a common source of bias under the "no QTL" and the "one QTL" hypotheses, and also showed the possible influence of several parameters under "one QTL" hypothesis, such as the population size, the marker density, the QTL effect and the true QTL location. Using simulations, we verified the existence of a bias on the estimation of the QTL location using the interval mapping method, under the null and the alternative hypotheses, when family structure are more complex than the backcross design considered by Walling et al. [4]. Simulations of outbred populations confirmed that this bias is influenced by the size of the population and the density of the genetic map, as well as by the QTL effect under the alternative hypothesis. We also demonstrated that the true QTL location, relatively to the flanking markers, had a significant impact on the accuracy of the estimated QTL location. Moreover, we quantified the bias of the estimated QTL location for various values of these parameters and validated the results by applying appropriate test statistics.

We showed that the population size does not affect the estimation of the QTL location under the null hypothesis. Under the alternative hypothesis, very similar values of RMSE or of proportion of QTL detected at marker locations were observed whatever α. On the other hand, a slight reduction in the bias seemed to be obtained when applying α < 0.01. However, the choice of a high significance level also implies a decrease of power and the detection of only few QTL. As a consequence, it cannot be considered an efficient way to correct the bias problem.

Considering these results leads to a first recommendation which would be that the number of animals and/or markers must be adjusted to the desired test power and location accuracy. Figure 13 summarizes the variability of the bias in accordance with the population size and the QTL effect. The proportion of QTL mapped at a marker location and the RMSE of the QTL location present the same tendencies according to the variations in QTL effect and in population size. This confirms that the bias on the estimated QTL location is essentially due to the location of QTL on markers.

Secondly, concerning the particular case of eQTL detection, when the marker information is relatively sparse, for example when microsatellite markers are used for genotyping, it is necessary to measure several hundred of animals for transcriptomic data to obtain an accurate eQTL location. Finally, a population size of 300 progeny seems to be a good compromise in the detection of eQTL, even if only those which have relatively large effects will be detected.

Thirdly, it is clear that significant QTL detection located at a marker position should be considered with caution, especially when the population size, the marker density or the QTL effect are low. Hence, the approach proposed above is efficient to remedy the bias on the estimated QTL location in such situation.

Conclusions

When we apply the interval mapping method on an outbred design to map QTL, the QTL is often incorrectly mapped at the position of a marker. In this work, this bias was studied by using analytical developments in backcross type population and simulated data in outbred populations. In the absence of QTL, adjusting the thresholds at the location of markers cannot reduce the bias, and the population size does not affect the bias. Under the hypothesis of having one QTL, the impact of some parameters on the bias was confirmed: when the population size and/or the QTL effect and/or the marker density are large enough, the bias is reduced. Moreover, the closer the QTL is to a marker location, the more accurate the estimation is. Therefore, caution should be taken when the QTL is mapped at a position of a marker, in particular for low power designs. In such cases, a method is proposed to correct the bias on the estimated QTL location. Simulations carried out in a backcross type population demonstrated that this method is valid to limit the bias.

Methods

Analyses on a real data set in pig

A real data set was used to illustrate some aspects of the present work. It is a porcine outbred population of 325 progeny issued from 4 sires. One example of eQTL analysis using the QTLMap software [21], i.e. the analysis of the chromosome SSC18 for 6 665 gene expression traits, was given. The linkage analysis method was applied according to Le Roy et al. [22] with a gene by gene procedure. For each gene, when the LRT was significant at the 5%₀ level at the chromosome level, the estimated eQTL location was the location where the LRT was maximum on the linkage group.

The same familial structure was used to study the empirical distribution of the LRT along the linkage group. Two thousand simulations were performed under the null hypothesis of "no QTL" on the chromosome SSC1 which carried 16 microsatellite markers. A polygenic heritability coefficient of 0.5 was assumed for the trait (see http://www.inra.fr/qtlmap).

Simulations of an Ornstein-Uhlenbeck process

The asymptotic distribution of the LRT process at marker positions was shown as being the square of an OU process [16]. Let's X_t denote the value of this OU process at the t location. X_t could be described as:

d X_{t} = 2 X_{t} d t + 2 d W_{t}

where W_t denotes the Brownien movement.

In a backcross type population, the mean of this process is 0 and the autocovariance is: cov(X_t, X_t') = e^-2|t-t'|with t and t' in the Haldane distance unit [16].

To simulate this process, we considered a linkage group with mk markers. We generated mk independent random numbers z₀, z₁, ..., z_mkfrom a normal distribution with mean 0 and variance 1 with the function rnormin R. We defined X₀ = z₀. Then, a discrete analog of the OU process [23] was given by:

X_{s} = e^{- 2 τ} X_{s - 1} + \sqrt{1 - e^{- 4 τ}} z_{s}

with s = 1, ..., mk, where τ denotes the spacing of two adjacent markers in Morgan. This sequence is a first-order autoregressive sequence.

Simulations of outbred type population

The QTLMap software [21] was used to simulate and analyse the data sets. QTLMap allowed the simulation of complete experimental designs with pedigree, genetic map, genotypes and phenotypes http://www.inra.fr/qtlmap. The population structure was a mixture of full and half sib families for given numbers of sires (s), of dams per sire (d) and of progeny per dam (p). Most often, 3 markers were equally distributed on a 0.4 M linkage group. Each marker had 6 alleles with equal frequencies in the parental population. The QTL was simulated at 0.1 M and all sires and dams were heterozygous for the QTL. The phenotypes of the progeny were simulated as follows:

y_{i j k} = u_{i} + u_{i j} + a g_{i j k} (t_{0}) + e_{i j k}

(5)

y_ijk is the phenotype of the progeny ijk of the sire i and of the dam ij. u_i and u_ijk denote the polygenic effects, of the sire i and of the dam ij respectively, which follow a normal distribution with mean 0 and variance $σ_{u}^{2}$ . a denotes the QTL allelic substitution effect and g_ijk (t₀) is the genotypic value of ijk at the QTL location t₀. g_ijk takes value 1, 0 or -1 depending onthe QTL genotype, QQ, Qq or qq, respectively. e_ijk is a random normal variable with mean 0 and variance $σ_{e}^{2}$ . The variance within QTL genotype is $σ^{2} = 2 σ_{u}^{2} + σ_{e}^{2}$ and a is expressed in σ unit. The heritability coefficient, equal to $4 σ_{u}^{2} / σ^{2}$ , was fixed at 0.25.

For each of the cases studied, the results were based on 5 000 simulations, either under the null hypothesis (H₀: there is no QTL segregating on the linkage group, i.e. a = 0) or under the alternative hypothesis (H₀: there is one QTL segregating on the linkage group, i.e. most often a = 1σ). For each simulated dataset, the estimated QTL position was the location of the linkage group where the LRT was maximum.

Under the null hypothesis, simulations were carried out so as to compare the influence of the population size with 6 levels: 60 (3s,1d,20p), 80 (4s,1d,20p), 100 (5s,1d,20p), 300 (5s,2d,30p), 400 (5s,2d,40p), 800 (5s,4d,40p) progeny. Under the H1 hypothesis, only 3 of these population sizes were considered: 100, 300 and 800 progeny.

To understand how the QTL effect affects the estimation of the QTL location, a population of 100 progeny was simulated with a QTL effect ranging from 0.5 σ to 4 σ.

Other simulations were performed in a population of 300 progeny. Firstly, to check the bias extent depending on the marker density, samples with 2, 3, 5, 7 or 11 markers equidistant in a linkage group of 0.6 M were simulated, under the null and under the alternative hypotheses. Under H₁, one QTL was simulated at 0.25 M (a = 1σ). Secondly, to test how the true QTL location may affect the bias, we performed simulations under H₁ with a QTL (a = 1σ) lying at 0 M, 0.05 M, 0.10 M, 0.15 M, 0.2 M on a linkage group of 0.4 M with two flanking markers at 0 M and 0.4 M.

Criteria

In each of the cases studied, the empirical distribution of the estimated QTL location was obtained from the 5000 locations of maximum of LRT of the simulations. The proportion of simulations for which the QTL location was estimated at one marker position was retained to quantify the variability of the bias depending on the parameters. Under the alternative hypothesis, beside the proportion of the QTL which co-localized with a marker, the root mean squared error (RMSE) of the QTL location was chosen to describe the bias variability which depends on the parameters. What is more, the power of the QTL detection was calculated for a first type error α = 0.01 and 0.05. For all simulations and for simulations with a significant QTL at the level α = 0.01 or 0.05, the proportion of QTL that were estimated at a marker location and the RMSE of the estimated QTL location were computed. The RMSE computation was given by the following formula

RMSE = \sqrt{\frac{1}{L} \sum_{l = 1}^{L} {({\hat{t}}_{l} - t_{0})}^{2}},

where L is the number of simulations (all, significant at the level 0.05 or at the level 0.01), ${\hat{t}}_{l}$ is the l^th estimated QTL position and t₀ is the true QTL position.

Hypothesis test

Appropriate statistical tests are needed to evaluate which parameters affect the bias of the estimated QTL position. ANOVA was not adequate to test the equality of the average QTL position in two different conditions (e.g. 2 population sizes) because of the non normality of the QTL position estimator. Therefore, two nonparametric tests were combined in order to test which parameters affect the bias, and how they influence the variation of the QTL location estimation. This was performed in two steps: (1) the parameters which influence the accuracy of the estimated QTL location were identified. This step was carried out with a Kolmogorov-Smirnov test; (2) for the parameters identified in the first step, a description of their effect on the accuracy of the estimated QTL position was made. This step was performed with a Mann-Whitney-Wilcoxon test [24].

1. Kolmogorov-Smirnov test (KS): this test was applied in order to check whether a parameter affected the estimation of the QTL position. For each value of the parameter, an empirical distribution of the estimated QTL location was obtained using 5 000 simulations. The two hypotheses compared by the KS test were:

H 0 : F_{a} = F_{b} H 1 : F_{a} \neq F_{b},

where F_a, F_b denote the distribution of the estimated QTL position under the conditions a and b, respectively. For a given parameter, all the distributions were compared by pairs with the function ks.test in R. If all pair comparisons concluded to accept the null hypothesis, it means that the value of this parameter did not influence the estimation of the QTL position.

Mann-Whitney-Wilcoxon test (MWW): when the null hypothesis in the first step was rejected, the MWW test was used to understand how the parameter affected the estimation with the function wilcox.test in R. The hypotheses compared were:

H 0 : D_{a} = D_{b} H 1 : D_{a} \leq D_{b},

where D_a, D_b denote the absolute values of the deviations between the estimated QTL position and the assumed, i.e. the true position, under the condition a and b, respectively. A smaller median D corresponds to a more accurate position estimation.

Appendix

Appendix I

Let us denote M^k the genotype of the markers at 0 and T for individual k and p_kt = ℙ (g_k(t) = 1|M^k). Then using a linearized likelihood function instead of the mixture of two normal distributions, the LRT can be written as: $\begin{matrix} L R T (t) & = - 2 \ln \frac{L (y_{1}, \dots, y_{n})}{\max_{a} L (y_{1}, \dots, y_{n}; a)} \\ \approx - 2 \ln \frac{\prod ϕ (y_{k}; 0, 1)}{\max_{a} \prod ϕ (y_{k} + \frac{a}{2} (1 - 2 p_{k t}); 0, 1)} \\ = \frac{{[\sum y_{k} (1 - 2 p_{k t})]}^{2}}{\sum {(1 - 2 p_{k t})}^{2}}, \end{matrix}$

where $ϕ (x; μ, σ) = \frac{1}{\sqrt{2 π} σ} e^{- \frac{{(x - μ)}^{2}}{2 σ^{2}}}$ and the distribution of p_kt is given as:

Table 8

Full size table

Note that x_k(t) = E(g_k(t)|M^k) = 2p_kt - 1, so the LRT at each position t can be described as $L R T (t) = \frac{{[\sum y_{k} x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)},$ and the distribution of x_k (t) for each individual k is

Table 9

Full size table

Appendix II

Replacing $y_{k} = \frac{a}{2} x_{k} (t_{0}) + ε_{k}$ in LRT (3), we have $\begin{matrix} L R T (t) & = \frac{{[\sum (\frac{a}{2} x_{k} (t_{0}) + ε_{k}) x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)} \\ = \frac{1}{4} a^{2} \frac{{[\sum x_{k} (t_{0}) x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)} + \frac{{[\sum ε_{k} x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)} + a \frac{\sum x_{k} (t_{0}) x_{k} (t)}{\sum x_{k}^{2} (t)} \sum ε_{k} x_{k} (t) \\ = f (t) + ε_{1} (t) + ε_{2} (t), \end{matrix}$

where in the case of large sample, we have

• $f (t) = \frac{1}{4} a^{2} \frac{{[\sum x_{k} (t_{0}) x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)}$ and $f (t) / n \to \frac{1}{4} a^{2} V a r (x (t_{0})) ρ_{t t_{0}}^{2}$ when n → ∞ according to the law of large numbers.

• $ε_{1} (t) = \frac{{[\sum ε_{k} x_{k} (t)]}^{2}}{\sum x_{k}^{2} (t)}$ is the LRT under the no QTL hypothesis.

• $ε_{2} (t) = a \frac{\sum x_{k} (t_{0}) x_{k} (t)}{\sum x_{k}^{2} (t)} \sum ε_{k} x_{k} (t) ~ a \frac{C o v (x (t), x (t_{0}))}{V a r (x (t))} \sum ε_{k} x_{k} (t)$ is a residual error, linear combination of gaussian random variables. Its distribution is approximated as $N (0, n a^{2} V a r (x (t)) ρ_{t t_{0}}^{2}) .$

Appendix III

Considering the two first terms in the expression of $\frac{1}{n} L R T (t)$ (4), when n tends to infinity, $\frac{1}{n} ε_{1} (t)$ will converge to 0 at each position t. So the amplitude of the curve representing the term $\frac{1}{n} ε_{1} (t)$ , with respect to that of the first term, is reduced. In the same way, as a² increases, the amplitude of the first term becomes larger with respect to that of the second term.

The proof of the influence of t₀ and T will use the results in this following lemma:

Lemma 1. Given two markers at the location 0 and T in a linkage group of length T and assuming a QTL located at t₀, from the distribution of Var(x(t)) and applying the Taylor series expansion in case of small T, we have:

1. $V a r (x (t_{0})) = \frac{{(1 - θ_{t_{0}} - θ_{T - t_{0}})}^{2}}{1 - θ_{T}} + \frac{{(θ_{t_{0}} - θ_{T - t_{0}})}^{2}}{θ_{T}} \approx \frac{4}{T} t_{0}^{2} - 4 t_{0} + 1 .$

2. $C o v (x (t), x (t_{0})) = \frac{(1 - θ_{t} - θ_{T - t}) (1 - θ_{t_{0}} - θ_{T - t_{0}})}{1 - θ_{T}} + \frac{(θ_{t} - θ_{T - t}) (θ_{t_{0}} - θ_{T - t_{0}})}{θ_{T}} \approx \frac{4}{T} t t_{0} - 2 (t + t_{0}) + 1 .$

Now let us assume T < 1 and consider the peak-to-peak amplitude of the first term of $\frac{1}{n} L R T (t)$ , $g (t) = \frac{1}{4} a^{2} V a r (x (t_{0})) ρ_{t t_{0}}^{2}$ , i.e., the deviation between highest amplitude value and lowest amplitude value:

δ = max_{t \in [0, T]} g (t) - min_{t \in [0, T]} g (t) .

If $t_{0} \in [\frac{T}{2}, T]$ , then the max_{t∈[0, T]}g(t) is reached at t = t₀ and the min_{t∈[0, T]}g(t) is reached at 0. Hence, we have:

\begin{matrix} δ & = \frac{a^{2}}{4} [V a r (x (t_{0})) - C o v^{2} (x (0), x (t_{0}))] \\ \approx a^{2} (\frac{1}{T} - 1) t_{0}^{2} . \end{matrix}

It can be seen that when t₀ → T from $\frac{T}{2}$ and/or when T decreases, δ will become larger.

If $t_{0} \in [0, \frac{T}{2}]$ , then the max_{t∈[0, T]}g(t) is reached at t = t₀ and the min_{t∈[0, T]}g(t) is reached at T. Hence, we have:

\begin{matrix} δ & = \frac{a^{2}}{4} [V a r (x (t_{0})) - C o v^{2} (x (T), x (t_{0}))] \\ \approx a^{2} (\frac{1}{T} - 1) {(t_{0} - T)}^{2} . \end{matrix}

It can be seen that when t₀ → 0 from $\frac{T}{2}$ , δ will become larger. Likewise, we set a constant c for the distance between the argmax_{t∈[0, T]}g(t) and argmin_{t∈[0, T]}g(t). Then, when T changes, the position of QTL is t₀ = T -c and

δ \approx a^{2} (\frac{1}{T} - 1) c^{2} .

Therefore, when T becomes larger, δ decreases.

In conclusion, the amplitude of g(t) will be greater as the QTL position tends to one marker and/or the distance between the markers decreases.

References

Spelman RJ, Coppieters W, Karim L, van Arendonk J, Bovenhuis H: Quantitative trait loci for five milk production traits on chromosome six in the Dutch Holstein-Friesian population. Genetics. 1996, 144: 1799-1808.
PubMed Central CAS PubMed Google Scholar
Walling GA, Visscher PM, Haley CS: A comparison of bootstrap methods to construct confidence intervals in QTL mapping. Genetical Research. 1998, 71: 171-180. 10.1017/S0016672398003164.
Article Google Scholar
Whittaker JC, Thompson R, Visscher PM: On the mapping of QTL by regression of phenotype on marker-type. Heredity. 1996, 77: 23-32. 10.1038/hdy.1996.104.
Article Google Scholar
Walling GA, Haley CS, Perez-Enciso M, Thompson R, Visscher PM: On the mapping of quantitative trait loci at marker and non-marker locations. Genetical Research. 2001, 79: 97-106.
Google Scholar
Perez-Enciso M, Fernando RL, Bidanel JP, Le Roy P: Quantitative Trait Locus analysis in crosses between outbred lines with dominance and inbreeding. Genetics. 2001, 159: 413-422.
PubMed Central CAS PubMed Google Scholar
Jansen RC, Nap JP: Genetical genomics: the added value from segregation. TRENDS in Genetics. 2001, 17: 388-391. 10.1016/S0168-9525(01)02310-1.
Article CAS PubMed Google Scholar
Brem RB, Yvert G, Clinto R, Kruglyak L: Genetic Dissection of Transcriptional Regulation in Budding Yeast. Science. 2002, 296: 752-755. 10.1126/science.1069516.
Article CAS PubMed Google Scholar
Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, Linsley PS, Mao M, Stoughton RB, Friend SH: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422: 297-302. 10.1038/nature01434.
Article CAS PubMed Google Scholar
Monks SA, Leonardson A, Zhu H, Cundiff P, Pietrusiak P, Edwards S, Phillips JW, Sachs A, Schadt EE: Genetic inheritance of gene expression in human cell lines. Am J Hum Genet. 2004, 75: 1094-1105. 10.1086/426461.
Article PubMed Central CAS PubMed Google Scholar
Morley M, Molony CM, Weber TM, Devlin JL, Ewens KG, Spielman RS, Cheung VG: Genetic analysis of genome-wide variation in human gene expression. Nature. 2004, 430: 743-747. 10.1038/nature02797.
Article PubMed Central CAS PubMed Google Scholar
Shi C, Uzarowska A, Ouzunova M, Landbeck M, Wenzel G, Lubberstedt T: Identification of candidate genes associated with cell wall digestibility and eQTL (expression quantitative trait loci) analysis in a Flint × Flint maize recombinant inbred line population. BMC Genomics. 2007, 8: 22-10.1186/1471-2164-8-22.
Article PubMed Central PubMed Google Scholar
Ponsuksili S, Murani E, Schwerin M, Schellander K, Wimmers K: Identification of expression QTL (eQTL) of genes expressed in porcine M. longissimus dorsi and associated with meat quality traits. BMC Genomics. 2010, 11: 572-10.1186/1471-2164-11-572.
Article PubMed Central PubMed Google Scholar
Cherel P, Glenisson J, Damon M, Vincent A, Liaubet L, Lobjois V, Hatey F, Milan D, Le Roy P: Colocalization of quantitatitve trait loci for meat pH and differentially expressed genes in skeletal muscles in pigs. Proceedings of the 8th World Congress on Genetics Applied to Livestock Production: 13-18 August 2009; Belo Horizonte, MG, Brazil. 2006, 06:18
Google Scholar
Le Mignon G, Desert C, Pitel F, Leroux S, Demeure O, Guernec G, Abasht B, Douaire M, Le Roy P, Lagarrigue S: Using transcriptome profiling to characterize QTL regions on chicken chromosome 5. BMC Genomics. 2009, 8: 22-
Google Scholar
Le Bras Y, Dechamp N, Montfort J, Cam AL, Krieg F, Quillet E, Prunet P, Le Roy P: Acclimation to seawater in rainbow trout: QTL/eQTL approach for plasmatic ions and gill tissue. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production: 1-6 August 2010; Leipzig, Germany. 2010, 638-
Google Scholar
Lander E, Botstein D: Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics. 1989, 121: 185-199.
PubMed Central CAS PubMed Google Scholar
Cierco C: Asymptotic distribution of the maximum likelihood ratio test for gene detection. Statistics. 1998, 31: 261-285. 10.1080/02331889808802639.
Article Google Scholar
Rabier CE, Azais JM, Delmas C: Likelihood ratio test process for quantitative trait loci detection. Journal of the Royal Statistical Society. 2009
Google Scholar
Elsen JM, Mangin B, Goffinet B, Boichard D, Le Roy P: Alternative models for QTL detection in livestock: I. General introduction. Genetics Selection Evolution. 1999, 31: 213-224. 10.1186/1297-9686-31-3-213.
Article Google Scholar
Goffinet B, Rebai A, Mangin B: Construction confidence intervals for QTL location. Genetics. 1994, 138: 1301-1308.
PubMed Central PubMed Google Scholar
Filangi O, Moreno C, Gilbert H, Legarra A, Le Roy P, Elsen JM: QTLMap, a software for QTL detection in outbred populations. Proceedings of the 9th World Congress on Genetics Applied to Livestock Production: 1-6 August 2010; Leipzig, Germany. 2010, 787-
Google Scholar
Le Roy P, Elsen JM, Boichard D, Mangin B, Bidanel JP, Goffinet B: An algorithm for QTL detection in mixture full and half sib families. Proceedings of the 6th World Congress on Genetics Applied to Livestock Production: 11-16 January 1998; Armidale, Australia. 1998, 257-260.
Google Scholar
Feller W: An Introduction to Probability Theory and Its Applications. 1968, Wiley, Volume 2: 2
Google Scholar
Lehmann EL: Nonparametrics: Statistical Methods Based on Ranks. 1975, Mcgraw-Hill
Google Scholar

Download references

Acknowledgements

These results are part of the SABRE research project that has been co-financed by the European Commission, within the 6th Framework Programme, contract No. FOOD-CT-2006-016250. XW is a Ph.D fellow supported by the SABRE research project and by the Animal Genetics division of INRA.

Author information

Authors and Affiliations

INRA, UMR 1348, PEGASE, 35042, Rennes cedex, France
Xiaoqiang Wang, Olivier Filangi & Pascale Le Roy
Agrocampus Ouest, UMR 1348, PEGASE, 35042, Rennes cedex, France
Xiaoqiang Wang, Olivier Filangi & Pascale Le Roy
INRA, UMR 1313, GABI, 78352, Jouy en Josas cedex, France
Hélène Gilbert
INRA, UMR 444, LGC, 31326, Castanet Tolosan cedex, France
Hélène Gilbert
INRA, UMR 631, SAGA, 31326, Castanet Tolosan cedex, France
Carole Moreno & Jean-Michel Elsen

Authors

Xiaoqiang Wang
View author publications
You can also search for this author in PubMed Google Scholar
Hélène Gilbert
View author publications
You can also search for this author in PubMed Google Scholar
Carole Moreno
View author publications
You can also search for this author in PubMed Google Scholar
Olivier Filangi
View author publications
You can also search for this author in PubMed Google Scholar
Jean-Michel Elsen
View author publications
You can also search for this author in PubMed Google Scholar
Pascale Le Roy
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Pascale Le Roy.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

PLR conceived this study. XW carried out the study and drafted the manuscript, supervised by PLR and JME. XW, HG, CM and OF wrote updated versions of the QTLMap software. All authors read and revised and approved the final manuscript.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Authors’ original file for figure 11

Authors’ original file for figure 12

Authors’ original file for figure 13

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article is distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Wang, X., Gilbert, H., Moreno, C. et al. Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses. BMC Genet 13, 29 (2012). https://doi.org/10.1186/1471-2156-13-29

Download citation

Received: 18 September 2011
Accepted: 20 April 2012
Published: 20 April 2012
DOI: https://doi.org/10.1186/1471-2156-13-29

Statistical properties of interval mapping methods on quantitative trait loci location: impact on QTL/eQTL analyses

Abstract

Background

Results

Conclusions

Background

Results

The LRT distribution along the linkage group under the null hypothesis

The QTL location bias expression in a backcross population

Simulations under H0

Impact of the population size

Impact of the marker density

Simulations under H1

Impact of the population size

Impact of the marker density

Impact of the QTL effect

Impact of the true QTL location

An algorithm to adjust the location of QTL mapped on markers

Discussion

Conclusions

Methods

Analyses on a real data set in pig

Simulations of an Ornstein-Uhlenbeck process

Simulations of outbred type population

Criteria

Hypothesis test

Appendix

Appendix I

Appendix II

Appendix III

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Additional information

Competing interests

Authors' contributions

Authors’ original submitted files for images

Rights and permissions

About this article

Cite this article

Share this article

Keywords

BMC Genomic Data

Contact us