Influence of genotyping error in linkage mapping for complex traits – an analytic study
BMC Genetics volume 9, Article number: 57 (2008)
Abstract
Background
Despite the current trend towards large epidemiological studies of unrelated individuals, linkage studies in families are still widely used as tools for disease gene mapping. The use of single-nucleotide polymorphism (SNP) array technology in genotyping of family data has the potential to provide more informative linkage data. Nevertheless, SNP array data are not immune to genotyping error which, as has been suggested in the past, could dramatically affect the evidence for linkage, especially in selective designs such as affected sib pair (ASP) designs. The influence of genotyping error on selective designs for continuous traits has not yet been assessed.
Results
We use the identity-by-descent (IBD) regression-based paradigm for linkage testing to analytically quantify the effect of simple genotyping error models under specific selection schemes for sibling pairs. We show, for example, that in extremely concordant (EC) designs, genotyping error leads to decreased power whereas it leads to increased type I error in extremely discordant (ED) designs. Perhaps surprisingly, the effect of genotyping error on inference is most severe in designs where selection is least extreme. We suggest a genomic control for genotyping errors via a simple modification of the intercept in the regression for linkage.
Conclusion
This study extends earlier findings: genotyping error can substantially affect type I error and power in selective designs for continuous traits. Designs involving both EC and ED sib pairs are fairly immune to genotyping error. When those designs are not feasible the simple genomic control strategy that we suggest offers the potential to deliver more robust inference, especially if genotyping is carried out by SNP array technology.
Background
Linkage analysis of family data has been used extensively in the past in the search for genetic determinants. Nowadays, investigators favor large epidemiological studies of unrelated individuals; however, several family datasets are currently being reanalyzed and/or pooled (e.g. [1]). The persistence of interest in linkage is partly triggered by the advent of single-nucleotide polymorphism (SNP) array genotyping technology in the field; indeed, SNP arrays hold the promise of more reliable linkage maps [2, 3]. Although less prone to genotyping error than microsatellites when viewed as single-point markers, SNP arrays rely heavily on multipoint algorithms for accurate determination of the identical by descent (IBD) status of alleles. The gain in single-point reliability might therefore be annihilated by the propagation of errors across the many SNPs required to infer IBD status.
In the search for genetic determinants of complex traits by linkage, the use of selective designs appears to be an efficient way to gain adequate power for detection of typically small gene effects. A few authors have shown by simulation that the impact of genotyping error on evidence for linkage could be particularly severe in affected sib-pair (ASP) designs [4–6], virtually masking most of the evidence for linkage. The impact of error on quantitative traits appears to be less dramatic in random samples; however, it is unclear whether the same dramatic power losses hold in selected samples.
A method of choice is now emerging for the analysis of quantitative traits arising from selected sib pairs. This method is essentially a regression through the origin of excess identical by descent (IBD) sharing on a function of the trait value, whose slope is an estimate of the linkage parameter. It was first proposed by Sham et al. [7] and turns out to be equivalent to a score test [8]. In a numerical comparison of methods for selected samples, Skatkiewicz et al. [9] and Cuenco et al. [10] showed that this method had good properties in finite samples for extreme proband ascertained sib-pair and discordant sib-pair designs. Using simple genotyping error models (the population frequency error model and the false homozygosity model), we show analytically what effects such error-generating processes (occurring at rate ϵ per sib pair) induce for an idealized fully informative marker. We show that they reduce the slope estimate (i.e. the estimated linkage parameter) by a factor $1 - \frac{\epsilon}{2}$ whether sib pairs are selected or not. Since the genotyping error rate ϵ is typically small, this slope effect on the linkage test is minimal. In addition to this slope effect, the regression's intercept is modified, and this may have a much more sizable effect on the test for linkage depending on the sampling scheme used to select sib pairs. Surprisingly, this simple result allows us to predict that in extremely concordant (EC) sib-pair designs and in ASP designs, the effect of genotyping error will be milder as the selection becomes more extreme. In extremely discordant (ED) designs, the effect can in theory be either increased type I error or decreased power depending on the definition of discordance, the genotyping error rate and the true linkage effect; in practice however, for small quantitative trait locus (QTL) effects, the result will be an increased type I error.
We argue that the basic error-generating mechanisms assumed provide reasonable approximations of real-life situations. In the next section, we first describe some common error-generating processes and quantify their effect on IBD sharing in an idealized situation where marker information is complete. We then briefly sketch the inverse regression approach to linkage, show analytically what the effect of genotyping error is on this regression, and quantify the subsequent bias, power and type I error in common selective designs. We argue that under certain assumptions regarding the error model, one can easily implement a linkage test that incorporates a genomic control for genotyping error. Finally, we discuss some assumptions made in our study and the practical relevance of our findings. In particular, we argue that our results generalize to situations where marker information is incomplete and that the smaller error rates observed in SNP chip arrays compared to microsatellites offer no protection against bias in analysis.
Results
Genotyping error models
We consider two mechanisms for the generation of errors in marker data, namely the population frequency error model and the false homozygosity model. In those two models, we consider a single marker with m alleles and further assume that a maximum of one allelic error per sib pair can be made and that this happens with probability ϵ. This restriction to 'one error per sib pair' is just a first order approximation, for small ϵ, of a process where all four alleles would be allowed to be independently erroneous and does not restrict the generalizability of our results.
The population frequency error model reassigns the erroneous allele (chosen at random among the four forming the sib-pair genotype) to one of the possible m alleles with probability equal to population allele frequency. One mathematical advantage of this model is that the marginal distribution of alleles and genotypes is unaltered. The false homozygosity model keeps homozygotes unchanged but reassigns heterozygotes to homozygotes with alleles equal to one of the two original alleles chosen according to probabilities proportional to population allele frequencies.
To our knowledge, false homozygosity is a common type of error: fairly rare alleles go unreported in samples. The population frequency error model provides an approximation to a process whereby alleles are misread. Errors at the two alleles of a marker's genotype might be correlated; we do not consider this type of process in detail here, although its effect on linkage will be qualitatively the same as in the two other models. We refer the reader to Sobel et al. [11] for a detailed exposé on genotyping error mechanisms. Note that the two models that we have chosen have been used in the past in order to identify potential genotyping errors [4, 11].
Impact on IBD sharing
Let us denote by π the proportion of alleles shared identical by descent (IBD) at a certain locus by two siblings. Tests for linkage are based on the IBD sharing distribution and, although errors as described earlier are made at the genotype level (G is read as G^{ϵ}), the effect of errors on linkage will be entirely mediated via the distortion of the IBD distribution (the true IBD status π of two siblings may be incorrectly inferred as π^{ϵ}). We are therefore interested in deriving the probability distribution P(π^{ϵ} | π). This is done by conditioning on both the true and observed genotypes as follows:
$$P(\pi^{\epsilon} \mid \pi) = \sum_{G} \sum_{G^{\epsilon}} P(\pi^{\epsilon} \mid G^{\epsilon})\, P(G^{\epsilon} \mid G)\, P(G \mid \pi)$$
Let us consider the case of complete information. This can be conceptualized by means of an idealized marker whose number of alleles is infinite; in particular, identity by state (IBS) status is then equivalent to IBD status. The unordered genotypes of a sib pair can be partitioned into seven exclusive classes denoted ii/ii, ii/ij, ii/jj, ii/jk, ij/ij, ij/ik and ij/kl depending on the number of homozygous sibs in the pair and the number of distinct alleles in the sib-pair genotype. Sharing 0 alleles IBD corresponds to a sib-pair genotype of the ij/kl class. Should an error occur according to the population frequency error model, one of the four alleles would be transformed into yet another type (since the number of alleles is infinite, the probability that the new allele is read as one of i, j, k or l tends to 0); the sib-pair genotype will therefore remain in the ij/kl class and the observed IBD status π^{ϵ} will still be 0. For the same starting genotype, an error according to the false homozygosity model produces an ii/jk class and π^{ϵ} also equals 0. Therefore P(π^{ϵ} = 0 | π = 0) = 1 under either genotyping error mechanism. The same line of reasoning leads to P(π^{ϵ} = 0.5 | π = 0.5) = $1 - \frac{\epsilon}{2}$, P(π^{ϵ} = 0 | π = 0.5) = $\frac{\epsilon}{2}$, P(π^{ϵ} = 1.0 | π = 1.0) = 1 − ϵ, P(π^{ϵ} = 0.5 | π = 1.0) = ϵ. Those results can be summarized by the transition matrix below, where the (i, j) element is equal to P(π^{ϵ} = (j − 1)/2 | π = (i − 1)/2):
$$\begin{pmatrix} 1 & 0 & 0 \\ \frac{\epsilon}{2} & 1 - \frac{\epsilon}{2} & 0 \\ 0 & \epsilon & 1 - \epsilon \end{pmatrix}$$
The overall effect of genotyping error is thus to reduce the observed IBD sharing; indeed E(π^{ϵ} | π) = (1 − ϵ/2)π and E(π^{ϵ}) = $\frac{1}{2}$ − ϵ/4, while the variance is practically unchanged since $\mathrm{var}(\pi^{\epsilon}) = \frac{1}{8} - \frac{1}{16}\epsilon^{2}$. In selected samples of extremely concordant sib pairs (EC), where linkage is evidenced by an excess in IBD sharing, it therefore seems logical to expect a decrease in power. Conversely, in selected samples of extremely discordant sib pairs (ED), where linkage is evidenced by a reduction in IBD sharing, the test might lead to increased type I error. In the next subsection, we formally quantify this bias in selective sampling schemes for quantitative traits under the usual assumption of a normal variance components model.
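As a quick numerical check of the moments quoted above, the following sketch propagates the null IBD distribution (1/4, 1/2, 1/4) through the stated transition probabilities using exact fractions. The function name `ibd_moments` is illustrative and not part of the original study.

```python
from fractions import Fraction

def ibd_moments(eps):
    """Mean and variance of observed IBD sharing pi^eps for a sib pair.

    Uses the null IBD distribution (1/4, 1/2, 1/4) on pi = 0, 1/2, 1 and the
    transition probabilities P(pi^eps | pi) stated in the text.
    """
    zero, half, one = Fraction(0), Fraction(1, 2), Fraction(1)
    prior = {zero: Fraction(1, 4), half: Fraction(1, 2), one: Fraction(1, 4)}
    # transition[true_pi][observed_pi] = P(pi^eps = observed | pi = true)
    transition = {
        zero: {zero: Fraction(1)},
        half: {zero: eps / 2, half: 1 - eps / 2},
        one: {half: eps, one: 1 - eps},
    }
    observed = {}
    for true_pi, p_true in prior.items():
        for obs_pi, p_obs in transition[true_pi].items():
            observed[obs_pi] = observed.get(obs_pi, Fraction(0)) + p_true * p_obs
    mean = sum(v * p for v, p in observed.items())
    var = sum(v * v * p for v, p in observed.items()) - mean ** 2
    return mean, var

eps = Fraction(1, 100)
mean, var = ibd_moments(eps)
print("E(pi^eps)   =", mean)
print("var(pi^eps) =", var)
```

With exact arithmetic, the mean equals 1/2 − ϵ/4 and the variance equals 1/8 − ϵ²/16 for any ϵ, which is why the variance can be treated as unchanged for small error rates.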
Impact on linkage testing
Regressionbased linkage testing
We assume that the sib pair phenotypic data x = (x_{1}, x_{2})' have been adjusted for any relevant covariates (e.g. sex, age, country, ...) and have been standardized so that the (known) population mean, variance and sib-sib correlation are 0, 1 and ρ respectively. Under the additive variance components model, x given IBD information π follows a bivariate normal distribution with zero mean and variance-covariance matrix given by
$$\Sigma = \begin{pmatrix} 1 & \rho + \gamma\left(\pi - \frac{1}{2}\right) \\ \rho + \gamma\left(\pi - \frac{1}{2}\right) & 1 \end{pmatrix}$$
where γ ≥ 0 denotes the proportion of total variance explained by the putative locus. Under this model, an optimal testing strategy first advocated in [7] (and sometimes referred to as the optimal Haseman-Elston regression) is to regress (through the origin) excess IBD sharing π − $\frac{1}{2}$ on the following C function of the trait values:
$$C(x, \rho) = \frac{(1 + \rho^{2})\, x_{1} x_{2} - \rho\,(x_{1}^{2} + x_{2}^{2}) + \rho\,(1 - \rho^{2})}{(1 - \rho^{2})^{2}}$$
This test turns out to be a score test for the linkage parameter γ [8] and is based upon the following approximate relation, which is valid for small locus effects [12]:
$$E\left(\pi - \frac{1}{2} \,\Big|\, x, \gamma\right) \simeq \frac{1}{8}\, \gamma\, C(x, \rho) \tag{2}$$
where $\frac{1}{8}$ = var_{0}(π). In a set of sibships indexed by i, an efficient estimate of the linkage parameter γ is $\hat{\gamma} = 8\,\frac{\sum_{i} (\pi_{i} - \frac{1}{2})\, C_{i}}{\sum_{i} C_{i}^{2}}$. It is approximately unbiased, E($\hat{\gamma}$) = γ, and has variance var_{0}($\hat{\gamma}$) = 1/$\mathcal{I}$ where $\mathcal{I} = \frac{1}{8} \sum_{i} C_{i}^{2}$ is the corresponding Fisher's information. The test statistic is given by $\hat{\gamma}\sqrt{\mathcal{I}}$; it is one-sided, only positive values being regarded as evidence for linkage. For small QTL effects, power of this test can be computed as Φ(Φ^{−1}(α) + γ$\mathcal{I}^{1/2}$). Fisher's information $\mathcal{I}$, which depends on sample size and study design, therefore controls power. In the design phase of a study, $\mathcal{I}$ should be used as a criterion to differentiate between alternative designs rather than sample size only [12, 13].
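The estimator, Fisher's information and power approximation just described can be sketched in a few lines. This is a minimal illustration under the small-effect approximation stated in the text; the function names are illustrative, and the C values are assumed to have been computed beforehand for each sib pair.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def gamma_hat(pi, c):
    """Regression-through-the-origin estimate of the linkage parameter.

    pi: IBD sharing proportions per sib pair; c: matching C values.
    """
    num = sum((p - 0.5) * ci for p, ci in zip(pi, c))
    den = sum(ci * ci for ci in c)
    return 8.0 * num / den

def fisher_information(c):
    """Fisher's information I = (1/8) * sum of squared C values."""
    return sum(ci * ci for ci in c) / 8.0

def power(gamma, info, alpha):
    """Approximate one-sided power: Phi(Phi^{-1}(alpha) + gamma * sqrt(I))."""
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(alpha) + gamma * sqrt(info))

# Setting quoted later in the text: I = 2500, gamma = 0.10, alpha = 1e-4.
print(round(power(0.10, 2500.0, 1e-4), 3))
```

Because power depends on the data only through $\mathcal{I}$, two designs with very different numbers of sib pairs but equal information are treated as equivalent by this approximation.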
Impact of genotyping error on regression
By conditioning on the true IBD sharing values, we can compute P(π^{ϵ} | x, γ, ϵ) = ∑_{π} P(π^{ϵ} | π) P(π | x, γ), using the transition probabilities P(π^{ϵ} | π) derived earlier, while the P(π | x, γ)'s are given in [12]. This permits computation of the new regression line in presence of genotyping error as
$$E\left(\pi^{\epsilon} - \frac{1}{2} \,\Big|\, x, \gamma, \epsilon\right) \simeq -\frac{\epsilon}{4} + \left(1 - \frac{\epsilon}{2}\right) \frac{1}{8}\, \gamma\, C(x, \rho) \tag{3}$$
As mentioned earlier, the corresponding variance under the null hypothesis is only slightly altered. The effect of genotyping error is thus to shrink the slope of the regression line by a factor $1 - \frac{\epsilon}{2}$ and to shift the intercept downwards by $\frac{\epsilon}{4}$. If we ignore genotyping error, i.e. we estimate γ using $\hat{\gamma}^{\epsilon} = 8\,\frac{\sum_{i} (\pi_{i}^{\epsilon} - \frac{1}{2})\, C_{i}}{\sum_{i} C_{i}^{2}}$, this results in a biased estimator with $\text{bias}(\hat{\gamma}^{\epsilon}) = E(\hat{\gamma}^{\epsilon}) - \gamma = -\epsilon\left(\frac{\gamma}{2} + 2A\right)$, where $A = \frac{\sum_{i} C_{i}}{\sum_{i} C_{i}^{2}} = \frac{\overline{C}}{\overline{C^{2}}}$. The resulting test statistic $\hat{\gamma}^{\epsilon}\, \mathcal{I}^{1/2}$ would then have power equal to
$$\Phi\left(\Phi^{-1}(\alpha) + \left[\left(1 - \frac{\epsilon}{2}\right)\gamma - 2\epsilon A\right] \mathcal{I}^{1/2}\right) \tag{4}$$
Note that taking γ = 0 in this formula gives the type I error rate. Since $\mathcal{I}$ increases with sample size, the impact of genotyping error on both power and type I error will be larger as the sample size increases. In terms of a regression of Y on X, the intuition is that the regression through the origin is not affected by a general shift in the Y-variable (IBD sharing) if the X-variable (the C variable) has average 0, or takes values far away from 0. The further away the X-variable C is from 0, the smaller A, hence the smaller the bias.
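The bias and its consequence for the test can be explored numerically. The sketch below assumes that power under error takes the form Φ(Φ^{−1}(α) + E($\hat{\gamma}^{\epsilon}$)$\mathcal{I}^{1/2}$) with E($\hat{\gamma}^{\epsilon}$) = γ − ϵ(γ/2 + 2A); the values of A used in the example are hypothetical placeholders, not entries of Table 1.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()

def bias(gamma, eps, a_ratio):
    """Bias of the naive estimator: -eps * (gamma / 2 + 2 * A)."""
    return -eps * (gamma / 2.0 + 2.0 * a_ratio)

def power_with_error(gamma, info, alpha, eps, a_ratio):
    """Approximate power of the naive one-sided test under genotyping error.

    Setting gamma = 0 returns the actual type I error rate.
    """
    expected_estimate = gamma + bias(gamma, eps, a_ratio)
    shift = expected_estimate * sqrt(info)
    return _STD_NORMAL.cdf(_STD_NORMAL.inv_cdf(alpha) + shift)

info, alpha, eps = 2500.0, 1e-4, 0.01
# Hypothetical A values: positive as in EC-type designs, negative as in ED-type designs.
for label, a_ratio in [("A = +0.4", 0.4), ("A =  0.0", 0.0), ("A = -0.4", -0.4)]:
    pw = power_with_error(0.10, info, alpha, eps, a_ratio)
    t1 = power_with_error(0.0, info, alpha, eps, a_ratio)
    print(f"{label}: power = {pw:.3f}, type I error = {t1:.2e}")
```

A positive A lowers both power and type I error, while a negative A inflates both, which matches the qualitative contrast between EC and ED designs.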
Bias and impact on power and type I error
Since $\text{bias}(\hat{\gamma}^{\epsilon}) = -\epsilon\left(\frac{\gamma}{2} + 2A\right)$ and γ is typically small, the distortion of the usual linkage test in presence of genotyping error depends heavily on the design-specific quantity $A = \overline{C}/\overline{C^{2}}$. Unfortunately, there is little intuition about the distribution of C (hence about the distribution of A) in the whole population or in a selected sample. Nevertheless, Monte Carlo simulations can be used to determine the characteristics of the C and A distributions in the whole population or for a specific ascertainment scheme. In random samples and under the variance components model, C is a score function, hence E(C) = 0 and its sample estimate $\overline{C}$ will be close to 0; one can also check that its distribution is negatively skewed (unless ρ = 0). The result is that the bias will be small for random samples. The same finding would hold for any ascertainment scheme where $\overline{C}$ = 0. An optimal selection scheme [12] that would select sib pairs based on Fisher's information $\mathcal{I}$ (i.e. such that |C| ≥ C_{0}) does not guarantee that $\overline{C}$ = 0 because of the skewness of C. In EC designs (both siblings have trait values either larger than a positive threshold or smaller than a negative threshold), $\overline{C}$ tends to be positive, while it tends to be negative in ED designs (one sibling's trait value is larger than a positive threshold while the other sibling's trait value is smaller than a negative threshold). The linkage test will therefore have reduced power in EC designs and increased type I error in ED designs.
In the left-hand side of Table 1, we have computed the values of A and $\overline{C}$ for the three selective schemes considered. The designs are indexed by the sib-sib correlation ρ and the degree of selection. One obvious way to correct for the shift in the intercept induced by genotyping error would be to leave the regression unconstrained; this would correct for most of the bias. Unfortunately, in selected designs where the variance of C is reduced, this results in a very inefficient estimator of the linkage parameter γ. The right-hand side of Table 1 displays the variance of the linkage parameter estimates in constrained ($\mathrm{var}_{\text{con}}(\hat{\gamma}) = 1/\sum_{i} C_{i}^{2}$) and unconstrained ($\mathrm{var}_{\text{uncon}}(\hat{\gamma}) = 1/\sum_{i} (C_{i} - \overline{C})^{2}$) regressions. Efficiency losses of unconstrained versus constrained regressions in EC and ED designs are unacceptably large even for moderately extreme selection schemes.
In Table 2, we report the power and type I error for realistic genotyping error rates [14] equal to 0.005 and 0.01 for the same designs as in Table 1. The equivalent sample size used corresponds to samples with Fisher's information equal to 2500, which provides 90% power to detect a QTL explaining 10% of the total variance in absence of genotyping error (pointwise nominal error rate = 10^{−4}). The most visible impact is on the type I error rate in ED designs, which is up to 7 times its nominal value. The $\mathcal{I}$ design that combines EC and ED sib pairs appears to be fairly immune to genotyping error, while EC designs do not incur power losses greater than 20%. Finally, those computations confirm the intuition expressed earlier that the effect of genotyping error is less severe in more extreme selection schemes.
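The Monte Carlo approach mentioned above can be sketched as follows. This is an illustration rather than the simulation behind Table 1: it assumes that C is the bivariate-normal score for the sib-sib correlation, it simulates under the null (γ = 0), and the correlation ρ = 0.3, the threshold t = 1 and all function names are arbitrary choices made for the example.

```python
import random
from math import sqrt

def c_score(x1, x2, rho):
    """Assumed form of the C function: score for the sib-sib correlation."""
    num = (1 + rho**2) * x1 * x2 - rho * (x1**2 + x2**2) + rho * (1 - rho**2)
    return num / (1 - rho**2) ** 2

def is_ec(x1, x2, t):
    """Extremely concordant: both sibs above t or both below -t."""
    return (x1 > t and x2 > t) or (x1 < -t and x2 < -t)

def is_ed(x1, x2, t):
    """Extremely discordant: one sib above t, the other below -t."""
    return (x1 > t and x2 < -t) or (x1 < -t and x2 > t)

def design_summary(rho, keep, n=100_000, seed=1):
    """Return (C-bar, A) among simulated sib pairs retained by `keep`."""
    rng = random.Random(seed)
    resid_sd = sqrt(1 - rho**2)
    sum_c = sum_c2 = 0.0
    kept = 0
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rho * x1 + resid_sd * rng.gauss(0.0, 1.0)  # corr(x1, x2) = rho
        if keep(x1, x2):
            c = c_score(x1, x2, rho)
            sum_c += c
            sum_c2 += c * c
            kept += 1
    return sum_c / kept, sum_c / sum_c2

rho, t = 0.3, 1.0
cbar_random, a_random = design_summary(rho, lambda x1, x2: True)
cbar_ec, a_ec = design_summary(rho, lambda x1, x2: is_ec(x1, x2, t))
cbar_ed, a_ed = design_summary(rho, lambda x1, x2: is_ed(x1, x2, t))
print(f"random: C-bar = {cbar_random:.3f}, A = {a_random:.3f}")
print(f"EC:     C-bar = {cbar_ec:.3f}, A = {a_ec:.3f}")
print(f"ED:     C-bar = {cbar_ed:.3f}, A = {a_ed:.3f}")
```

Any other ascertainment rule can be examined by passing a different `keep` predicate, which is the sense in which the approach extends to arbitrary selection schemes.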
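The efficiency loss of the unconstrained regression can be expressed as the ratio of the two variances given above. The helper below is a small sketch with an illustrative name; it takes the C values of the selected sib pairs as input.

```python
def variance_inflation(c):
    """Ratio var_uncon / var_con = sum(C_i^2) / sum((C_i - C-bar)^2).

    Values far above 1 mean that estimating the intercept freely costs a
    large share of the information available to the constrained regression.
    """
    n = len(c)
    c_bar = sum(c) / n
    sum_sq = sum(ci * ci for ci in c)
    sum_centered_sq = sum((ci - c_bar) ** 2 for ci in c)
    return sum_sq / sum_centered_sq

# Toy values: all C's positive and tightly clustered, as in a strongly selected sample.
print(variance_inflation([1.0, 2.0, 3.0]))
```

The ratio grows when the selected C values sit far from 0 relative to their spread, which is the situation in EC and ED designs.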
Genomic control for genotyping error
As we have seen in previous sections, the main effect of genotyping error is to modify the intercept in the regression used to test for linkage. Although an unconstrained regression would correct most of the bias due to genotyping error, the inefficiency of this strategy makes it impractical. In order to obtain an efficient and robust inference, it therefore seems natural to try and constrain the regression through its correct origin a. In this section, we propose a completely data-driven strategy for doing this.
At any position, the sample mean IBD sharing has variance 1/(8n) where n is the number of sib pairs available. If we knew that the position is unlinked, or if the sample of sib pairs were random, then the deviation of this mean from $\frac{1}{2}$ would provide an estimate of the intercept a in the linkage regression.
Unfortunately, detection of a position-specific intercept corresponding to typical error rates would require a sample size of order 10^{4}, a number that is almost never reached in linkage studies. In order to obtain an intercept estimate $\hat{a}$ with sufficient precision, it is therefore essential to combine information across positions. The value of IBD sharing at positions outside of the neighborhood of influencing loci (those positions are subsequently referred to as unlinked) across the genome may serve as control in the test for linkage; this concept of genomic control has been used to make the analysis of association studies more robust [15].
Let us assume that the proportion of alleles shared IBD, π, is computed at a series of approximately regular positions indexed by t across the whole genome. Let y_{t} be the sample mean (among families) excess IBD at position t, i.e. $y_{t} \equiv \overline{\pi_{t}^{\epsilon}} - 0.5$. Under the variance components model and for small QTL effect γ, equation (3) implies that
$$E(y_{t}) \simeq a + \left(1 - \frac{\epsilon}{2}\right) \frac{1}{8}\, \gamma_{t}\, \overline{C}$$
where $a = -\frac{\epsilon}{4}$ is the intercept induced by genotyping error and γ_{t} denotes the linkage parameter at position t (γ_{t} = 0 at unlinked positions).
In random samples or in any sample where $\overline{C}$ ≃ 0, taking the average of y_{t} across positions provides an estimate of a. In selected samples, we can use a trimmed version of the mean of y; for example, a 20%-trimmed mean of the (y_{t})_{t} series (i.e. the mean of the y_{t} values after removing the 20% lowest and 20% highest values) will provide a robust genomic estimate $\hat{a}$ of a. Because a ≤ 0 and $\overline{C}$ is positive in EC designs and negative in ED designs, $\hat{a}$ could be refined by trimming off only the 20% highest y_{t} values in EC designs and only the 20% lowest y_{t} values in ED designs before taking the mean. Of course, how much we trim is arbitrary, but 20% can safely be taken as a conservative value for oligogenic traits (indeed, a 3500 cM genome contains approximately 70 quasi-independent loci, so a 20% trimming of y_{t} values discards 14 positions, including all active gene positions if there are fewer than 14 genes, from the sample used to estimate the intercept a). An ad hoc implementation of the concept of genomic control is then to plug the estimate of the intercept $\hat{a}$ into the linkage regression (3). Since most of the bias in the inference is due to the intercept misspecification, the precise estimate obtained by pooling across the genome will eliminate it. The implicit assumption that we make in this genomic control approach is that the regression intercept is the same at all positions; this will be challenged in the next section.
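The trimmed-mean estimate of the intercept and its use in the linkage regression might be sketched as below. This is one possible reading of the ad hoc plug-in strategy rather than a reference implementation: the function names are illustrative, and the corrected estimator simply subtracts the estimated intercept from the excess IBD sharing before regressing through the origin.

```python
def trimmed_mean(values, lower=0.2, upper=0.2):
    """Mean after dropping the lowest `lower` and highest `upper` fractions.

    Use lower=0.0 (EC designs) or upper=0.0 (ED designs) for one-sided trimming.
    """
    ordered = sorted(values)
    n = len(ordered)
    start = int(n * lower)
    stop = n - int(n * upper)
    kept = ordered[start:stop]
    return sum(kept) / len(kept)

def gamma_hat_genomic_control(pi_obs, c, a_hat):
    """Regression of (pi_obs - 1/2 - a_hat) on C through the origin."""
    num = sum((p - 0.5 - a_hat) * ci for p, ci in zip(pi_obs, c))
    den = sum(ci * ci for ci in c)
    return 8.0 * num / den

# Toy genome scan: mean excess IBD at 10 positions, two of them outlying.
y = [-0.0025] * 8 + [0.05, -0.06]
a_hat = trimmed_mean(y)
print("estimated intercept:", a_hat)
```

In this toy series the two outlying positions fall in the trimmed tails, so the estimate is driven only by the positions that share the common shift.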
Discussion
Under two basic error models, we were able to predict quantitatively the consequences of genotyping error on inference in linkage analysis. In the idealized situation of complete IBD information, both error models have the same impact on linkage analysis. As we have seen, the effect is due to a decrease in IBD sharing. Conversely, an error process which would increase IBD sharing would produce opposite results. The true error processes involved in practice are complicated mixtures of the models alluded to here. In our experience however, it seems that processes which lower IBD sharing are predominant. Because genotyping error tends to decrease the estimated number of alleles shared IBD, the effect on evidence for linkage is opposite in EC (reduced power) and ED (increased type I error) designs; it can be dramatic in typical designs and paradoxically less severe for more extreme ascertainment schemes. By analogy, for a dichotomous trait, this means that the effect of genotyping error is less severe in ASP designs for rare diseases than for common diseases. Remarkably, in designs combining both ED and EC pairs like the $\mathcal{I}$ (or EDAC) designs, the competing effects of genotyping error tend to cancel each other out. We have considered here only three types of basic selection schemes; however, the approach can be straightforwardly applied to any arbitrary selection scheme. Under the widely accepted variance components model, the important quantity which determines bias, type I error and power is $A = \overline{C}/\overline{C^{2}}$ and it can be easily estimated by Monte Carlo simulations. Note that the bias is proportional to the error rate so that Equation (4) can easily be adapted to different error rates than those considered in Table 2.
Our study used an idealized model where IBD information is assumed to be complete. In practice, IBD is uncertain and is inferred using marker data and multipoint algorithms as implemented in publicly available software [16, 17]; the general effect is to shrink the IBD estimate $\hat{\pi}$ towards 0.5. The linkage regression (2) is changed into $E(\hat{\pi} - \frac{1}{2} \mid x, \gamma, \epsilon) \simeq \mathrm{var}_{0}(\hat{\pi})\, \gamma\, C(x, \rho)$ where $\mathrm{var}_{0}(\hat{\pi}) < \frac{1}{8}$ can be estimated either from the data or by simulations. The effect of genotyping error is again mediated via the shift of the intercept in this regression, but no general formula can be obtained because it depends in a very complex manner on the whole marker map configuration. Nevertheless, we can quantify this shift under realistic scenarios and compare it to its theoretical value when IBD information is complete. We simulated two different marker maps in 1 million sib pairs without parents and quantified by how much IBD sharing was reduced on average under the population frequency error model (error rate = 0.01). The microsatellite map (MS) had 13 equifrequent ten-allele markers (heterozygosity = 90%) located 10 cM apart (spanning the 0–120 cM chromosomal region) and the SNP map had 41 equifrequent SNPs (heterozygosity = 50%) spanning the 50–70 cM chromosomal region (this smaller region was chosen to keep simulation time acceptable). The resulting average IBD sharing for an error rate of 0.01 was measured every 2 cM in the 50–70 cM region; it ranged from 0.4974 to 0.4976 in the MS map and from 0.4945 to 0.4955 in the SNP map.
For these two maps, which mimic the two most widespread genotyping paradigms nowadays, those simulations confirm results derived under the complete marker information assumption, with a reduction in IBD sharing from 0.5 to 0.5 − 0.01/4 = 0.4975. Our results therefore appear to be applicable to real-life situations where IBD information is incomplete.
The genomic-control strategy that we have proposed, although triggered by the specific issue of genotyping error, potentially offers a general robust method for carrying out linkage analysis. It is nonetheless important to recognize its limitations. Firstly, if the trait is highly polygenic with contributing genes scattered across the genome, the high correlation between linkage positions will make it impossible to estimate the IBD sharing at null positions. The genomic control strategy should therefore only be considered with oligogenic traits. Secondly, the concept of genomic control relies on the assumption that the genotyping error rates are similar across markers. For markers with a similar degree of polymorphism (number of alleles and frequencies), this assumption might be acceptable. In a multipoint setting, an additional assumption required to ensure the validity of a genomic control strategy is that inter-marker distances be approximately equal. With microsatellite markers, both these assumptions might fail, resulting in differences in the IBD sharing reduction across markers. The 'regression-based linkage testing' view allows one to qualitatively assess how deviation from these assumptions will impact linkage testing. For example, in ASP or EC designs, wrongly assuming that IBD is uniformly reduced across markers will result in inflated type I error at marker positions with low genotyping error rate compared to other markers. The advent of SNP chips in linkage mapping holds the promise of regular marker maps with less variable information content than in classical microsatellite maps [2, 3]. The many SNPs used are likely to be subject to similar genotyping error processes, which makes the critical assumption of the genomic control strategy all the more plausible.
Alternatives to this genomic-control strategy are possible. They also consist in constraining the linkage regression through a new origin as in the ad hoc method, and the estimation procedure can be adapted to suit particular circumstances. Firstly, in random samples, the assumption regarding exchangeability of positions might be relaxed. Indeed, the reduction in IBD sharing at each position may be used as an estimate of the position-specific intercept (a study sufficiently powered to detect linkage in random samples should have a huge sample size, which would ensure sufficient precision of the position-specific intercepts). However, it must be stressed that the advantage of using a genomic control in random samples is limited because the impact of genotyping error is small in such designs. Secondly, one could use previous lab data to estimate by how much IBD sharing deviates from its expected value; this could also be done at each position separately provided sufficient data are available. In practice, such data might not be available or they might not faithfully reflect current error mechanisms.
Elston et al. [18] have pointed out that the implicit assumption made in ASP designs, that randomly sampled sib pairs share half of their alleles IBD, might not hold in practice and have argued for including discordant pairs in such studies. The genomic control approach suggested here may be an alternative solution to this issue. Finally we note that, although we have only considered designs involving sib pairs, the approach naturally extends to other types of relative pairs.
Conclusion
Under realistic genotyping error scenarios, power losses observed in extremely concordant designs are modest but the effect on type I error in extremely discordant designs can be dramatic. Our analytic approach provides some understanding of the differences in influence of genotyping errors across study designs. The advent of SNP arrays does not eliminate the impact of genotyping errors but it makes genomic control a feasible option with the potential to deliver more robust inference in linkage analysis data subject to genotyping errors or other mechanisms distorting the IBD signal.
Abbreviations
ASP: affected sib pair
EC: extremely concordant
ED: extremely discordant
EDAC: extremely concordant and extremely discordant
IBD: identical-by-descent
QTL: quantitative trait locus
SNP: single-nucleotide polymorphism
References
1. Amos CI, Chen WV, Lee A, Li W, Kern M, Lundsten R, Batliwalla F, Wener M, Remmers E, Kastner DA, Criswell LA, Seldin MF, Gregersen PK: High-density SNP analysis of 642 Caucasian families with rheumatoid arthritis identifies two new linkage regions on 11p12 and 2q33. Genes Immun. 2006, 7: 277-286. 10.1038/sj.gene.6364295.
2. Evans DM, Cardon LR: Guidelines for genotyping in genome-wide linkage studies: Single-nucleotide-polymorphism maps versus microsatellite maps. Am J Hum Genet. 2004, 75: 687-692. 10.1086/424696.
3. Schaid DJ, Guenther J, Christensen G, Hebbring S, Rosenow C, Hilker C, McDonnell S, Cunningham J, Slager S, Blute M, Thibodeau SN: Comparison of microsatellites versus single-nucleotide polymorphisms in a genome linkage screen for prostate cancer-susceptibility loci. Am J Hum Genet. 2004, 75: 948-965. 10.1086/425870.
4. Douglas JA, Boehnke M, Lange K: A multipoint method for detecting genotyping errors and mutations in sibling-pair linkage data. Am J Hum Genet. 2000, 66: 1287-1297. 10.1086/302861.
5. Abecasis GR, Cherny SS, Cardon LR: The impact of genotyping error on family-based analysis of quantitative traits. Eur J Hum Genet. 2001, 9: 130-134. 10.1038/sj.ejhg.5200594.
6. Walters K: The effect of genotyping error in sib-pair genome-wide linkage scans depends crucially upon the method of analysis. J Hum Genet. 2005, 50: 329-337. 10.1007/s10038-005-0269-1.
7. Sham PC, Purcell S: Equivalence between Haseman-Elston and Variance-Components linkage analyses for sib-pairs. Am J Hum Genet. 2001, 68: 1527-1532. 10.1086/320593.
8. Tang HK, Siegmund D: Mapping quantitative trait loci in oligogenic models. Biostatistics. 2001, 2: 147-162. 10.1093/biostatistics/2.2.147.
9. Skatkiewicz JP, Cuenco KT, Feingold E: Recent advances in human Quantitative-Trait-Locus mapping: comparison of methods for discordant sibling pairs. Am J Hum Genet. 2003, 73: 874-885. 10.1086/378590.
10. Cuenco KT, Skatkiewicz JP, Feingold E: Recent advances in human Quantitative-Trait-Locus mapping: comparison of methods for selected sibling pairs. Am J Hum Genet. 2003, 73: 863-873. 10.1086/378589.
11. Sobel E, Papp J, Lange K: Detection and integration of genotyping errors in statistical genetics. Am J Hum Genet. 2002, 70: 496-508. 10.1086/338920.
12. Putter H, Lebrec J, van Houwelingen JC: Selection Strategies for Linkage Studies Using Twins. Twin Res. 2003, 6: 377-382. 10.1375/136905203770326376.
13. Lebrec J, Putter H, van Houwelingen JC: Score test for detecting linkage to complex traits in selected samples. Genet Epidemiol. 2004, 27: 97-108. 10.1002/gepi.20012.
14. Ewen K, Bahlo M, Treloar S, Levinson D, Mowry B, Barlow J, Foote S: Identification and analysis of error types in high-throughput genotyping. Am J Hum Genet. 2000, 67: 727-736. 10.1086/303048.
15. Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 997-1004. 10.1111/j.0006-341X.1999.00997.x.
16. Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.
17. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin – rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
18. Elston RC, Song D, Iyengar SK: Mathematical Assumptions versus Biological Reality: Myths in Affected Sib Pair Linkage Analysis. Am J Hum Genet. 2005, 76: 152-156. 10.1086/426872.
Acknowledgements
This paper originates from the GENOMEUTWIN project which is supported by the European Union Contract No. QLG2-CT-2002-01254. We are grateful to Dr. Bas Heijmans from the section Molecular Epidemiology, Dept. of Medical Statistics and Bioinformatics, Leiden University Medical Center for discussions on genotyping error mechanisms.
Authors' contributions
JJPL participated in the method development, carried out the simulations summarized in Table 1, drafted and finalized the manuscript. HP participated in method development and in drafting the manuscript. JJHD and HCvH both participated in method development. All authors read and approved the final manuscript.
Rights and permissions
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
About this article
Cite this article
Lebrec, J.J., Putter, H., Houwing-Duistermaat, J.J. et al. Influence of genotyping error in linkage mapping for complex traits – an analytic study. BMC Genet 9, 57 (2008) doi:10.1186/1471-2156-9-57
Keywords
 Genotyping Error
 Genomic Control
 Variance Component Model
 Genotyping Error Rate
 Ascertainment Scheme