Volume 6 Supplement 1
Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Which strategy is better for linkage analysis: single-nucleotide polymorphisms or microsatellites? Evaluation by identity-by-state – identity-by-descent transformation affected sib-pair method on GAW14 data
 Qingqi Yue^{1, 2},
 Victor Apprey^{1, 2} and
 George E Bonney^{1, 2, 3}
DOI: 10.1186/1471-2156-6-S1-S16
© Yue et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Abstract
The central issue for Genetic Analysis Workshop 14 (GAW14) is the question: which is the better strategy for linkage analysis, the use of single-nucleotide polymorphisms (SNPs) or microsatellite markers? To answer this question, we analyzed the simulated data using Duffy's SIBPAIR program, which can incorporate parental genotypes, and our identity-by-state – identity-by-descent (IBS-IBD) transformation method of affected sib-pair linkage analysis, which uses the matrix transformation between IBS and IBD. The advantages of our method are as follows: the assumption of Hardy-Weinberg equilibrium is not necessary; the parental genotype information may be entirely unknown; both IBS and its related IBD transformation can be used in the linkage analysis; and the determinant of the IBS-IBD transformation matrix provides a quantitative measure of the quality of the marker in linkage analysis. With the originally distributed simulated data, we found that 1) for microsatellite markers there are virtually no differences in type I and type II error rates whether or not parental genotypes are used; 2) on average, a microsatellite marker has more power than a SNP marker in linkage detection; 3) if parental genotype information is used, SNP markers show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNP markers show considerable variation in type I error rates across different methods.
Background
A key issue in nonparametric linkage analysis is the accuracy of the estimated relative-pair identity-by-descent (IBD) distributions. The Genetic Analysis Workshop 14 (GAW14) simulated data provide an opportunity to evaluate new or existing methods for linkage analysis because the "answers" were known to the designers of the simulated data. We applied two types of methods to find the locations of linkage and to determine the power and type I error rates for single-nucleotide polymorphism (SNP) and microsatellite markers according to whether or not parental genotypes are available. The first method is the affected pedigree member (APM) method implemented in Duffy's SIBPAIR program [1], which uses all the pedigree information, including the parental genotypes and parent-sibling relationships, and is based on Weeks and Lange's method [2]. The second method is our recently developed IBS-IBD (identity-by-state – identity-by-descent) transformation method, which generalizes Lange's affected sib-pair method [3] and uses the affected sib-pair genotypes only. In this paper, we compare the power and type I error rates of the two methods under two data assumptions: parental genotypes available and parental genotypes not available.
Methods
We applied two types of methods to determine the performance (power and type I error rates) of SNP and microsatellite markers under different data assumptions based on the availability of parental genotypes. The first method is the APM method implemented in Duffy's SIBPAIR program, which is fully documented in [1]. The second method is our recently developed IBS-IBD transformation method, which generalizes Lange's affected sib-pair method [3] and currently uses only the affected sib pairs. The method is based on the following proposition:
Proposition
Assume that 1) parental mating is random; 2) in the parental population, for any genotype the two possible phase-known genotypes have the same probability; 3) for each mating type that produces a sib pair with IBD = 0, the two possible sib pairs are equally likely to occur; if IBD = 1, the shared IBD allele is equally likely to come from either of the two parents. Let P_{ij}(M) = P_{ji}(M) = 1/2 of the sum of the frequencies of the genotypes a_{i}/a_{j} and a_{j}/a_{i} in the parental generation, with a_{i} (i = 1, 2, ..., n) being the alleles at the marker. Then, in a full sib-pair population without gender differences, the IBS and IBD probabilities are related by
P(IBS = i) = \sum_{j = 0}^{i} T_{ij} P(IBD = j), i = 0, 1, 2,
where the transformation matrix T = [T_{ij}], with T_{ij} = P(IBS = i | IBD = j), 0 ≤ j ≤ i ≤ 2, is given by
T_{11} = Het(M)
T_{21} = Hom(M)
Hom(M) and Hom(M^{2}) are the sums of all diagonal elements of the matrices [P_{ij}(M)] and [P_{ij}(M)]^{2}, respectively; Het(M) and Het(M^{2}) are the sums of all off-diagonal elements of the matrices [P_{ij}(M)] and [P_{ij}(M)]^{2}, respectively; and P_{i}(M) is the frequency of the i^{th} allele a_{i} (i, j = 1, 2, ..., n). The above formula reduces to Lange's [3] formula (in a different form) for the expected IBS distribution under the null hypothesis of no linkage and the Hardy-Weinberg equilibrium assumption P_{ij}(M) = P_{i}(M)P_{j}(M). Our formula can transform the IBD distribution into the IBS distribution through the transformation matrix T, or vice versa through the inverse transformation matrix T^{-1}. With the estimates of the IBD or IBS probabilities, the statistics for nonparametric linkage analysis can be calculated and tested in the usual manner.
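As an illustration of the transformation, the following minimal sketch builds T by direct enumeration in the Hardy-Weinberg special case only (the case in which the formula reduces to Lange's), not in the general setting of the proposition. The function names and the allele frequencies (a biallelic marker with equal frequencies and a marker with eight equally frequent alleles) are hypothetical and chosen only for illustration; they are not taken from the GAW14 data.

```python
from itertools import product

def ibs_count(g1, g2):
    """Number of alleles (0, 1 or 2) that two unordered genotypes share by state."""
    a, b = g1
    c, d = g2
    if (a == c and b == d) or (a == d and b == c):
        return 2
    if a in (c, d) or b in (c, d):
        return 1
    return 0

def transformation_matrix_hwe(freqs):
    """T[i][j] = P(IBS = i | IBD = j), assuming Hardy-Weinberg equilibrium."""
    alleles = range(len(freqs))
    T = [[0.0] * 3 for _ in range(3)]
    # IBD = 0: the four alleles of the two sibs are independent draws.
    for a, b, c, d in product(alleles, repeat=4):
        T[ibs_count((a, b), (c, d))][0] += freqs[a] * freqs[b] * freqs[c] * freqs[d]
    # IBD = 1: one shared allele s plus one independent allele per sib.
    for s, b, d in product(alleles, repeat=3):
        T[ibs_count((s, b), (s, d))][1] += freqs[s] * freqs[b] * freqs[d]
    # IBD = 2: identical genotypes, so IBS = 2 with certainty.
    T[2][2] = 1.0
    return T

def determinant(T):
    """T is lower triangular (IBS >= IBD), so det(T) is the product of its diagonal."""
    return T[0][0] * T[1][1] * T[2][2]

def ibd_to_ibs(T, ibd):
    """Forward transformation: IBS distribution = T applied to the IBD distribution."""
    return [sum(T[i][j] * ibd[j] for j in range(3)) for i in range(3)]

def ibs_to_ibd(T, ibs):
    """Inverse transformation T^{-1}, solved by forward substitution."""
    z0 = ibs[0] / T[0][0]
    z1 = (ibs[1] - T[1][0] * z0) / T[1][1]
    z2 = (ibs[2] - T[2][0] * z0 - T[2][1] * z1) / T[2][2]
    return [z0, z1, z2]

# Hypothetical markers (assumed frequencies, not GAW14 values).
T_snp = transformation_matrix_hwe([0.5, 0.5])
T_ms = transformation_matrix_hwe([1 / 8] * 8)
print(determinant(T_snp), determinant(T_ms))
```

Under these assumed frequencies the determinant is 0.0625 for the biallelic marker and about 0.51 for the eight-allele marker, and in both cases the middle column of T equals (0, Het, Hom) as in the proposition. Because the inverse transformation divides by the diagonal entries of T, a small determinant means that sampling error in the observed IBS proportions is magnified in the recovered IBD proportions.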
We performed all analyses without knowledge of the "answers." We still do not have the "answers," except those results appearing in the meeting abstracts.
Results
The medians of type I error rates over all the markers
Marker  Population  IBS  IBS-IBD  Duffy's IBD
SNP     AI          5%   15%      0%
SNP     DA          5%   16%      0%
SNP     KA          5%   16%      0%
SNP     NY          4%   21%      0%
MS      AI          6%   3%       3%
MS      DA          6%   3%       3%
MS      KA          6%   3%       3%
MS      NY          5%   6%       3%
Based on our single-point linkage analysis, we observed the following results with respect to the comparison of SNP vs. microsatellite markers (one SNP marker vs. one microsatellite marker around the same location) and the effect of parental genotype information on the two types of markers.
1) On average, a microsatellite marker showed higher rates of significant replications than a SNP marker over the linkage locations.
3) For the microsatellites, our affected-sib-pair-only methods showed a modest "noise" level (3%–6%), while the affected pedigree member method (Duffy's APM) had a stable "noise" level of 3%. Both methods had almost the same "power." Because our method used only one affected sib pair (no parental information), it seems that parental genotyping may not be critical in linkage analysis with microsatellite markers.
4) For SNP data, our affected-sib-pair-only IBS method had a "noise" level of 4%–6% and the IBS-IBD method had a high "noise" level (15%–21%), while the affected pedigree member method (Duffy's APM) had a stable "noise" level of 0%. The high "noise" level of the IBS-IBD method reflects the fact that the IBS-IBD transformation matrix for a SNP marker is close to singular. Thus, we conclude that for SNP data with parental genotypes the false-positive rate in linkage analysis is very low, whereas without parental genotype information the false-positive rate can be relatively high.
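The "noise" levels and medians reported above can be read as empirical rejection rates. As a minimal sketch, assuming that a replicate counts as significant when its p-value falls below a nominal level alpha (the 0.05 default, the marker names, and the p-values below are hypothetical, not values from our analysis), the median rate over a set of unlinked markers could be computed as follows. Applying the same rate to markers at the linkage locations gives the empirical "power."

```python
from statistics import median

def rejection_rate(p_values, alpha=0.05):
    """Fraction of replicates whose p-value falls below alpha (assumed threshold)."""
    return sum(p < alpha for p in p_values) / len(p_values)

def median_rate(p_values_by_marker, alpha=0.05):
    """Median, over markers, of the per-marker rejection rate."""
    return median(rejection_rate(p, alpha) for p in p_values_by_marker.values())

# Hypothetical p-values: three unlinked markers, five replicates each.
unlinked = {
    "marker_a": [0.40, 0.03, 0.72, 0.55, 0.91],
    "marker_b": [0.12, 0.64, 0.33, 0.08, 0.47],
    "marker_c": [0.02, 0.04, 0.81, 0.26, 0.59],
}
print(median_rate(unlinked))
```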
Discussion
The different sites vary with respect to the power to detect linkage. Because the linkage evidence at a marker for a disease is inversely proportional to the number of markers that interact in determining the phenotype, our results may reflect some characteristics of the four population groups. For example, the relatively weak linkage over the four locations in the Aipotu group (see Figure 1) is consistent with the fact that affected status for Kofendrerd Personality Disorder (KPD) is defined by one or more of three clinical categories (communally shared emotions, behavior-related, and anxiety-related), and the strong linkage over the first two locations (around D01S0023 and D03S0127) in the Danacca group (see Figure 1) may reflect the fact that only the behavioral symptoms were classified as affected in the Danacca data. Similar patterns may reflect the characteristics of the other two populations.
Conclusion
In summary, we conclude that 1) for microsatellite markers there are virtually no differences in type I or type II error rates whether one uses or excludes parental genotypes; 2) on average, a microsatellite marker provides more power than a SNP marker in linkage analysis; 3) if parental genotype information is used, SNPs show lower type I error rates than microsatellite markers; and 4) if parental genotypes are not available, SNPs show variable type I error rates across different methods.
Overall, other things being equal in the simulated sample analyzed, microsatellites are better than SNPs, although if parents are typed, SNPs can have slightly better type I error rates.
Abbreviations
APM: Affected pedigree member
GAW: Genetic Analysis Workshop
IBD: Identity-by-descent
IBS: Identity-by-state
KPD: Kofendrerd Personality Disorder
SNP: Single-nucleotide polymorphism
Declarations
Acknowledgements
The research reported here has been supported in part by Public Health Research Grants from the National Institutes of Health's Aging Institute, grant number AG16996, and the National Center for Research Resources grant number 2G12RR003048.
References
1. Duffy D: SIBPAIR. [http://www.qimr.edu.au/davidD/davidd.html]
2. Weeks DE, Lange K: The affected-pedigree-member method of linkage analysis. Am J Hum Genet. 1988, 42: 315-326.
3. Lange K: The affected sib-pair method using identity by state relations. Am J Hum Genet. 1986, 39: 148-150.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.