Volume 6 Supplement 1
Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Locally weighted transmission/disequilibrium test for genetic association analysis
 Li Hsu^{1},
 Xuesong Yu^{2},
 Jeanine J Houwing-Duistermaat^{3},
 Hae-Won Uh^{3},
 Rachid El Galta^{3},
 Jeremie JP Lebrec^{3} and
 Hua Tang^{1}
DOI: 10.1186/1471-2156-6-S1-S60
© Hsu et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Abstract
The transmission/disequilibrium test statistic has been used for assessing genetic association in affected-parent trios. In the presence of multiple tightly linked marker loci where local dependency may exist, haplotypes are reconstructed statistically to estimate the joint effects of these markers. In this manuscript, we propose an alternative to the haplotype approach by taking a weighted average of multiple loci, where the weight is proportional to the product of (1 − 2 × recombination fraction) and the linkage disequilibrium between markers. As an illustration, we applied the method to the simulated Aipotu data.
Background
High-dimensional single-nucleotide polymorphism (SNP) data have become increasingly available due to the advancement of high-throughput genotyping technologies. These data give researchers unprecedented capabilities for localizing regions that may be associated with disease. An often-used strategy for searching for disease-causing genes is to first perform linkage analyses using genome-wide microsatellite or SNP markers to identify a rough candidate region that may harbor the latent disease susceptibility gene. In the second stage, dense SNP markers in this candidate region are genotyped so that the location of the disease gene can be further refined. The advantages of this mapping strategy are that it is cost-effective and avoids an untargeted fishing expedition.
In this paper, we focus on the second stage, where a large number of dense markers are genotyped on the study participants. Note that the markers at this stage have already been shown to be closely linked to the disease loci; in other words, linkage analysis has reached the limit of its resolution in locating the genes. For further refinement, one may need to rely on linkage disequilibrium (LD), which measures allelic association. The LD between a marker locus and the disease locus is thought to decay at a rate of (1 − θ)^{N}, where N is the number of generations since the introduction of the disease-causing mutation and θ is the recombination fraction between the two loci. The transmission/disequilibrium test (TDT) [1], which aims to assess linkage and LD between a marker locus and disease loci, has become popular. The TDT has since been extended to multiple tightly linked markers [2] by constructing haplotypes statistically to account for local dependency in the presence of phase ambiguity.
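As a rough numerical illustration of the decay rate (1 − θ)^N, the sketch below computes the fraction of the initial LD expected to remain after N generations, and the recombination fraction at which LD has fallen to one half. The function names and the example values of θ and N are assumptions chosen for illustration; they are not taken from the Aipotu data.

```python
def ld_remaining(theta, generations):
    """Fraction of the initial LD expected to remain after `generations`
    generations, for recombination fraction `theta`: (1 - theta) ** N."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    if generations < 0:
        raise ValueError("number of generations must be non-negative")
    return (1.0 - theta) ** generations


def half_decay_theta(generations):
    """Recombination fraction at which LD has decayed to one half after
    `generations` generations, i.e. the solution of (1 - theta) ** N = 0.5."""
    if generations <= 0:
        raise ValueError("number of generations must be positive")
    return 1.0 - 0.5 ** (1.0 / generations)


# Illustrative values only: LD left after 100 generations at three distances.
for theta in (0.001, 0.01, 0.05):
    print(theta, ld_remaining(theta, 100))
print("theta at half decay after 100 generations:", half_decay_theta(100))
```

Because the decay is geometric in N, LD falls off over much shorter distances than linkage does, which is why it can refine a region that linkage analysis alone cannot resolve further.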
As an alternative to haplotype-based approaches, we propose an approach that weights the contribution of multiple SNPs according to their association with the locus of interest. This approach does not require determination of haplotypes. The idea is similar to kernel smoothing in nonparametric regression methods [3], where the kernel function acts like a sliding window: all markers that fall in the window contribute to the test statistic, but with differential weights. The weight here is determined by the distance and correlation of the markers to the locus of interest.
Kofendrerd Personality Disorder (KPD) is a psychiatric syndrome characterized by an overwhelming concern with the meaning of the patient's inner emotions and world view and, at the same time, by subsuming the emotions of others into the self. The nosology for KPD falls into three groups: 1) "communally shared emotions" symptoms, such as joining/founding cults and fear of or discomfort with strangers; 2) behavior-related symptoms, such as fascination with automobiles and aversion to walking; and 3) anxiety-related symptoms, such as morbid anger/fear/terror concerning rain/snow and reluctance to wear clothing appropriate for the subjective temperature. All three groups, or combinations thereof, have been used for diagnosis of KPD. The condition is thought to be genetic in origin, possibly exacerbated by prevailing social conditions.
In this paper we analyzed the data collected from Aipotu, a populous semitropical, semidesert country with a high prevalence of KPD. Anyone with "notable clusters" of symptoms from any of the three groups was classified as a KPD case. The families in this dataset were ascertained when at least two siblings could be classified under any of the diagnostic groups or any combination of them.
Methods
Consider K case-parent trios in which each individual is genotyped with the same M autosomal markers at {t_{1}, ..., t_{M}}. Using the notation of Liang et al. [4], let Φ_{k} denote the disease status of the k^{th} offspring for k = 1, ..., K. Let H(t) and h(t) be the two alleles at marker locus t. For simplicity, we use h to denote the rare allele among the affected offspring. This, however, is neither necessary nor consequential. For the k^{th} trio, the transmission status Y_{k}(t) for paternal alleles at locus t can be described as:
\[
Y_k(t) =
\begin{cases}
1 & \text{if the father transmits } H(t) \text{ and does not transmit } h(t), \\
-1 & \text{if the father transmits } h(t) \text{ and does not transmit } H(t), \\
0 & \text{otherwise.}
\end{cases}
\]
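As a small illustration of this kind of transmission coding, the sketch below assumes the usual TDT convention: +1 when a heterozygous parent transmits H, −1 when it transmits h, and 0 when the parent is homozygous and therefore uninformative. The function names and the use of the strings "H" and "h" as allele labels are hypothetical choices made for the example.

```python
def transmission_status(parent_genotype, transmitted):
    """Score one parental transmission at a single marker.

    parent_genotype: pair of allele labels carried by the parent, e.g. ("H", "h").
    transmitted: the allele label passed on to the affected offspring.
    Returns +1 (H transmitted, h not), -1 (h transmitted, H not),
    or 0 (homozygous parent, so the transmission is uninformative).
    """
    first, second = parent_genotype
    if transmitted not in (first, second):
        raise ValueError("transmitted allele is not carried by the parent")
    if first == second:
        return 0
    return 1 if transmitted == "H" else -1


def offspring_score(father, father_tx, mother, mother_tx):
    """Sum of the paternal and maternal transmission statuses at one marker."""
    return (transmission_status(father, father_tx)
            + transmission_status(mother, mother_tx))


# A heterozygous father transmitting H and a homozygous mother.
print(offspring_score(("H", "h"), "H", ("h", "h"), "h"))
```

Summing the two parental scores treats paternal and maternal transmissions symmetrically, which matches a setting without imprinting.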
Similarly, one can define the maternal transmission status X_{k}(t). Assuming that there is only one disease locus at t_{0} in the region framed by these M markers, the expectation of the transmission status [4] is
\[
\mu(t, t_0) = E\{Y(t) \mid \Phi = 1\} = (1 - 2\theta_{t,t_0})\, d(t, t_0)\, C,
\]
where d(t, t_{0}) = Pr{H(t) | H(t_{0})} − Pr{H(t) | h(t_{0})} is a measure of linkage disequilibrium and θ is the recombination fraction between t and t_{0}. We further assume that there is no imprinting in this dataset, that is, E{X(t)} = E{Y(t)}. Denote C = E{Y(t_{0}) | Φ = 1}. One can see that the value of C is determined by the penetrance function and the allele frequencies at the disease locus t_{0} [4]. Under the assumptions of initial complete LD, random mating, and constant Pr{H(t_{0})} over time, d(t, t_{0}) can be expressed as (1 − θ)^{N} [5]. Here N is the number of generations since the introduction of a disease-causing mutation at location t_{0}. The parameters of interest in the mean function μ(t, t_{0}) are C for penetrance, N for the number of generations, and t_{0} for the location of the disease locus. Because Y(t) and X(t) are potentially correlated over the M markers, Liang et al. [4] proposed a generalized estimating equation approach to estimate these parameters. An appealing feature of this approach is that the derived parameter estimates remain valid as long as μ(t, t_{0}) is correctly specified. Liang et al. [4] also proposed to test the null hypothesis of no linkage or LD to the region framed by the observed M markers by testing C = 0. The test is based on a Wald-type statistic, which requires simultaneous estimation of (t_{0}, N, C) under the assumption that there is a disease locus in the region. However, this approach has several limitations: 1) t_{0} is unidentifiable under the null hypothesis; 2) there is a lack of robustness if the assumption of constant Pr{H(t_{0})} over time is not met; and 3) in testing C = 0, one would still need to estimate all parameters.
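To make the roles of the three parameters concrete, the sketch below evaluates the mean function under the assumption that it takes the form μ = (1 − 2θ)(1 − θ)^N C, that is, a linkage factor times an LD factor times C. This form is inferred from the definitions in the text, and the function name and numerical values are hypothetical.

```python
def expected_transmission(theta, generations, c):
    """Assumed mean function mu = (1 - 2*theta) * (1 - theta)**N * C.

    theta: recombination fraction between the marker and the disease locus.
    generations: N, generations since the mutation was introduced.
    c: C, the expected transmission status at the disease locus itself.
    """
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    linkage_factor = 1.0 - 2.0 * theta      # vanishes for unlinked loci
    ld_factor = (1.0 - theta) ** generations  # LD left after N generations
    return linkage_factor * ld_factor * c


# Illustrative values only: the signal at a marker shrinks with distance.
for theta in (0.0, 0.005, 0.02, 0.1):
    print(theta, expected_transmission(theta, generations=50, c=0.3))
```

Under this form the expected signal equals C at the disease locus itself and is zero when C = 0 regardless of θ and N, which is why the location t_{0} cannot be identified under the null hypothesis.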
With this consideration we propose to derive a score test statistic for testing C = 0 at locus t_{0}, that is, t_{0} is not a disease locus. Based on Equation 10 in Liang et al. [4], a test statistic can be derived as
where
A nice feature of kappa is that the proportion of agreements is calculated after excluding chance agreement. The value of the kappa statistic ranges from −1 (negative complete linkage disequilibrium) to 1 (positive complete linkage disequilibrium). Clearly, each term in the sum of Equation (1) remains unchanged if the allele designation, H versus h, is switched.
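As an illustration of a chance-corrected agreement measure of this kind, the sketch below computes Cohen's kappa between the alleles at two loci from a list of phased haplotypes. Using phased haplotypes and this particular estimator are assumptions made for the example, and the function name `kappa_ld` is hypothetical.

```python
def kappa_ld(haplotypes):
    """Cohen's kappa between the alleles at two loci.

    haplotypes: iterable of (allele_at_locus_1, allele_at_locus_2) pairs,
    each allele being "H" or "h". Agreement means both loci carry the
    same label on a haplotype.
    """
    pairs = list(haplotypes)
    n = len(pairs)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    observed = sum(1 for a, b in pairs if a == b) / n
    freq_1 = sum(1 for a, _ in pairs if a == "H") / n
    freq_2 = sum(1 for _, b in pairs if b == "H") / n
    # Agreement expected by chance if the two loci were independent.
    chance = freq_1 * freq_2 + (1.0 - freq_1) * (1.0 - freq_2)
    if chance == 1.0:
        return 0.0  # both loci monomorphic: kappa undefined, 0 by convention here
    return (observed - chance) / (1.0 - chance)


print(kappa_ld([("H", "H"), ("H", "H"), ("h", "h"), ("H", "h")]))
```

Subtracting the chance term is what keeps two loci with very common alleles from appearing strongly associated merely because most haplotypes carry the common allele at both.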
It is easy to generalize the test statistic T_{2} in a couple of ways. For example, rather than summing over the total M markers in the test statistic, one can also use the markers within a prespecified neighborhood of t_{0}. In addition, the test statistic T_{2} can be extended to accommodate multiple affected siblings. The following statistic describes these extensions:
where n_{k} is the number of affected offspring in the k^{th} family and B is a prespecified neighborhood around marker locus t_{0}. We call the test statistic T the locally weighted TDT. The choice of the size of the neighborhood depends on many factors, such as the nature of the disease mutation, the population under study, and the marker density. An examination of inter-marker linkage disequilibrium may help determine the window size. By the central limit theorem, K^{1/2}T is asymptotically normal with a variance that can be estimated empirically. To account for multiple comparisons, one may combine the test statistics of all the markers by taking the maximum and determine its critical values by a simulation-based procedure in which the transmission status for each affected offspring is randomly assigned a large number of times.
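The steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the weight matrix is taken as given (for example, (1 − 2θ) times an LD measure), each family's contribution is averaged over its n_{k} affected offspring, the variance is estimated by the mean squared family score, the neighborhood B is an index-based window, and the null distribution of the maximum is simulated by flipping the sign of each offspring's transmission scores at all markers together. All function names and the toy data are hypothetical.

```python
import math
import random


def family_scores(families, weights, target, window):
    """One weighted score per family for the marker at index `target`.

    families: list of families; each family is a list of affected offspring;
        each offspring is a list holding, per marker, the paternal plus
        maternal transmission score (an integer between -2 and 2).
    weights: square matrix; weights[target][m] is the assumed weight
        between the target marker and marker m.
    window: number of markers on each side of `target` included in B.
    """
    lo = max(0, target - window)
    hi = min(len(weights), target + window + 1)
    scores = []
    for offspring_list in families:
        total = sum(weights[target][m] * child[m]
                    for child in offspring_list
                    for m in range(lo, hi))
        scores.append(total / len(offspring_list))  # average over n_k offspring
    return scores


def standardized_statistic(scores):
    """sqrt(K) * mean(scores) / sqrt(mean(scores ** 2)), simplified."""
    sum_sq = sum(s * s for s in scores)
    if sum_sq == 0:
        return 0.0  # no informative transmissions inside the window
    return sum(scores) / math.sqrt(sum_sq)


def locally_weighted_tdt(families, weights, window):
    """Standardized locally weighted statistic at every marker."""
    return [standardized_statistic(family_scores(families, weights, j, window))
            for j in range(len(weights))]


def max_statistic_pvalue(families, weights, window, n_perm=1000, seed=0):
    """Simulated p-value for the maximum absolute statistic over markers."""
    rng = random.Random(seed)
    observed = max(abs(z) for z in locally_weighted_tdt(families, weights, window))
    exceed = 0
    for _ in range(n_perm):
        permuted = []
        for offspring_list in families:
            new_family = []
            for child in offspring_list:
                # Flip all markers together to keep inter-marker correlation.
                sign = rng.choice((-1, 1))
                new_family.append([sign * s for s in child])
            permuted.append(new_family)
        null_max = max(abs(z)
                       for z in locally_weighted_tdt(permuted, weights, window))
        if null_max >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)


# Toy data: three families, three markers, weights decaying with distance.
toy_families = [[[1, 1, 0]], [[1, 0, 0], [1, 1, 0]], [[2, 1, 1]]]
toy_weights = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.5], [0.0, 0.5, 1.0]]
print(locally_weighted_tdt(toy_families, toy_weights, window=1))
print(max_statistic_pvalue(toy_families, toy_weights, window=1, n_perm=200))
```

Flipping whole offspring rather than individual markers is a design choice: it keeps the dependence among neighboring markers intact in each simulated dataset, so the simulated maximum reflects the same correlation structure as the observed one.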
Results
The data that we analyzed in this paper consisted of all affected offspring and their parents from the first replicate of the Aipotu study. There were a total of 100 nuclear families with 283 affected offspring. We had no knowledge of the "answers" at the time when we performed the following analyses.
We performed a single-point linkage analysis using the microsatellite markers genotyped on the affected sib pairs (see the companion paper by Houwing-Duistermaat et al. [6]). The microsatellite markers were on average about 7.5 cM apart. We found that the LOD scores for markers D3S0124 and D3S0127 on chromosome 3 were 4.51 and 3.06, respectively. Both exceeded the cutoff threshold of LOD score 3 for IBD testing. The LOD score for marker D3S0124 even exceeded 3.6, a critical value suggested by Lander and Kruglyak [7] for genome-wide significance. Based on these results, we subsequently purchased 7 packets of basic SNP markers in the region flanked by microsatellite markers D3S0123 and D3S0127. This covers all available SNP markers for the telomere end of chromosome 3. Excluding the microsatellite markers, there were a total of 134 SNP markers covering about 35 cM in genetic distance.
To study whether SNPs B03T3056 and B03T3057 partly explain the linkage peaks at microsatellite markers D3S0124 and D3S0127, we then included SNPs B03T3056 and B03T3057, individually and jointly, as covariates in the single-point linkage analysis using the same affected sib pairs as in the initial linkage scan (see Table 1 from Houwing-Duistermaat et al. [6]). The overall LOD score for microsatellite marker D3S0127 with SNP B03T3056 was increased compared to the LOD score for the microsatellite marker alone (p = 0.02), but the increase in overall LOD score was fairly minimal when SNP B03T3057 was considered. For microsatellite marker D3S0124, only a moderate improvement was observed in the overall LOD scores after including the SNPs. Based on these results, we postulate that SNP B03T3056 only partially explains the linkage signal at microsatellite markers D3S0124 and D3S0127 and that other unknown genes may still be present in the region.
Conclusion
In this paper we proposed a method that accounts for the local dependencies among adjacent markers. We applied it to the simulated dataset and showed that the proposed test statistic yields a smoothed signal between markers B03T3056 and B03T3057. The proposed method did not show much more power than the conventional TDT, in part because of the overall weak inter-marker LD in this SNP dataset (results not shown). Further work on the performance of the proposed method under a wide range of scenarios is warranted. The choice of window size in the locally weighted test statistic depends on the nature of the disease mutation and the population under study as well as on marker density. One possible choice is to first examine the overall LD in the region and use it as guidance for determining the window size: strong LD suggests a wide window, and weak LD a narrow one. Another possible choice is to calculate the locally weighted test statistics for a few different window sizes and combine them into one test statistic by taking the maximum. The critical threshold value needs to be adjusted for such a combined test statistic. In this manuscript, we test the null hypothesis C = 0. An alternative may be to construct confidence bands, turning the testing problem into an estimation one. A region in which the confidence bands do not include 0 likely indicates a disease locus. An advantage of such an approach is that it provides a confidence interval within which the disease locus might reside. We will investigate methods for constructing confidence bands in the future.
Abbreviations
 SNP: Single-nucleotide polymorphism
 TDT: Transmission/disequilibrium test
 LD: Linkage disequilibrium
 KPD: Kofendrerd Personality Disorder
Declarations
Acknowledgements
This work was done while Li Hsu was on sabbatical at the Department of Medical Statistics and Bioinformatics at Leiden University, The Netherlands.
References
 1. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52: 506-516.
 2. Zhao H, Zhang S, Merikangas KR, Trixler M, Wildenauer DB, Sun F, Kidd K: Transmission/disequilibrium tests using multiple tightly linked markers. Am J Hum Genet. 2000, 67: 936-946. 10.1086/303073.
 3. Wand MP, Jones MC: Kernel Smoothing. 1995, Chapman & Hall/CRC.
 4. Liang KY, Hsu FC, Beaty TH, Barnes KC: Multipoint linkage disequilibrium mapping approach based on the case-parent trio design. Am J Hum Genet. 2001, 68: 937-950. 10.1086/319504.
 5. Devlin B, Risch N: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995, 29: 311-322. 10.1006/geno.1995.9003.
 6. Houwing-Duistermaat JJ, Uh HW, Lebrec JJP, Putter H, Hsu L: Modeling the effect of an associated single-nucleotide polymorphism in linkage studies. BMC Genet. 2005, 6 (Suppl 1): S46. 10.1186/1471-2156-6-S1-S46.
 7. Lander ES, Kruglyak L: Genetic dissection of complex traits: guidelines for interpreting and reporting linkage results. Nature Genet. 1995, 11: 241-247. 10.1038/ng1195-241.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.