 Proceedings
 Open Access
Fine-mapping using the weighted average method for a case-control study
BMC Genetics volume 6, Article number: S67 (2005)
Abstract
We present a new method for fine-mapping a disease susceptibility locus using a case-control design. The new method, termed the weighted average (WA) statistic, averages the Cochran-Armitage (CA) trend test statistic and the difference between the Hardy-Weinberg disequilibrium test statistic for cases and controls (the HWD trend). The main characteristics of the WA statistic are that it mitigates the weaknesses, and maintains the strengths, of both the CA trend test and the HWD trend test. Data from three different populations in the Genetic Analysis Workshop 14 (GAW14) simulated dataset (Aipotu, Karangar, and Danacaa) were first subjected to model-free linkage analysis to find regions exhibiting linkage. Then, for fine-scale mapping, 140 SNPs within the significant linkage regions were analyzed with the WA test statistic on replicates of the three populations, both separately and combined. The regions that were significant in the multipoint linkage analysis were also significant in this fine-scale mapping. The most significant regions obtained using the WA statistic were on chromosome 3 (B03T3056–B03T3058, p-value < 1 × 10^{-10}) and chromosome 9 (B09T8332–B09T8334, p-value < 1 × 10^{-6}). Based on the results of the simulated GAW14 data, the WA test statistic showed good performance and could narrow down the region containing the susceptibility locus. However, the strength of the signal depends on both the strength of the linkage disequilibrium and the heterozygosity of the linked marker.
Background
It has been shown that fine-scale mapping of a susceptibility locus for a complex disease can be accomplished by evaluating the deviation from Hardy-Weinberg equilibrium (HWE). For example, Feder et al. [1], Nielsen et al. [2] and Jiang et al. [3] have discussed using the Hardy-Weinberg disequilibrium (HWD) test on affected individuals alone. According to their results, this HWD test tends to perform well for a recessive disease model and could be more precise in gene localization, but has no power at all for a multiplicative disease model. For case-control studies, Sasieni [4] showed that the Cochran-Armitage (CA) trend test, which uses genotype data, is more appropriate than the allele-based test when HWE is violated. The CA trend test performs well under dominant inheritance and has power where the HWD test has none, but requires allowance for population structure. Devlin and Roeder [5] proposed genomic control to allow for population heterogeneity when using the CA trend test. Song and Elston [6] proposed the HWD trend test, which compares the squared difference in HWD between cases and controls, divided by its estimated variance, with the chi-square distribution with 1 d.f. They showed by simulation that the HWD trend test statistic is not inflated by population stratification.
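To make the two component quantities concrete, the following is a minimal sketch, not the authors' implementation. It computes the standard CA trend chi-square from genotype counts, assuming additive scores (0, 1, 2), and the HWD coefficient D = P(AA) - p_A^2 for one group. The function names and the example counts are hypothetical, and the variance estimate needed to turn the case-control difference in D into the HWD trend test is deliberately left out because it is not given in the text.

```python
from typing import Sequence


def ca_trend_chi2(cases: Sequence[int], controls: Sequence[int],
                  scores: Sequence[float] = (0, 1, 2)) -> float:
    """Cochran-Armitage trend chi-square (1 d.f.) from genotype counts.

    cases / controls hold the counts of genotypes carrying 0, 1 and 2
    copies of the reference allele; scores are assumed additive weights.
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    # U = sum_i x_i * (S * r_i - R * s_i)
    u = sum(x * (S * r - R * s) for x, r, s in zip(scores, cases, controls))
    sum_xn = sum(x * n for x, n in zip(scores, totals))
    sum_x2n = sum(x * x * n for x, n in zip(scores, totals))
    # Var(U) = (R * S / N) * (N * sum x_i^2 n_i - (sum x_i n_i)^2)
    var_u = (R * S / N) * (N * sum_x2n - sum_xn ** 2)
    return u * u / var_u if var_u > 0 else 0.0


def hwd_coefficient(counts: Sequence[int]) -> float:
    """HWD coefficient D = P(AA) - p_A^2 for one group (cases or controls)."""
    n = sum(counts)
    p_hom = counts[2] / n                              # frequency of AA
    p_allele = (2 * counts[2] + counts[1]) / (2 * n)   # frequency of allele A
    return p_hom - p_allele ** 2


# Hypothetical genotype counts, ordered (aa, Aa, AA).
example_cases = (10, 20, 30)
example_controls = (30, 20, 10)
chi2 = ca_trend_chi2(example_cases, example_controls)
hwd_diff = hwd_coefficient(example_cases) - hwd_coefficient(example_controls)
```

In this made-up example the allele frequencies differ strongly between cases and controls, so the CA statistic is large, while the two groups have the same HWD coefficient, so the HWD difference is essentially zero. This is the kind of situation in which the CA trend test has power and the HWD-based test does not.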
Song and Elston [6] developed a weighted average (WA) statistic that mitigates the weaknesses and maintains the strong points of both the CA trend test and the HWD trend test. In this study, we first apply model-free linkage analysis to the Genetic Analysis Workshop 14 (GAW14) simulated dataset to find regions of interest, and then fine-map those regions with the WA statistic in order to find disease susceptibility genes. Finally, we compare the results of the WA test with those of the CA trend and the HWD trend tests and find that for these data the WA is virtually identical to the CA test.
Methods
For a case-control study with a diallelic marker locus, write the CA trend test statistic as
and the HWD trend test statistic as . Then the WA statistic is given by
where
Song and Elston [6] used simulation to show that asymptotically the random variable Y is well approximated by a Gamma distribution, F(y; θ, κ), with mean μ = 1.78 and variance σ^{2} = 3.45, i.e., θ = σ^{2}/μ = 1.94 and κ = μ^{2}/σ^{2} = 0.92. The best value to take for w was problematic because it depends on details of the alternative hypothesis, which are usually unknown. Thus, they chose the value of w indicated above after performing several exploratory simulation studies. They modeled the empirical α (p-value) that would predict the probability of type I error from the α corresponding to 1 - F(y_{(1-α)}; θ, κ) using the regression equation
where α_{ i }(i = 1, 2, ..., n) was estimated from a series of n simulation experiments. Based on extensive simulation results for various sample sizes > 50, they estimated a to be
1.4922[1/log_{e}R × log_{e}S] + 0.1208[log_{e}(R/S)] - 1.6929[1/(log_{e}R × log_{e}S × (0.5 - |M - 0.5|))] + 0.9250(M - 0.5)^{2}
when there are R cases and S controls and different values of the marker allele frequency M. To adjust the test statistic by a factor that measures the amount of variance inflation caused by population stratification and cryptic relatedness, the variance of
can be adjusted using the genomic control method of Devlin and Roeder [5] when the inflation factor is >1.
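The Gamma approximation quoted above can be illustrated with a short sketch. It recovers the scale θ and shape κ from the stated mean and variance by moment matching, and evaluates the upper-tail probability 1 - F(y; θ, κ) with a plain series expansion of the regularized incomplete gamma function. The function names are hypothetical and this is not the authors' code. The series loses precision for very small tail probabilities, so for p-values of the order reported here a library routine such as `scipy.stats.gamma.sf` would likely be preferable.

```python
import math
from typing import Tuple


def gamma_params(mean: float, variance: float) -> Tuple[float, float]:
    """Moment matching: scale theta = var / mean, shape kappa = mean^2 / var."""
    return variance / mean, mean * mean / variance


def gamma_sf(y: float, theta: float, kappa: float,
             tol: float = 1e-14, max_terms: int = 10000) -> float:
    """Upper-tail probability 1 - F(y; theta, kappa) of a Gamma variable."""
    if y <= 0:
        return 1.0
    x = y / theta
    # Series for the lower regularized incomplete gamma function P(kappa, x):
    # P = exp(-x) * x^kappa / Gamma(kappa) * sum_n x^n / (kappa (kappa+1)...(kappa+n))
    term = 1.0 / kappa
    total = term
    for n in range(1, max_terms):
        term *= x / (kappa + n)
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-x + kappa * math.log(x) - math.lgamma(kappa))
    return max(0.0, 1.0 - lower)


# Values stated in the text: mean 1.78 and variance 3.45.
theta, kappa = gamma_params(1.78, 3.45)
p_at_10 = gamma_sf(10.0, theta, kappa)  # approximate p-value for a hypothetical Y = 10
```

Moment matching reproduces the rounded values θ = 1.94 and κ = 0.92 given in the text. The observed value Y = 10 is an arbitrary illustration, not a result from the study.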
Genome scan
We analyzed the binary trait affected/unaffected in replicates from the three different populations in the dataset (Aipotu, Karangar, and Danacaa). For each of the three different populations, we first used the microsatellite markers that were on average 7.5 cM apart to perform a linkage scan. We used SIBPAL (S.A.G.E., version 4.6), a model-free linkage program, to perform a multipoint linkage analysis. Evidence for linkage was evaluated by Haseman-Elston regression [7] using a weighted average of the squared trait difference and the squared mean-corrected trait sum (option W4 [8]). The empirical p-values for each marker location were calculated and pooled over the populations.
Fine mapping
After reviewing the first-stage genome-scan linkage results, we selected for a case-control study 140 SNPs 0.3 cM apart that encompassed the significant linkage regions. From each of three replicates (one from each population), we randomly sampled 100 affected probands as our cases and 100 unaffected probands as our controls. In an attempt to induce population stratification, we also pooled the replicates from the three populations to obtain 300 cases and 300 controls. Then the WA test statistic was calculated for each of the four different samples using the selected SNPs. We repeated the sampling from each population 5 times and recorded the percentage of times that the hypothesis of no disequilibrium was rejected. The potential confounding effect of population stratification could be allowed for by the genomic control method. Nineteen SNPs that were independent (>12 cM apart if on the same chromosome) and unlinked to the disease locus (p ≥ 0.8), spread throughout the whole genome, were selected for this purpose. However, for the 300 cases and 300 controls, the estimated variance inflation factor was close to 1.00 for each of the 5 replicate samples, suggesting that no significant stratification exists in this admixed population. Therefore, no adjustment for variance inflation was made for these data.
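The genomic control check described above can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually run by the authors: it uses the commonly cited median-based estimator of the inflation factor for 1 d.f. chi-square statistics (median of the null-marker statistics divided by the median of a chi-square variable with 1 d.f., about 0.4549), and it rescales a test statistic only when the factor exceeds 1. The function names and the example statistics are hypothetical.

```python
from statistics import median
from typing import Sequence

# Median of a chi-square distribution with 1 d.f. (0.6745^2, approximately).
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_stats: Sequence[float]) -> float:
    """Median-based variance inflation factor from unlinked (null) markers."""
    return median(null_stats) / CHI2_1DF_MEDIAN


def adjust_statistic(stat: float, lam: float) -> float:
    """Divide the statistic by the inflation factor, but only when it is > 1."""
    return stat / lam if lam > 1.0 else stat


# Hypothetical 1 d.f. statistics at a handful of unlinked SNPs.
null_stats = [0.02, 0.15, 0.31, 0.46, 0.52, 0.88, 1.40, 2.10, 3.05]
lam = inflation_factor(null_stats)
adjusted = adjust_statistic(12.0, lam)
```

With an inflation factor close to 1, as reported for the pooled sample of 300 cases and 300 controls, the adjustment leaves the statistic essentially unchanged, which is consistent with the decision not to adjust.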
Results
Genome scan using model-free linkage analysis
In our analysis, we observed 8 microsatellite markers with a significance level of p ≤ 0.005. Evidence of linkage was found on chromosome 1 (22-cM region that included D01S0023 (p = 0.00028) and D01S0024 (p = 0.0016)), chromosome 3 (12-cM region that included D03S0126 (p = 0.0012) and D03S0127 (p = 0.00022)), chromosome 5 (4-cM region that included D05S0172 (p = 0.0025)), and chromosome 9 (18-cM region that included D09S0348 (p = 0.000041) and D09S0349 (p = 0.0010)).
Fine mapping using the WA statistic method
Genotypes for none of the 140 SNPs departed significantly from HWE at the 5% level, in either the cases or the controls. Nevertheless, we incorporated the HWD trend test to obtain a more powerful test of association. The results are shown in Figure 1, where no adjustment was made for multiple testing.
A sample size as small as 100 cases and 100 controls showed on average strong association (p < 0.001) with a susceptibility disease gene in the region between markers B03T3056 and B03T3058 on chromosome 3 and weaker association at marker B09T8334 on chromosome 9 (p < 0.05). Using the sample size of 300 cases and 300 controls, the association analysis confirmed the linkage signals in all regions on chromosomes 1, 3, 5, and 9. In this analysis we found association signals between markers B01T0553 and B01T0555 on chromosome 1 (p = 0.00034) (Figure 1, panel 1), markers B03T3056 and B03T3058 on chromosome 3 (p < 1 × 10^{-10}) (Figure 1, panel 2), markers B05T4146 and B05T4148 on chromosome 5 (p = 0.00250) (Figure 1, panel 3), and markers B09T8332 and B09T8334 on chromosome 9 (p < 1 × 10^{-6}) (Figure 1, panel 4).
Discussion
In this paper we have illustrated the use of a new method for fine-mapping using a case-control design. If the mode of inheritance of a candidate gene is known, then with that knowledge a more powerful method can always be derived. The WA test was derived for a situation in which the mode of inheritance is not known. We compared the performance of the CA, WA, and HWD tests in this dataset and found that the HWD trend test always had low power. Thus, the WA maintained the advantage of the CA trend test and overcame the weakness of the HWD trend test, while the CA and WA tests had almost equal power for these particular data. It should be noted that factors such as missing data and SNP density will affect the WA test to the same extent that they will affect its components, the CA test and the HWD test.
Although the departure from HWE was not significant in either the cases or controls, the WA statistic uses the trend in that departure. To explain why the graphs show multiple spikes in each region on chromosomes 1, 5, and 9, we plotted pP = -log_{10}(p) against the marker heterozygosity, 2q(1 - q), where q is the marker allele frequency. Figure 2 shows the result for chromosome 1. We observe large values of pP, between 4 and 8, when the heterozygosity is about 0.5, i.e., when the marker is most informative. It was verified that markers with high values of pP but smaller values of 2q(1 - q) (there are two such markers in Figure 2, with heterozygosities of about 0.11 and 0.20) are in high LD with those markers that have heterozygosity equal to 0.5 and high pP values; similarly, those with low pP values but heterozygosity about 0.5 have low LD with the high pP markers. In other words, the strength of the signal depends on both the amount of LD and the heterozygosity of the linked marker.
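The two quantities plotted against each other in Figure 2 are simple to compute. The sketch below, with hypothetical function names and illustrative inputs, evaluates the marker heterozygosity 2q(1 - q) and the transformed p-value pP = -log_{10}(p).

```python
import math


def heterozygosity(q: float) -> float:
    """Expected heterozygosity 2q(1 - q) of a diallelic marker with allele frequency q."""
    return 2.0 * q * (1.0 - q)


def minus_log10_p(p: float) -> float:
    """pP = -log10(p); larger values correspond to smaller p-values."""
    return -math.log10(p)


# Illustrative allele frequencies and p-value (not taken from the dataset).
het_common = heterozygosity(0.5)   # maximum possible value for a diallelic marker
het_rare = heterozygosity(0.06)    # a low-frequency allele is far less informative
pp_example = minus_log10_p(0.00034)
```

Heterozygosity peaks at 0.5 when q = 0.5 and falls off symmetrically toward 0 as q approaches 0 or 1, which is why markers with equal allele frequencies are the most informative ones in the plot.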
Conclusion
Using the WA statistic, a sample size as small as 100 cases and 100 controls showed on average strong association (p < 0.001) with a susceptibility disease gene in the region between markers B03T3056 and B03T3058 on chromosome 3 and weaker association at marker B09T8334 on chromosome 9 (p < 0.05). For a larger sample size (300 cases and 300 controls), as expected much more significant pP values were observed.
Abbreviations
CA: Cochran-Armitage
HWD: Hardy-Weinberg disequilibrium
HWE: Hardy-Weinberg equilibrium
WA: Weighted average
References
1. Feder JN, Gnirke A, Thomas W, Tsuchihashi Z, Ruddy DA, Basava A, Dormishian F, Domingo R, Ellis MC, Fullan A, Hinton LM, Jones NL, Kimmel BE, Kronmal GS, Lauer P, Lee VK, Loeb DB, Mapa FA, McClelland E, Meyer NC, Mintier GA, Moeller N, Moore T, Morikang E, Prass CE, Quintana L, Starnes SM, Schatzman RC, Brunke KJ, Drayna DT, Risch NJ, Bacon BR, Wolff RK: A novel MHC class I-like gene is mutated in patients with hereditary haemochromatosis. Nat Genet. 1996, 13: 399-408. 10.1038/ng0896-399.
2. Nielsen DM, Ehm MG, Weir BS: Detecting marker-disease association by testing for Hardy-Weinberg disequilibrium at a marker locus. Am J Hum Genet. 1998, 63: 1531-1540. 10.1086/302114.
3. Jiang R, Dong J, Wang D, Sun FZ: Fine-scale mapping using Hardy-Weinberg disequilibrium. Ann Hum Genet. 2001, 65: 207-219. 10.1046/j.1469-1809.2001.6520207.x.
4. Sasieni PD: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. 10.2307/2533494.
5. Devlin B, Roeder K: Genomic control for association studies. Biometrics. 1999, 55: 997-1004. 10.1111/j.0006-341X.1999.00997.x.
6. Song K, Elston RC: A powerful method of combining measures of association and Hardy-Weinberg equilibrium for fine-mapping in case-control studies. Stat Med.
7. Haseman JK, Elston RC: The investigation of linkage between a quantitative trait and a marker locus. Behav Genet. 1972, 2: 3-19. 10.1007/BF01066731.
8. Shete S, Jacobs KB, Elston RC: Adding further power to the Haseman and Elston method for detecting linkage in larger sibships: weighting sums and differences. Hum Hered. 2003, 55: 79-85. 10.1159/000072312.
Acknowledgements
This work was supported in part by grants from the U.S. Public Health Service: resource grant RR03655 from the National Center for Research Resources; research grant GM28356 from the National Institute of General Medical Sciences; and contract HD23342 from the National Institute of Child Health and Human Development.
Authors' contributions
KS participated in the design of the study, performed the weighted average analysis and was involved in drafting the manuscript. MSO participated in the design of the study, performed the linkage analysis, selected the unlinked SNPs, and helped in drafting the manuscript. QL participated in the design of the study and helped in the weighted average analysis and data manipulation. RCE was involved in critical revision of the manuscript for intellectual content. All authors were involved in the interpretation of the results and approved the final manuscript.
Kijoung Song, Mohammed S Orloff, Qing Lu contributed equally to this work.
Cite this article
Song, K., Orloff, M.S., Lu, Q. et al. Fine-mapping using the weighted average method for a case-control study. BMC Genet 6, S67 (2005) doi:10.1186/1471-2156-6-S1-S67
Keywords
 Trend Test
 Population Stratification
 Genetic Analysis Workshop
 Multipoint Linkage Analysis
 Susceptibility Disease Gene