Methods to test for association between a disease and a multiallelic marker applied to a candidate region
BMC Genetics volume 6, Article number: S101 (2005)
Abstract
We report the results of analyzing the Genetic Analysis Workshop 14 simulated microsatellite marker dataset, using replicate 50 from the Danacaa population. We applied several methods for association analysis of multiallelic markers to case-control data to study the association between Kofendrerd Personality Disorder and multiallelic markers in a candidate region previously identified by linkage analysis. Evidence for association was found for marker D03S0127 (p < 0.01). The analyses were done without any prior knowledge of the answers.
Background
Terwilliger [1] proposed a powerful method for the association analysis between a disease and a multiallelic marker. The model assumes that only one marker allele is associated with the disease and that any marker allele may be the associated one, with prior probability equal to its allele frequency in the population. The excess of the associated allele in cases is modelled by a parameter λ, the population attributable risk [2]. The likelihood of the data given the allele frequencies and λ is a weighted sum, over all marker alleles, of the conditional likelihoods given that a particular allele is associated with the disease, with weights equal to the allele frequencies. Hence, more weight is assigned to more frequent marker alleles.
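As a sketch of the model just described, the case likelihood can be written as below. The notation is introduced here for illustration only and is not quoted from [1]: x_{j} is the count of allele j on case chromosomes, p_{j} is its population frequency, and q_{j|i} is the frequency of allele j in cases when allele i is the associated one.

```latex
L(\lambda) = \sum_{i} p_{i} \prod_{j} q_{j \mid i}^{\,x_{j}},
\qquad
q_{j \mid i} =
\begin{cases}
p_{i} + \lambda\,(1 - p_{i}), & j = i,\\[2pt]
(1 - \lambda)\, p_{j}, & j \neq i.
\end{cases}
```

Setting λ = 0 gives q_{j|i} = p_{j} for every i, so the sum collapses to the null likelihood ∏ p_{j}^{x_{j}}.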
To test the null hypothesis (λ = 0) against the alternative hypothesis (λ > 0), Terwilliger [1] proposed a likelihood ratio (LR) statistic. However, this statistic appeared to be conservative, and computation of the maximum-likelihood estimates might be slow. Another point, mentioned by Sham et al. [3], is that this LR test statistic might not be robust against model deviation, especially when more than one allele is associated with the disease. With this in mind, we derived the corresponding score statistic U, which is a linear combination of Pearson's χ^{2} and a weighted sum of observed minus expected allele counts in cases. The score test is locally most powerful and, because it is evaluated under the null hypothesis, it is expected to be robust against model deviation [4]. The score statistic U is easy to compute, which makes it practical to use Monte Carlo permutations to estimate the empirical p-value of the test statistic [5]. For a large number of alleles, Pearson's χ^{2} asymptotically follows a normal distribution [6]. Hence, for a large sample size and a large number of marker alleles, the distribution of the score statistic U under the null hypothesis can be approximated by a normal distribution. An alternative is to replace the weights in the LR statistic proposed by Terwilliger by equal weights, which might be suitable if the associated allele is less common.
For replicate 50 of the Danacaa population we applied Pearson's χ^{2}, the score test, and Terwilliger's LR test to microsatellite markers D03S0124, D03S0125, D03S0126, and D03S0127 to test their association with Kofendrerd Personality Disorder (KPD). The allelic distribution was compared between a sample of 100 cases and a sample of 50 controls. To ensure high power, one might either select more controls, because they are easier to ascertain than cases, or compare the allele frequencies in cases with the allele frequencies in the population if these are known. Because the allele frequencies in controls were supplied by the Genetic Analysis Workshop 14 (GAW14), we used the latter option to verify the results for markers that showed significant association with KPD.
Materials and methods
Score test
Suppose we have a multiallelic marker. Let p_{i} be the frequency of the i^{th} allele in the controls. Suppose we have n_{1} unrelated case chromosomes and n_{2} unrelated control chromosomes. Let x_{i} and y_{i} be the i^{th} allele counts in cases and controls respectively. The score statistic corresponding to the likelihood proposed by Terwilliger [1] is
U = ∑_{i} (x_{i} − n_{1}p_{i})^{2}/p_{i} − ∑_{i} (x_{i} − n_{1}p_{i})/p_{i},
where the sum is taken over the alleles. When the allele frequencies are unknown, p_{i} can be estimated by the frequencies in the combined sample, (x_{i}+y_{i})/(n_{1}+n_{2}). When more than one allele is associated with the disease, the score test U is expected to perform better than the LR, because it sums over the contributions of the alleles.
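The statistic is simple enough to compute in a few lines. The following Python sketch assumes the form of U given above, with the first sum equal to n_{1} times Pearson's goodness-of-fit χ^{2} for the cases and the second sum subtracted from it. The function names and the allele counts are hypothetical and are not taken from the GAW14 data.

```python
def score_statistic(case_counts, freqs):
    """Score statistic U for one multiallelic marker.

    case_counts: allele counts x_i on the case chromosomes.
    freqs: reference allele frequencies p_i (each must be > 0).
    """
    n1 = sum(case_counts)
    # First sum: (x_i - n1*p_i)^2 / p_i, i.e. n1 times Pearson's
    # goodness-of-fit chi-square of the cases against freqs.
    quadratic = sum((x - n1 * p) ** 2 / p for x, p in zip(case_counts, freqs))
    # Second sum: observed minus expected counts, weighted by 1 / p_i.
    linear = sum((x - n1 * p) / p for x, p in zip(case_counts, freqs))
    return quadratic - linear


def pooled_frequencies(case_counts, control_counts):
    """Estimate p_i by (x_i + y_i) / (n1 + n2) from the combined sample."""
    total = sum(case_counts) + sum(control_counts)
    return [(x + y) / total for x, y in zip(case_counts, control_counts)]


# Hypothetical counts for a four-allele marker (illustration only).
cases = [30, 10, 40, 20]
controls = [20, 30, 30, 20]
p_hat = pooled_frequencies(cases, controls)
u = score_statistic(cases, p_hat)
print(f"U = {u:.3f}")
```

Alleles that are absent from the reference sample would give p_{i} = 0 and must be dropped or pooled before calling the function.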
Data analysis
First, we selected four replicates from each of the Aipotu, Karangar, and Danacaa populations to perform genome-wide linkage analysis, i.e., we analyzed 12 replicates. Each replicate consisted of 100 nuclear families. For each replicate we applied the single-point S_{pairs} allele-sharing scoring function [7] as implemented in the MERLIN program [8] to search for regions with evidence for linkage. The parental genotypes were used to compute the probabilities of sharing 0, 1, or 2 alleles identically by descent. A region on chromosome 3 showed significant linkage to a latent disease locus in several populations at the 0.0001 level.
For testing the association using the proposed methods, we selected replicate 50 of the Danacaa population, because in this replicate marker D03S0127 showed highly significant linkage to the disease locus with a LOD score greater than 6 (p < 0.0001). Flanking markers D03S0126, D03S0125, and D03S0124 showed borderline linkage with LOD scores equal to 1.35, 1.45, and 2.42, respectively.
To obtain marker genotypes for 50 unrelated controls for the association analysis, we purchased packets 149 to 153. The first affected individual in each family (n = 100) was used as a case, regardless of whether that individual was a child or a parent. We tested each microsatellite marker for Hardy-Weinberg equilibrium in the controls. Then we applied the score test U, Pearson's χ^{2}, and Terwilliger's LR to study association with KPD. For the score test U and Pearson's χ^{2}, we used Monte Carlo permutations to estimate the empirical p-values. p-Values lower than 0.05 were considered significant.
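A Monte Carlo permutation test of this kind can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the analysis: it treats chromosomes as exchangeable units, shuffles the case/control labels, and recomputes U with pooled allele frequencies. The function names, the number of permutations, the add-one correction, and the allele counts are all assumptions made for the example.

```python
import random


def score_u(case_counts, freqs):
    """Score statistic U = sum((x - n1*p)^2 / p) - sum((x - n1*p) / p)."""
    n1 = sum(case_counts)
    return sum(((x - n1 * p) ** 2 - (x - n1 * p)) / p
               for x, p in zip(case_counts, freqs))


def permutation_pvalue(case_counts, control_counts, n_perm=2000, seed=1):
    """Empirical p-value of U from random relabelling of chromosomes.

    Assumes every allele has a non-zero count in the pooled sample.
    """
    rng = random.Random(seed)
    k = len(case_counts)
    n1 = sum(case_counts)
    pooled = [x + y for x, y in zip(case_counts, control_counts)]
    total = sum(pooled)
    freqs = [c / total for c in pooled]  # unchanged by relabelling
    observed = score_u(case_counts, freqs)

    # One list entry per chromosome, holding the index of its allele.
    alleles = [i for i, c in enumerate(pooled) for _ in range(c)]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(alleles)
        perm_cases = [0] * k
        for a in alleles[:n1]:  # the first n1 chromosomes act as cases
            perm_cases[a] += 1
        if score_u(perm_cases, freqs) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Hypothetical allele counts (illustration only, not GAW14 data).
p_emp = permutation_pvalue([60, 20, 20], [20, 40, 40], n_perm=1000)
print(f"empirical p-value: {p_emp:.4f}")
```

The same loop gives an empirical p-value for Pearson's χ^{2} if `score_u` is replaced by a function that computes that statistic.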
As an alternative to using the controls, we also used the provided allele frequencies as the reference allele frequency distribution for Pearson's χ^{2}, the score test U, and Terwilliger's LR. We also applied Terwilliger's LR with equal weights.
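To make the two LR variants concrete, the sketch below evaluates the LR statistic for cases against known reference frequencies, with either frequency weights or equal weights. It assumes the usual one-allele form of the model (the associated allele i has case frequency p_{i} + λ(1 − p_{i}), and every other allele j has case frequency (1 − λ)p_{j}) and maximises over λ by a plain grid search. The function names, the grid, and the case counts are hypothetical; only the eight reference frequencies are those reported for D03S0127.

```python
import math


def log_lr(lam, case_counts, freqs, weights):
    """log[L(lam) / L(0)] for the case sample; weights should sum to 1."""
    n1 = sum(case_counts)
    terms = []
    for x, p, w in zip(case_counts, freqs, weights):
        # Ratio of case likelihoods when this allele is the associated one:
        # (1 + lam*(1 - p)/p)^x * (1 - lam)^(n1 - x), on the log scale.
        terms.append(math.log(w)
                     + x * math.log1p(lam * (1 - p) / p)
                     + (n1 - x) * math.log1p(-lam))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def lr_statistic(case_counts, freqs, weights=None, grid=1000):
    """Return (LR, lambda_hat), maximising over a grid of lambda in [0, 1)."""
    if weights is None:
        weights = freqs  # frequency weights, as in Terwilliger's proposal
    best_lam, best = 0.0, 0.0  # log[L(0) / L(0)] = 0 at lambda = 0
    for step in range(1, grid):
        lam = step / grid
        value = log_lr(lam, case_counts, freqs, weights)
        if value > best:
            best_lam, best = lam, value
    return 2 * best, best_lam


# Reference frequencies reported for D03S0127; the case counts are made up.
freqs = [0.070, 0.206, 0.100, 0.114, 0.048, 0.111, 0.154, 0.197]
cases = [14, 40, 34, 22, 10, 20, 30, 30]
lr_freq, lam_freq = lr_statistic(cases, freqs)
lr_equal, lam_equal = lr_statistic(cases, freqs, weights=[1 / 8] * 8)
print(lr_freq, lam_freq, lr_equal, lam_equal)
```

A grid search is used only to keep the example short; any one-dimensional optimiser over λ would serve the same purpose.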
Finally, additional SNPs in the vicinity of the associated marker D03S0127 were tested for association and the linkage disequilibrium between markers was studied in this region.
Results
All markers were in Hardy-Weinberg equilibrium. Table 1 presents the p-values for the association analysis of the various markers with the disease. Marker D03S0127 appeared to be highly significantly associated with the disease. The score test U and Pearson's χ^{2} gave about the same p-value (p = 0.008 and 0.007, respectively), whereas Terwilliger's LR yielded a somewhat larger p-value (p = 0.033). For this marker, alleles 1 and 3 were present 2 and 3.3 times more often in cases than in controls, respectively, whereas allele 6 occurred approximately 2.8 times as often in controls as in cases. Marker D03S0125 showed borderline significant association with KPD. Next we repeated the analysis of association between KPD and marker D03S0127 using the provided allele frequencies of 0.070, 0.206, 0.100, 0.114, 0.048, 0.111, 0.154, and 0.197 for alleles 1 to 8, respectively. Again the score statistic U and Pearson's χ^{2} yielded similar empirical p-values (p = 0.027), while Terwilliger's LR and the LR with equal weights gave asymptotic p-values of 0.029 and 0.023, respectively. Compared with the given allele frequencies, only allele 3 showed an excess in cases, occurring about 1.7 times as often in cases as in the population.
Discussion
In this paper we reported the results of several methods for studying association between a disease and a multiallelic marker. Marker D03S0127, located on chromosome 3, showed significant association with the disease. Both the score U and Pearson's χ^{2} tests gave somewhat lower p-values than Terwilliger's LR test. Further examination showed that marker D03S0127 appeared to have two positively associated alleles. When we assumed known allele frequencies, only one allele was positively associated with the disease and all test statistics yielded similar p-values. The presence of two associated alleles might be the reason that Terwilliger's LR test yielded a somewhat larger p-value in this dataset. To study whether this holds in general, an extensive simulation study is needed.
In addition to Pearson's χ^{2} and the LR, a new test statistic was applied to the GAW14 simulated data. The new test statistic is derived from the score function under the null hypothesis, so it possesses the usual optimality properties of score test statistics: it is locally most powerful and robust against model misspecification. In contrast to the LR test statistic, the new score statistic is very easy to compute, and Monte Carlo methods can be used to derive empirical p-values. Details of the derivation of this score statistic, as well as a simulation study of its power, will be provided in another paper.
The parameter λ is a preferred measure of allelic association because it is directly related to the recombination fraction and is less sensitive to allele frequencies than other measures [2]. However, when allelic association is modelled by means of λ, it is not straightforward to adjust for other covariates. Houwing-Duistermaat and Elston [9] discussed various ways to quantify allelic association and to estimate the location of a gene responsible for disease using logistic regression models. As an alternative to λ, the log relative risk, as measured by the regression coefficient in the logistic model, may be used to allow for adjustment for other covariates. More research is needed to build this kind of flexible model.
Applying Pearson's χ^{2} with one degree of freedom to 19 SNPs revealed strong association between KPD and two diallelic markers in this region: SNP B03T3056 and SNP B03T3057. Furthermore, the LD observed between B03T3056 and B03T3057, and between B03T3056 and D03S0127, further supports the preceding results.
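For a diallelic marker, the one-degree-of-freedom Pearson test reduces to a 2 × 2 table of allele counts. The Python sketch below shows that calculation with made-up counts; the function name and the numbers are illustrative and are not the counts observed for B03T3056 or B03T3057.

```python
import math


def allelic_chi2(case_counts, control_counts):
    """Pearson chi-square (1 df) for a 2 x 2 table of allele counts.

    case_counts, control_counts: (count of allele A, count of allele B).
    Returns the statistic and its asymptotic p-value.
    """
    a, b = case_counts
    c, d = control_counts
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts on 200 case and 100 control chromosomes.
chi2, p = allelic_chi2((120, 80), (40, 60))
print(f"chi2 = {chi2:.3f}, p = {p:.5f}")
```

With 19 SNPs tested, a correction for multiple testing would normally be considered when interpreting such p-values.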
Conclusion
All test statistics showed significant association between D03S0127 and KPD. Probably because of the presence of more than one positively associated allele, Pearson's χ^{2} and the score test yielded lower p-values than Terwilliger's LR test in this dataset.
Abbreviations
GAW14: Genetic Analysis Workshop 14
KPD: Kofendrerd Personality Disorder
LR: Likelihood ratio
References
1. Terwilliger JD: A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet. 1995, 56: 777-787.
2. Devlin B, Risch N: A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics. 1995, 29: 311-322. 10.1006/geno.1995.9003.
3. Sham PC, Curtis D, MacLean CJ: Likelihood ratio tests for linkage and linkage disequilibrium: asymptotic distribution and power. Am J Hum Genet. 1996, 58: 1093-1096.
4. El Galta R, Stijnen T, Houwing-Duistermaat JJ: Score statistic for analysis of association between disease and a multiallelic marker [abstract]. Genet Epidemiol. 2004, 27: 268.
5. Sham PC, Curtis D: Monte Carlo tests for associations between disease and alleles at highly polymorphic loci. Ann Hum Genet. 1995, 59: 97-105.
6. Haldane JBS: The mean and variance of χ^{2} when used as a test of homogeneity, when expectations are small. Biometrika. 1939, 31: 346-355. 10.2307/2332614.
7. Whittemore AS, Halpern J: A class of tests of linkage using affected pedigree members. Biometrics. 1994, 50: 118-127. 10.2307/2533202.
8. Abecasis GR, Cherny SS, Cookson WO, Cardon LR: Merlin–rapid analysis of dense genetic maps using sparse gene flow trees. Nat Genet. 2002, 30: 97-101. 10.1038/ng786.
9. Houwing-Duistermaat JJ, Elston RC: Linkage disequilibrium mapping of complex genetic diseases using multiallelic markers. Genet Epidemiol. 2001, 21: 576-581.
Acknowledgements
This study is supported by a Program Grant of the Netherlands Organization for Scientific Research (NWO 91203014).
Authors' contributions
REG participated in method development, prepared data, carried out data analysis, participated in interpreting results, and drafted the manuscript. LH participated in method development and in interpreting results. JJHD participated in method development and in interpreting results, and supervised the drafting of the manuscript.
Cite this article
El Galta, R., Hsu, L. & Houwing-Duistermaat, J.J. Methods to test for association between a disease and a multiallelic marker applied to a candidate region. BMC Genet 6, S101 (2005). doi:10.1186/1471-2156-6-S1-S101
Keywords
 Score Test
 Score Statistic
 Marker Allele
 Allelic Association
 Marker D03S0127