 Methodology article
 Open Access
 Published:
Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association
BMC Genetics volume 8, Article number: 63 (2007)
Abstract
Background:
In this paper, we propose a one degree of freedom test for association between a candidate gene and a binary trait. This method is a generalization of Terwilliger's likelihood ratio statistic and is especially powerful for the situation of one associated haplotype. As an alternative to the likelihood ratio statistic, we derive a score statistic, which has a tractable expression. For haplotype analysis, we assume that phase is known.
Results:
By means of a simulation study, we compare the performance of the score statistic to Pearson's chisquare statistic and the likelihood ratio statistic proposed by Terwilliger. We illustrate the method on three candidate genes studied in the Leiden Thrombophilia Study.
Conclusion:
We conclude that the statistic follows a chi square distribution under the null hypothesis and that the score statistic is more powerful than Terwilliger's likelihood ratio statistic when the associated haplotype has frequency between 0.1 and 0.4 and has a small impact on the studied disorder. With regard to Pearson's chisquare statistic, the score statistic has more power when the associated haplotype has frequency above 0.2 and the number of variants is above five.
Background
Haplotype analysis is a popular strategy to analyze multiple single nucleotide polymorphisms (SNPs) typed within one candidate gene [1, 2]. Most of the genetic variation of a gene is described by 4 to 8 relatively common haplotypes (frequencies > 0.05). To compare the distribution of observed haplotype frequencies between cases and controls, the global test statistic of Schaid et al. [3] can be used when haplotype ambiguity exists. When no uncertainty in phase exists, for example in regions with little recombinations, the simple Pearson's chi square statistic is used to test for association. Both statistics test for any difference in haplotype frequencies between cases and controls and the degrees of freedom depend on the number of observed haplotypes. However, researchers often aim to identify one ancestral haplotype [4]. For this situation one may consider a test which summarizes the difference between the null hypothesis and the alternative hypothesis in one parameter, yielding a more powerful one degree of freedom test. This paper aims to derive such a method.
For multiallelic markers, Terwilliger [5] proposed a model with one parameter (λ) for the excess frequency of the associated allele in the cases. To test for association he proposed a likelihood ratio test, i.e. comparing the likelihood under the alternative to the likelihood under no association, for testing. Since it is unknown which allele is associated, the likelihood is a weighted sum of conditional likelihoods of the data given that the allele is associated over the observed marker alleles. As for weights Terwilliger [5] proposed to use the allele frequencies in the population from which the cases and controls were sampled. Thus the most common allele is assigned the highest weight in the analysis. However, for haplotype analysis it is usually not the most common haplotype in the population which is associated to the disease. Therefore in this paper, we propose a model which assigns constant weight to each haplotype. We derive a score statistic from the likelihood of this model.
The log likelihood function of Terwilliger's model also appears to have some unusual statistical features. For example, the allele frequencies that are used as weights are also unknown parameters in the conditional likelihood functions. Maximizing the log likelihood function appears not always straightforward. Further its distribution under the null hypothesis is unknown. Terwilliger proposed to approximate the distribution of the likelihood ratio (TLR) statistic under the null hypothesis with the 50 : 50 mixture of chi square distributions of null and one degrees of freedom. This distribution appeared to yield conservative pvalues [6]. Another property of the likelihood function is that the score function, the first derivative of the log likelihood function with respect to λ evaluated at λ = 0 is a constant zero for any observed data for the weights that Terwilliger proposed. Hence the corresponding score statistic does not exist. Score statistics are often preferred over likelihood ratio statistics, because they have often a tractable expression which provides insight in the data and are robust against model deviations [7]. When constant weights instead of allele frequencies as weights are used the score statistic exists. Under the null hypothesis, the statistic follows a normal distribution.
In this paper we derive a method for the extreme situation of phase unambiguity, which means that we study genetic regions with almost no recombination [8]. This assumption is reasonable for many genes, for example fibrinogen alpha, beta and gamma [9] and CRP [10]. When some uncertainty about phase exists, the pvalues should be adapted. Our program [11] is able to compute valid pvalues for the situation of phase ambiguity by permutation of case and control status. The program calls the publicly available software TAGSNPS [8] to derive the haplotypes in each permutation step (see also Becker et al. [12] for similar methods).
The methods derived in this paper are meant to detect a common haplotype. In general genetic association studies have power to detect common variants. Thus these methods should detect associations under the common diseasecommon variant hypothesis. However, it is also possible that a rare causal mutation is sometimes present on the identified haplotype. For example Hylckama Vlieg et al. [13] detected a common haplotype that was overrepresented in patients with deep venous thrombosis compared to healthy controls. Many homozygous carriers of this haplotype carried the factor V Leiden mutation, the most important genetic risk factor for deep venous thrombosis, which had been discovered before by other methods [14]. Thus if homozygous carriers of the associated haplotype were sequenced the factor V Leiden mutation would have been detected.
We illustrate the method by an association analysis of three candidate genes, fibrinogen alpha (FGA), beta (FGB) and gamma (FGA)in the Leiden Thrombophilia Study (LETS) [9]. In these data, no phase ambiguity was present and the pair of haplotypes could be derived for each individual. The number of common haplotypes varied from 3 to 5. The frequency of the associated haplotype varied from 0.2 to 0.3. In this paper, we consider the score statistic as an alternative to the classical chisquare and the original TLR statistic of [5] in the context of haplotypes. We also include the likelihood ratio statistic corresponding to the log likelihood using equal weights. We carried out a simulation study to examine the performance of the four statistics under the null hypothesis and to compare the power of the four statistics under various alternatives. Finally we illustrate the proposed statistics by analysis of the haplotype distributions of the FGG, FGB and FGA genes.
Results
Simulation study
By means of a simulation study we compared the performance of the new score statistic ${\widehat{S}}_{\frac{1}{m}}$ with Pearson chi squared χ^{2} and the Terwilliger's likelihood ratio with weights equal to p_{ j }'s (TLR). We also considered a likelihood ratio test with equal weights (LR). We first evaluated the type I error rates of the test statistics. For the score statistic we used the chi square distribution with one degree of freedom to approximate the distribution under the null hypothesis. For the LR and TLR statistics we used the 50:50 mixture of two chi squares with zero and one degree of freedom. We generated 10,000 samples of 200 case chromosomes and 200 control chromosomes from the multinomial distributions with probabilities p_{1}⋯p_{ m }for m equal to 4, 5, 8, 10, 15 and 20 haplotypes. Similar to the simulation described by Terwilliger, the frequency of the most common haplotype, p_{1}, was set to 0.5, whereas the remaining haplotypes were equally frequent (0.5/(m  1)). The results are shown in the left columns of Table 1.
For all m, the type I error rates of the score statistic ${\widehat{S}}_{\frac{1}{m}}$ were maintained at the nominal error rate. For m < 10, the type I error rates of Pearson's chi square corresponded to the nominal level, but became conservative for larger m due to sparse data. For all m considered, the type I error rates for the TLR statistic were conservative (< 0.03). The type I error rates for the LR statistic were also somewhat small (≈ 0.04), but were better than that of the TLR statistic.
To study the power of the statistics, we generated 10,000 samples of 100 case chromosomes and 100 control chromosomes from the multinomial distributions with probabilities p_{1}(1  λ) + λ and p_{ j }(1  λ) for j = 2⋯m for cases and p_{ j }j = 1⋯m for controls, respectively. First, we considered the model used by Terwilliger [5]. The most common haplotype frequency p_{1} in controls was again set to 0.5 and this haplotype was more frequent in cases. The parameter λ was fixed to 0.5 which corresponds to a haplotype frequency of 0.75 in the cases. The number of haplotypes m was again set to 4, 5, 8, 10, 15 and 20. The results are shown in the right columns of Table 1.
For m < 8, the power of Pearson chi square statistic was good. The ${\widehat{S}}_{\frac{1}{m}}$ performed better than the LR statistic with equal weights. The power of the score statistic ${\widehat{S}}_{\frac{1}{m}}$ was similar to the power of TLR statistic, for m ≤ 8. For m > 8, the TLR statistic had higher power than ${\widehat{S}}_{\frac{1}{m}}$. For m > 8, the power of the score statistic ${\widehat{S}}_{\frac{1}{m}}$ is smaller, because the weight 1/m is small and the frequencies of the non associated haplotypes are too small, yielding a large variance of the score statistic (see formula 3). Because the LR statistic appeared to perform worse than both ${\widehat{S}}_{\frac{1}{m}}$ and TLR, we did not consider this statistic in the following simulations.
Second, we studied the power of the Pearson's chi square, ${\widehat{S}}_{\frac{1}{m}}$, and TLR as a function of the excess frequency λ for various values of the frequency of the associated haplotype p_{1} = 0.1, 0.2, 0.3, 0.4 and 0.5. The remaining haplotypes were again equally frequent. We restricted ourselves to the number of observed haplotypes m of 5 and 8, because most of the genes can be characterized by up to 8 common variants. The parameter λ was varied between 0 and 0.5. The number of chromosomes n_{1} and n_{2} were 200. We used a nominal significance level of 0.05. The results are depicted in Figure 1.
For p_{1} = 0.5, the score statistic ${\widehat{S}}_{\frac{1}{m}}$ and likelihood ratio TLR performed similarly, both better than Pearson's chi square. For m = 5 and p_{1} = 0.4 or 0.3 and for m = 8 and p_{1} = 0.4, 0.3 or 0.2, the score statistic ${\widehat{S}}_{\frac{1}{m}}$ performed better than TLR especially for small λ. For p_{1} = 0.2 and m = 5 all three statistics had similar power. For p_{1} = 0.1 and m = 5 or m = 8, Pearson's chisquare performed similar to TLR and both statistics performed better than ${\widehat{S}}_{\frac{1}{m}}$ except for λ ≤ 0.1 and p_{1} = 0.1 and m = 5. Note that for p_{1} = 0.1, the pooled frequency of the associated haplotype is around $\frac{1}{m}$ and hence the expectation of ${\widehat{S}}_{\frac{1}{m}}$ under the alternative becomes small (see formula 4). For m = 5, the pooled frequency is equal to $\frac{1}{m}$ = 0.2 at λ = 0.2 and for m = 8 the pooled frequency is equal to $\frac{1}{m}$ = 0.125 at λ = 0.06. Thus when the associated haplotype has frequency around $\frac{1}{m}$, the score statistic loses power.
Especially for common variants with frequency p_{1} of 0.3 or 0.2 and a small impact on the disease (λ ≤ 0.1), the score statistic performed better than TLR statistic. For m = 5 and m = 8 and p_{1} = 0.3, the gain in power of the score statistic compared to TLR statistic was about 4% and 8% for λ of 0.05 and 0.1 respectively. For m = 5 and p_{1} = 0.2, both statistics performed similar. For m = 8 and p_{1} = 0.2 the gain in power of the score statistic was large, namely 7% and 12% for λ of 0.05 and 0.1 respectively.
Data example
We applied the three statistics to a study on haplotype association of fibrinogen alpha (FGA), beta (FGB) and gamma (FGG) with risk of deep venous thrombosis in LETS [15, 16]. Fifteen haplotype tagging SNPs were typed in 474 cases and 474 controls [9]. Within the three genes, the SNPs were in high linkage disequilibrium (${r}_{h}^{2}$ > 0.95). The number of common haplotypes (frequency greater than 5%) describing FGG, FGA and FGB were three, five and five respectively. Since we focus on common haplotypes, we pooled the rare haplotypes with similar haplotypes by dropping rare tagging SNPs. For FGG, we pooled H4 with H1 and for FGB, we pooled H5 with H4 and H7 with H3 (see Uitte de Willige et al. [9] for more details on the tagging SNPs). In this analysis we considered pvalues below 0.05 to be significant. In Table 2 the data are described and the results are given.
For all genes, haplotype H2 appeared to be more frequent in the cases than in the controls. For FGG, FGA and FGB the allelic odds ratios of presence of H2 versus the rest was 1.34, 1.29 and 1.28 respectively. Note that these odds ratios were rather similar while the pvalues of the corresponding chi square statistics were different namely, 0.005, 0.034 and 0.090 respectively. The difference in pvalues was caused by the difference in degrees of freedom of the chi square statistics and the frequencies of other haplotypes. From the results of the standard chisquare statistics we concluded that only FGG was significantly associated to thrombophilia.
The pvalues of the TLR were respectively 0.003, 0.012 and 0.072. These pvalues were in line with the estimates of λ, namely 0.08, 0.07 and 0.05 respectively. Since FGA and FGB both had 5 variants, the frequency of the associated haplotype H2 was 0.3 and 0.2 respectively and the λ's were rather small, the score statistic should have more power than TLR for these genes. On the other hand TLR may perform better on the FGG gene, because of the larger difference in frequency between cases and controls and because the frequencies in the pooled set are around 1/m. For this situation, the score statistic ${\widehat{S}}_{\frac{1}{m}}$ has little power.
The pvalues of the score statistic ${\widehat{S}}_{\frac{1}{m}}$ were 0.515, 0.003 and 0.049 for FGG, FGA and FGB respectively. Indeed the pvalues for FGA and FGB were smaller than the corresponding pvalues of TLR statistic and the score statistic had no power for FGA. Based on the results of the score statistic, FGA and FGB were significantly associated with thrombophilia risk.
Although for each subject we were able to derive the pair of haplotypes, we also estimated the pvalues of the statistics by 1000 permutations using TAGSNPS to estimate the haplotype counts in each permutation step. The results were similar.
Discussion
In this report we have derived a new score statistic to test for association between a candidate gene and a binary trait. Score statistics are easy to compute, locally most powerful, and robust to small model deviations under the alternative [7]. From our simulations it appears that under the null hypothesis the statistic follows a chisquare distribution. For candidate genes with a small impact on the disease (λ small), five to eight observed variants, and a frequency of the associated variant in the range of 0.2 to 0.4, the new statistic performs better than the Terwilliger's LR statistic. For larger frequencies (around 0.5) or for a large number of variants (m > 8), Terwilliger's LR statistic appears to perform better, because the weight corresponding to the associated haplotype is larger. Compared to Pearson's chisquare statistic, the score statistic has more power when the associated haplotype has frequency larger than 0.2 and the number of observed variants m is larger than 5. Especially when the frequency of the associated haplotype is small (< 0.2), Pearson chisquared statistic performs similar or better than the score statistic.
We used a mixture of two chisquare distributions to approximate the distribution of Terwilliger's LR statistic under the null hypothesis. It is known that this distribution appears to give conservative pvalues [6]. Permutations of the casecontrol status may result in better pvalues. However, permutations are not attractive when no simple formulae for the statistic exists.
When the sampled population is not in Hardy Weinberg equilibrium, the considered statistics are not valid since they assume that the haplotypes of a subject are independent. For Pearson's chisquare statistic on haplotypes, a multiallelic version of Armitage's trend test on pairs of haplotypes is available [17, 18]. Under Hardy Weinberg equilibrium these statistics are asymptotically equivalent. To derive a one degree of freedom score statistic for pairs of haplotypes, a logistic regression model can be used with the number of associated haplotypes as covariate. Then the total likelihood is the average of the likelihoods over these logistic regression models of possible associated haplotypes. Here more research is needed. In this paper we used a smoothing approach to deal with the fact that the associated haplotype is unknown. As alternative, one may consider taking the maximum over each haplotype. A disadvantage of the latter approach is that permutations are needed to derive a pvalue [19].
In this paper we assumed that the haplotypes could be derived from the typed tagging SNPs. When uncertainty in phase exists the haplotype frequencies have to be estimated using for example an EM algorithm [20] and the uncertainty has to be taken into account when the pvalues are derived. This can easily be done by using permutations. Our program [11] is able to compute valid pvalues in case of ambiguous phase by calling TAGSNPS [8] to derive the haplotype counts in each permutation step. It may be more efficient to incorporate phase unambiguity in the likelihood function. The gain in efficiency will probably be small. Analyzing genes with little information on phase yields many relatively rare haplotypes and interpretation of the results is often hard. Using permutations to deal with haplotype uncertainty is in the line with recent developments in weighted regression analysis [21] and in imputation of haplotypes [22]. The new score statistic has no power when the frequency of the associated haplotype in the pooled set of cases and controls is around $\frac{1}{m}$. This raises the question if a better summary statistic of Pearson's chi square statistic into an one degree of freedom statistic exists. Here more research is needed. As an alternative for the score function, the second derivative of the log likelihood may be used [23]. In another paper we study the performance of this second derivative of the log likelihood function of Terwilliger [24].
Conclusion
We conclude that by choosing alternative weights, in particular constant weights, in the likelihood of Terwilliger, a set of new powerful and robust statistical tests was derived. For genetic association studies aiming to identify common associated variants, the new statistic should be used in addition to the standard Pearson's chi square statistic. By using both statistics more insight in the data can be obtained. A program [11] which computes the statistics and corresponding pvalues is freely available.
Methods
Let m be the number of haplotypes describing most of the genetic variation in a gene. Assume that the haplotype frequencies are in HardyWeinberg equilibrium proportions. Let p = (p_{1},⋯, p_{ m }) be the vector of haplotype frequencies in controls. Assume that only one haplotype denoted with index i is overrepresented in the cases, then the haplotype frequencies in the cases can be modelled as q_{ i }= p_{ i }+ λ(1  p_{ i }) and q_{ j }= p_{ j } λp_{ j }for j ∈ (1,⋯,i  1, i + 1,⋯,m). Let x = (x_{1},⋯,x_{ m }) and y = (y_{1},⋯,y_{ m }) be vectors of haplotype counts in the cases and the controls, respectively, and let n_{1} and n_{2} be the total number of case chromosomes and of control chromosomes, respectively, and n = n_{1} + n_{2}. Then the conditional likelihood L_{ i }given that haplotype i carries the mutation is given by
and the likelihood proposed by Terwilliger is equal to
with L_{ j }given in formula (1).
It is easy to see that likelihood function (2) can be generalized to the following likelihood function:
with L_{ j }given in formula (1) and w = (w_{1},⋯,w_{ m }) a vector of known positive weights restricted by ${\sum}_{j=1}^{m}{w}_{j}=1$. The first derivative of the log likelihood l(λ, px, y, w) = log(L(λ, px, y, w)) to λ evaluated in λ = 0 is equal to
For known allele frequencies p_{ j }, the distribution of the U_{ w }under H_{0} can be approximated by the normal distribution with zero mean and variance $\text{VAR}[{U}_{w}]={n}_{1}({\displaystyle {\sum}_{j=1}^{m}{w}_{j}^{2}{p}_{j}^{1}1)}$. Note that U_{ w }= 0 when for all j w_{ j }= p_{ j }.
Often the haplotype frequencies are unknown and have to be estimated from the data. Under the null hypothesis we can estimate the frequencies from the combined sample of cases and controls by ${\widehat{p}}_{j}=\frac{{x}_{j}+{y}_{j}}{n}$ and an estimate of the score statistic U_{ w }is given by
under H_{0}, ${\widehat{U}}_{w}$ has approximately mean equal to zero and variance
Note that the variance $\text{VAR}({\widehat{U}}_{w})$ is increased by n_{2}/n fold compared to the variance VAR(U_{ w }) because the allele frequencies are estimated from the data. Now the score statistic ${\widehat{S}}_{w}$ is defined by
where $\text{V}\widehat{\text{A}}\text{R}({\widehat{U}}_{w})$ is obtained by replacing p_{ j }by its estimate ${\widehat{p}}_{j}$ in formula (3). When all haplotypes are common, a natural choice of weights is ${w}_{j}=\frac{1}{m}$.
Under the alternative hypothesis of the presence of one positively associated haplotype i, the expectation of ${\widehat{U}}_{\frac{1}{m}}$ is
with q_{ i }= p_{ i }+ λ(1  p_{ i }). Note that $\frac{{n}_{1}{p}_{i}+{n}_{2}{q}_{i}}{n}$ is the frequency of the associated haplotype in the combined sample. When $\frac{{n}_{1}{p}_{i}+{n}_{2}{q}_{i}}{n}=\frac{1}{m}$, the expectation of the score statistic is equal to 0. Hence the score statistic has little power. When $\frac{{n}_{1}{p}_{i}+{n}_{2}{q}_{i}}{n}$ is larger than $\frac{1}{m}$ the expectation becomes negative. Therefore we propose a chisquare distribution with one degree of freedom to approximate the distribution of this statistic under the null hypothesis.
Abbreviations
 LR:

Likelihood ratio with equal weights
 TLR:

Terwilliger's likelihood ratio
 FGA:

fibrinogen alpha
 FGB:

fibrinogen beta
 FGG:

fibrinogen gamma
 SNP:

single nucleotide polymorphism
 LETS:

the Leiden thrombophilia study
References
 1.
Clark AG: The role of haplotypes in candidate gene studies. Genet Epidemiol. 2004, 27: 32133. 10.1002/gepi.20025.
 2.
Balding DJ: A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006, 7: 78191. 10.1038/nrg1916.
 3.
Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002, 70: 42534. 10.1086/338688.
 4.
Stram DO: Tag SNP selection for association studies. Genet Epidemiol. 2004, 27: 36574. 10.1002/gepi.20028.
 5.
Terwilliger JD: A powerful likelihood method for the analysis of linkage disequilibrium between trait loci and one or more polymorphic marker loci. Am J Hum Genet. 1995, 56: 77787.
 6.
Sham PC, Curtis D, MacLean CJ: Likelihood ratio tests for linkage and linkage disequilibrium: asymptotic distribution and power. Am J Hum Genet. 1996, 58: 10935.
 7.
Cox D, Hinkley DV: Theoretical statistics. 1974, Florida, USA: Chapman & Hall
 8.
Stram DO, Haiman CA, Hirschhorn JN, Altshuler D, Kolonel LN, Henderson BE, Pike MC: Choosing haplotypetagging SNPS based on unphased genotype data using a preliminary sample of unrelated subjects with an example from the Multiethnic Cohort Study. Hum Hered. 2003, 55: 2736. 10.1159/000071807.
 9.
Uitte de Willige S, de Visser MC, HouwingDuistermaat JJ, Rosendaal FR, Vos HL, Bertina RM: Genetic variation in the fibrinogen gamma gene increases the risk of deep venous thrombosis by reducing plasma fibrinogen gamma' levels. Blood. 2005, 106: 417683. 10.1182/blood2005052180.
 10.
Carlson CS, Aldred SF, Lee PK, Tracy RP, Schwartz SM, Rieder M, Liu KA, Williams OD, Iribarren C, Lewis EC, Fornage M, Boerwinkle E, Gross M, Jaquish C, Nickerson DA, Myers RM, Siscovick DS, Reiner AP: Polymorphisms within the Creactive protein (CRP) promoter region are associated with plasma CRP levels. Am J Hum Genet. 2005, 77: 6477. 10.1086/431366.
 11.
Helmer Q, El Galta R, HouwingDuistermaat JJ: One degree of freedom tests for genetic association. [Http://www.msbi.nl/genetics]
 12.
Becker T, Cichon S, Jonson E, Knapp M: Multiple testing in the context of haplotype analysis revisited: application to casecontrol data. Ann Hum Genet. 2005, 69: 74756. 10.1111/j.15298817.2005.00198.x.
 13.
Van Hylckama Vlieg A, Sandkuijl LA, Rosendaal FR, Bertina RM, Vos HL: Candidate gene approach in association studies: would the factor V Leiden mutation have been found by this approach?. Eur J Hum Genet. 2004, 12: 47882. 10.1038/sj.ejhg.5201183.
 14.
Bertina RM, Koeleman BP, Koster T, Rosendaal FR, Dirven RJ, de Ronde H, van der Velden PA, Reitsma PH: Mutation in blood coagulation factor V associated with resistance to activated protein C. Nature. 1994, 369: 6467. 10.1038/369064a0.
 15.
Koster T, Rosendaal FR, Deronde H, Briet E, Vandenbroucke JP, Bertina RM: Venous thrombosis due to poor anticoagulant response to activated proteincleiden thrombophilia study. Lancet. 1993, 342: 15036. 10.1016/S01406736(05)800819.
 16.
van der Meer FJM, Koster T, Vandenbroucke JP, Briet E, Rosendaal FR: The Leiden thrombophilia study (LETS). Thromb Haemost. 1997, 78: 6315.
 17.
Slager SL, Schaid DJ: Evaluation of candidate genes in casecontrol studies: A statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 145762. 10.1086/320608.
 18.
Czika W, Weir BS: Properties of the multiallelic trend test. Biometrics. 2004, 60: 6974. 10.1111/j.0006341X.2004.00166.x.
 19.
Sham PC, Curtis D: Monte Carlo tests for associations between disaese and alleles at highly polymorphic loci. Ann Hum Genet. 1995, 59: 97105. 10.1111/j.14691809.1995.tb01608.x.
 20.
Excoffier L, Slatkin M: Maximumlikelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995, 12: 9217.
 21.
Cordell HJ: Estimation and testing of genotype and haplotype effects in casecontrol studies: comparison of weighted regression and multiple imputation procedures. Genet Epidemiol. 2006, 30: 25975. 10.1002/gepi.20142.
 22.
Dai JY, Ruczinski I, LeBlanc M, Kooperberg C: Imputation methods to improve inference in SNP association studies. Genet Epidemiol. 2006, 30: 690702. 10.1002/gepi.20180.
 23.
El Galta R, Hsu L, HouwingDuistermaat JJ: Methods to test for association between a disease and a multialleleic marker applied to a candidate region. BMC Genet. 2005, 6S: 10110.1186/147121566S1S101.
 24.
El Galta R: Statistical methods for analysing complex genetic traits. PhD thesis. 2006, Leiden University Medical Centre
Acknowledgements
This study was supported by Program Grants from the Netherlands Organization for Scientific Research (NWO 91203014 (REG, QH) and NWO 91202036 (SUdeW, MCHdeV)).
Author information
Affiliations
Corresponding author
Additional information
Authors' contributions
REG participated in method development, wrote a C code, carried out the simulation study and data analysis, participated in interpreting results, and in drafting of the manuscript. SUdeW and MCHdeV provided the data and participated in interpreting the results. QH designed the web interface and extended the C code to account for phase uncertainty. LH participated in method development and in interpreting results. JJHD participated in method development and in interpreting results, and participate in drafting of the manuscript. All authors read and approved the final version.
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Cite this article
el Galta, R., Uitte de Willige, S., de Visser, M.C. et al. Generalizing Terwilliger's likelihood approach: a new score statistic to test for genetic association. BMC Genet 8, 63 (2007). https://doi.org/10.1186/14712156863
Received:
Accepted:
Published:
Keywords
 Haplotype Frequency
 Score Statistic
 Thrombophilia
 Common Haplotype
 Likelihood Ratio Statistic