Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms
BMC Genetics volume 6, Article number: S106 (2005)
Abstract
The aim of the present analysis is to combine evidence for association from the two most commonly used designs in genetic association analysis, the case-control design and the transmission disequilibrium test (TDT) design. The cases here are affected offspring from nuclear families and are used in both the case-control and TDT designs. As a result, inference from these designs is not independent. We applied a simple logistic regression method for combining evidence for association from case-control and TDT designs to single-nucleotide polymorphism data purchased for a region on chromosome 3 in replicate 1 of the Aipotu population. Combining the evidence from the case-control and TDT designs yielded a 5–10% reduction in the standard errors of the relative risk estimates. The authors did not know the results before the analyses were conducted.
Background
Broadly speaking, two types of design dominate studies of allelic association between single-nucleotide polymorphisms (SNPs) and a disease. The first is the classical case-control study, where the frequency of a certain allele is compared between cases and controls. The other is the transmission disequilibrium test (TDT) [1]. The TDT is a family-based method for linkage and association that, unlike the case-control study, is not sensitive to possible population stratification. The TDT and case-control studies have essentially the same objective, namely either to identify polymorphisms (alleles) that are causally related to a phenotypic trait, or to identify polymorphisms in high linkage disequilibrium with such a causal allele. The two designs differ only in methodology: the TDT looks for such alleles through associations within families, whereas case-control studies do so by identifying associations within populations. For the TDT, triads consisting of parents and an affected child are needed, which may be hard to obtain. In such a situation, combining evidence for association from TDT and case-control designs may be helpful.
Such a mixture of TDT and case-control designs can occur in a number of ways. To name just a few possibilities: 1) a TDT study was originally designed, and controls were subsequently added to increase power, or linkage was found in nuclear families, and these data were combined with controls for a case-control analysis; 2) a case-control study was originally designed, and a TDT study was then set up to confirm findings, or parents of cases were later genotyped in a haplotype study in order to gain phase information [2].
Results from the separate designs are not independent, because the same cases are used in the case-control and TDT designs.
In the Genetic Analysis Workshop 14, data are available on nuclear families and a modest number of controls. Ideally, one would like to combine these sources of data as efficiently as possible. Nagelkerke et al. [3] show how this can be done using simple logistic regression.
Methods
Statistical analysis
Consider a SNP with alleles 1 and 2. Suppose that allele 2 is the high-risk allele, and that allele 1 is the reference allele. We assume an additive model, where the relative risks of disease of a 1/2 heterozygote and a 2/2 homozygote with respect to a 1/1 homozygote equal γ and γ^{2}, respectively. The parameter γ is our parameter of interest; in what follows we refer to γ as the effect parameter and to estimates of γ as effect estimates. Let p be the population frequency of the high-risk allele (allele 2). We first consider one affected individual per nuclear family and show later how to adapt the analysis when a family has multiple affected offspring. The likelihood of p and γ is given by
$$
\prod P(\text{genotypes of triplets} \mid \text{offspring affected};\ p, \gamma) \times \prod P(\text{genotypes of controls} \mid p),
$$
the first term corresponding to the TDT design, the second corresponding to the controls.
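As a rough illustration of the two ingredients of this likelihood, the following sketch computes genotype probabilities for a control and for an affected individual. It assumes Hardy-Weinberg proportions in the population (an assumption not stated in the text) together with the relative risks 1, γ, and γ^{2} defined above. The function names are hypothetical, and the full triad term of the likelihood is not reproduced here.

```python
def control_genotype_probs(p):
    """P(genotype | p) for genotypes 1/1, 1/2, 2/2, assuming Hardy-Weinberg proportions."""
    q = 1.0 - p
    return [q * q, 2.0 * p * q, p * p]


def case_genotype_probs(p, gamma):
    """P(genotype | affected; p, gamma): population frequencies reweighted by relative risk."""
    relative_risks = [1.0, gamma, gamma ** 2]
    weighted = [f * r for f, r in zip(control_genotype_probs(p), relative_risks)]
    total = sum(weighted)  # algebraically equal to ((1 - p) + p * gamma) ** 2
    return [w / total for w in weighted]


# Illustrative values only: allele frequency 0.3 and effect parameter 2.
controls = control_genotype_probs(0.3)
cases = case_genotype_probs(0.3, 2.0)
```

With γ = 1 the two functions return the same probabilities, which is the situation of no association.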
For a TDT family, let $G_o$ and $G_p$ denote the genotypes of offspring and parents, respectively, and let "case" denote the event that the offspring is affected. The likelihood contribution of a TDT family is given by
$$
P(G_p, G_o \mid \text{``case''}) = P(G_o \mid G_p, \text{``case''};\ p, \gamma) \cdot P(G_p \mid \text{``case''};\ p, \gamma).
$$
The first factor deals with transmission of alleles from parents to offspring, i.e., the TDT in its likelihood formulation [4]. The second factor essentially regains the information that was lost by using the TDT instead of the maximum likelihood estimator [3]. The complete likelihood can thus be factorized alternatively as
$$
\begin{aligned}
&\prod P(G_o \mid G_p, \text{offspring affected};\ p, \gamma) \\
&\quad \times \prod P(G_p \mid \text{offspring affected};\ p, \gamma) \cdot P(G_c \mid p),
\end{aligned}
\tag{1}
$$
where $G_c$ denotes the genotypes of controls. Nagelkerke et al. [3] then show that a single logistic regression with outcome y and two covariates x and z, given by
$$
\operatorname{logit}(\mathrm{pr}(y = 1)) = \alpha + \beta z + \gamma x \tag{2}
$$
can be carried out in order to obtain a single approximate estimate of γ from these two data sources. The covariate z indicates whether an observation comes from the top line (z = 0) or from the bottom line (z = 1) of the alternative likelihood factorization (Equation 1). In the transmission part (top line), the outcome y equals 1 if, for a heterozygous parent, allele 2 is transmitted to the affected offspring, and 0 if allele 1 is transmitted. If both parents are heterozygous, one transmission is added to the dataset for each heterozygous parent. In the second part (bottom line), the outcome y distinguishes parents of a case (y = 1) from controls (y = 0). The covariate x takes values 0, 0.5, and 1 for genotypes 1/1, 1/2, and 2/2, respectively (Table 1). The estimated coefficient of x in Equation 2 gives the effect estimate, i.e., the estimate of γ, the relative risk of disease for genotype 1/2 relative to genotype 1/1. For motivation and details we refer to [3]. Note that the case-control study and the TDT can also be analyzed separately within this framework by selecting only z = 0 or z = 1 and omitting the covariate z (for the TDT, the constant α also has to be removed because it is not identifiable).
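For the transmission part taken on its own, the estimate has a closed form that may help to fix ideas. Under the multiplicative relative risks 1, γ, γ^{2}, a heterozygous parent transmits allele 2 to an affected offspring with probability γ/(1 + γ), so the log-odds of y = 1 equal log γ. The sketch below is a minimal illustration under the assumption of independent transmissions (one affected offspring per family); the function name and the counts are hypothetical and are not taken from the Aipotu data.

```python
import math


def tdt_estimate(n_allele2, n_allele1):
    """Effect estimate from transmissions of heterozygous parents to affected offspring.

    n_allele2: number of transmissions with y = 1 (allele 2 transmitted)
    n_allele1: number of transmissions with y = 0 (allele 1 transmitted)
    """
    gamma_hat = n_allele2 / n_allele1  # odds of transmitting allele 2
    se_log_gamma = math.sqrt(1.0 / n_allele2 + 1.0 / n_allele1)
    # Classical TDT (McNemar-type) chi-square statistic with 1 degree of freedom.
    chi_square = (n_allele2 - n_allele1) ** 2 / (n_allele2 + n_allele1)
    return gamma_hat, se_log_gamma, chi_square


# Hypothetical counts, for illustration only.
gamma_hat, se_log_gamma, chi_square = tdt_estimate(60, 40)
```

The standard error is reported on the log scale, which is the scale on which a logistic regression coefficient is estimated.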
For two affected offspring in a nuclear family, transmissions from the same heterozygous parent to their offspring are no longer independent, conditional on both offspring being affected. To deal with the dependencies caused by multiple affected offspring, we used the GEE (generalized estimating equations) [5] extension of logistic regression, both for the combined and for the separate case-control and TDT analyses.
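To illustrate why such a correction matters, the sketch below computes a naive and a cluster-robust (sandwich) standard error for the simplest possible case: an intercept-only logistic model for a binary outcome, with families as clusters. This corresponds to GEE with an independence working correlation, which is an assumption made here; the text does not state which working correlation was used. The data and the function name are hypothetical.

```python
import math
from collections import defaultdict


def robust_logit_se(outcomes, clusters):
    """Naive and cluster-robust SE of logit(mean) for binary outcomes.

    outcomes: list of 0/1 values; clusters: list of cluster labels of the same length.
    """
    n = len(outcomes)
    mean = sum(outcomes) / n
    bread = n * mean * (1.0 - mean)  # Fisher information for the logit parameter
    residual_sums = defaultdict(float)
    for y, c in zip(outcomes, clusters):
        residual_sums[c] += y - mean  # score contributions summed within each cluster
    meat = sum(r * r for r in residual_sums.values())
    naive_se = math.sqrt(1.0 / bread)
    robust_se = math.sqrt(meat) / bread
    return naive_se, robust_se


# Hypothetical data: each family contributes two perfectly concordant outcomes.
y = [1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1]
family = [1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6]
naive_se, robust_se = robust_logit_se(y, family)
```

In this extreme example the robust standard error exceeds the naive one, because the second outcome in each family adds no independent information. When every observation forms its own cluster, the two standard errors coincide.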
Data used
A preliminary linkage study using microsatellites showed evidence for linkage in a region on chromosome 3, ranging from D03S0123 to D03S0127, in replicate 1 of the Aipotu nuclear family data. Based on these findings, we purchased packages 148 through 153. All SNPs in these packages were used, again for replicate 1 of the Aipotu population. We report only on the last six SNPs from package 153, because these gave the clearest evidence for association based on the separate analyses (case-control and TDT).
As outcome we used the Kofendrerd Personality Disorder (KPD). The 100 nuclear families contained 2 (78%), 3 (16%), 4 (3%), 5 (2%), or 7 (1%) affected offspring, for a total of 233 cases. All fifty independent controls from the same data subset (replicate 1 of the Aipotu population) were also used.
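The total number of cases follows directly from the family distribution above, as this short check shows (with 100 families, each percentage equals a number of families):

```python
# Number of families (out of 100) by number of affected offspring, from the percentages above.
families_by_affected = {2: 78, 3: 16, 4: 3, 5: 2, 7: 1}

n_families = sum(families_by_affected.values())
n_cases = sum(k * n for k, n in families_by_affected.items())

print(n_families, n_cases)  # prints "100 233"
```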
R [6] and its geepack library were used for the GEE logistic regression analysis.
Results
Table 2 shows the results from the case-control study, using all affected offspring from the nuclear families as cases. The standard errors are rather large because of the modest number of controls available in the case-control study.
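The effect of a small control group on precision can be illustrated with the familiar standard error of a log odds ratio from a 2x2 table of allele counts. This is a simplified stand-in for the GEE analysis and not the analysis reported in Table 2: it treats all alleles as independent, ignoring the relatedness of cases within families, and the allele counts are hypothetical. Only the group sizes (233 cases, 50 controls) come from the text.

```python
import math


def allelic_log_or_se(case_a2, case_a1, control_a2, control_a1):
    """Woolf standard error of the log allelic odds ratio from a 2x2 table of allele counts."""
    return math.sqrt(1.0 / case_a2 + 1.0 / case_a1 + 1.0 / control_a2 + 1.0 / control_a1)


# Hypothetical allele counts: 233 cases (466 alleles) and 50 controls (100 alleles).
se_50_controls = allelic_log_or_se(186, 280, 30, 70)
# Same hypothetical allele frequencies with ten times as many controls.
se_500_controls = allelic_log_or_se(186, 280, 300, 700)
```

With 50 controls, the two control terms under the square root dominate the sum, so the standard error is driven mainly by the size of the control group.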
Table 3 shows estimates (SE) from the TDT only (using logistic regression and GEE) (i.e., using the top two lines of Table 1 only), as well as from the combined analysis. Clearly, the standard errors of the estimates are reduced, on average, by about 5 to 10%. The gain in precision is reasonable, given the small number of controls used here. The other SNPs showed similar patterns (modest gains in the precision of the effect estimates in the combined analysis, compared to TDT only; results not shown).
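One way to read a reduction in standard error is to ask how much additional data would have produced the same gain. The sketch below uses the rule of thumb that a standard error is proportional to 1/sqrt(n); this is an assumption for illustration, not a result from the analysis, and the function name is hypothetical.

```python
def equivalent_sample_gain(se_reduction):
    """Relative increase in sample size that would give the same SE reduction,
    assuming the standard error is proportional to 1 / sqrt(n)."""
    return 1.0 / (1.0 - se_reduction) ** 2 - 1.0


gain_low = equivalent_sample_gain(0.05)   # about 0.11
gain_high = equivalent_sample_gain(0.10)  # about 0.23
```

Under this assumption, a 5 to 10% reduction in standard error is comparable to roughly 11 to 23% more data in a TDT-only design.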
Discussion
The assumptions underlying our approach are essentially those that underlie either of the two constituent elements of the analysis, namely the TDT and the case-control study. In general, the assumptions that underlie the case-control data, such as comparability of cases and controls and absence of population stratification, are far more stringent than those underlying the TDT. One would therefore need to verify the assumptions underlying the case-control part of the study before the two parts can be combined. Work on testing these assumptions, notably the absence of population stratification, has been published [7]. A recent paper by Epstein et al. [8] discusses a formal test of the poolability of the two designs.
It is likely that such hybrid forms of case-control and TDT designs will become more frequent in the future. The method by Nagelkerke et al. [3] is straightforward to implement, and led, in general, to increased precision of the estimate of relative risk, compared to either design separately. Standard errors of the estimates were reduced by about 5 to 10%, compared to a TDT-only design. With a larger number of controls, the increase in precision is likely to be larger.
Arguably the most important advantage of the present approach is that it can be implemented in any statistical package. Moreover, embedding the analysis in a generalized linear modelling framework has the benefit of diagnostic tools and the possibility of incorporating covariates into the analysis.
Our objective in this paper was very modest: to illustrate a novel method for combining evidence for association from case-control and TDT designs in a single simple analysis. The results are certainly promising in this particular dataset (a single replicate from a single population from the simulated Genetic Analysis Workshop 14 data). We did not determine whether the proposed method is useful or cost effective in any particular situation; extensive simulation studies will be necessary to do that. Epstein et al. [8] report such a power comparison, in which the combined analysis of case-control and TDT data gained power compared to either of the separate analyses.
Conclusion
Both the case-control and the TDT analyses already showed association of SNPs B03T3056 and B03T3057 with KPD. The TDT design yielded considerably smaller standard errors than the case-control design. Combining the evidence from the case-control and TDT studies yielded a further 5–10% reduction in the standard errors of the effect estimates, compared to the TDT-only design.
Abbreviations
GEE: Generalized estimating equations
KPD: Kofendrerd Personality Disorder
SNP: Single-nucleotide polymorphism
TDT: Transmission disequilibrium test
References
1. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52: 506-516.
2. Uh HW, Houwing-Duistermaat JJ, Putter H, van Houwelingen JC: How to quantify information loss due to phase ambiguity in haplotype case-control studies. BMC Genet. 2005, 6 (Suppl 1): S108. 10.1186/1471-2156-6-S1-S108.
3. Nagelkerke NJD, Kinman TG, Hoebee B, Teunis P: Combining the transmission disequilibrium test and case-control methodology using generalized logistic regression. Eur J Hum Genet. 2004, 12: 964-970. 10.1038/sj.ejhg.5201255.
4. Abel L, Muller-Myhsok B: Maximum-likelihood expression of the transmission/disequilibrium test and power considerations. Am J Hum Genet. 1998, 63: 664-667. 10.1086/301975.
5. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.2307/2336267.
6. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, ISBN 3900051003.
7. Pritchard JK, Rosenberg NA: Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet. 1999, 65: 220-228. 10.1086/302449.
8. Epstein MP, Veal CD, Trembath RC, Barker JN, Li C, Satten GA: Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet. 2005, 76: 592-608. 10.1086/429225.
Additional information
Authors' contributions
HP performed the analyses and wrote the manuscript. All authors participated in the development of the methods and in the interpretation of the results of the analysis. All authors read and approved the final manuscript.
Cite this article
Putter, H., Houwing-Duistermaat, J.J. & Nagelkerke, N.J. Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms. BMC Genet 6, S106 (2005). doi:10.1186/1471-2156-6-S1-S106
Keywords
 Nuclear Family
 Population Stratification
 Transmission Disequilibrium Test
 Genetic Analysis Workshop
 Simple Logistic Regression