Volume 6 Supplement 1
Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Combining evidence for association from transmission disequilibrium and case-control studies using single-nucleotide polymorphisms
Hein Putter^{1},
Jeanine J Houwing-Duistermaat^{1} and
Nico JD Nagelkerke^{2}
DOI: 10.1186/1471-2156-6-S1-S106
© Putter et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Abstract
The aim of the present analysis is to combine evidence for association from the two most commonly used designs in genetic association analysis, the case-control design and the transmission disequilibrium test (TDT) design. The cases here are affected offspring from nuclear families and are used in both the case-control and TDT designs. As a result, inference from these designs is not independent. We applied a simple logistic regression method for combining evidence for association from case-control and TDT designs to single-nucleotide polymorphism data purchased on a region on chromosome 3, replicate 1 of the Aipotu population. Combining the evidence from the case-control and TDT designs yielded a 5–10% reduction in the standard errors of the relative risk estimates. The authors did not know the results before the analyses were conducted.
Background
To establish allelic association between single-nucleotide polymorphisms (SNPs) and a disease, broadly speaking, two types of designs dominate. The first is the classical case-control study, where the frequency of a certain allele is compared between cases and controls. The other is the transmission disequilibrium test (TDT) [1]. The TDT is a family-based method for linkage and association that is, unlike the case-control study, not sensitive to possible population stratification. The TDT and the case-control studies have essentially the same objective, namely either to identify polymorphisms (alleles) that are causally related to a phenotypic trait, or to identify polymorphisms in high linkage disequilibrium with such a causal allele. The two designs differ only in methodology: the TDT looks for such alleles through associations within families, whereas case-control studies do so by identifying associations within populations. For the TDT, triads consisting of parents and an affected child are needed, which may be hard to obtain. In such a situation, combining evidence for association from TDT and case-control designs may be helpful.
Such a mixture of TDT and case-control designs can occur in a number of ways. To name just a few possibilities: 1) a TDT study was originally designed, and controls were subsequently added to increase power, or linkage was found in nuclear families, and these data were combined with controls for a case-control analysis; 2) a case-control study was originally designed, and a TDT study was then set up to confirm findings, or parents of cases were later genotyped in a haplotype study in order to gain phase information [2].
Results from the separate designs are not independent, because the same cases are used in the case-control and TDT designs.
In the Genetic Analysis Workshop 14, data are available on nuclear families and a modest number of controls. Ideally, one would like to combine these sources of data as efficiently as possible. In a paper by Nagelkerke et al. [3], it is shown how this can be done using simple logistic regression.
Methods
Statistical analysis
Consider a SNP with alleles 1 and 2. Suppose that allele 2 is the high-risk allele and that allele 1 is the reference allele. We assume a multiplicative model (additive on the log scale), in which the relative risks of disease for a 1/2 heterozygote and a 2/2 homozygote, relative to a 1/1 homozygote, equal γ and γ^{2}, respectively. The parameter γ is our parameter of interest; in what follows we refer to γ as the effect parameter and to estimates of γ as effect estimates. Let p be the population frequency of the high-risk allele (allele 2). We first consider one affected individual per nuclear family and show later how to adapt the analysis to multiple affected subjects. The likelihood of p and γ is given by
$$\prod P(\text{genotypes of triplets} \mid \text{offspring affected};\ p, \gamma) \times \prod P(\text{genotypes of controls} \mid p),$$
the first term corresponding to the TDT design, the second corresponding to the controls.
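As a rough illustration of the second factor, the sketch below evaluates the control log-likelihood as a function of p. It assumes Hardy-Weinberg proportions for the control genotypes, which the text does not state explicitly; the function names and the genotype counts are hypothetical and are not taken from the workshop data.

```python
import math

def control_loglik(p, counts):
    """Log-likelihood of control genotype counts (n11, n12, n22) given the
    frequency p of allele 2. Assumes Hardy-Weinberg proportions and 0 < p < 1."""
    n11, n12, n22 = counts
    return (n11 * math.log((1 - p) ** 2)
            + n12 * math.log(2 * p * (1 - p))
            + n22 * math.log(p ** 2))

def allele_freq_mle(counts):
    """Maximum likelihood estimate of p under Hardy-Weinberg proportions:
    the share of allele 2 among all 2n control alleles."""
    n11, n12, n22 = counts
    return (n12 + 2 * n22) / (2 * (n11 + n12 + n22))

# Hypothetical counts for 50 controls: 30 with 1/1, 16 with 1/2, 4 with 2/2.
example_counts = (30, 16, 4)
p_hat = allele_freq_mle(example_counts)
print(p_hat, control_loglik(p_hat, example_counts))
```

In the combined analysis p is not estimated from the controls alone, because the parental genotypes of the cases also carry information on p.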
For a TDT family, let G_{o} and G_{p} denote the genotypes of offspring and parents, respectively, and let "case" denote the event that the offspring is affected. The likelihood contribution of a TDT family is given by
$$P(G_p, G_o \mid \text{case}) = P(G_o \mid G_p, \text{case};\ p, \gamma) \cdot P(G_p \mid \text{case};\ p, \gamma).$$
The first factor deals with transmission of alleles from parents to offspring, i.e., the TDT in its likelihood formulation [4]. The second factor essentially regains the information that was lost by using the TDT instead of the maximum likelihood estimator [3]. The complete likelihood can thus be factorized alternatively as
$$\prod P(G_o \mid G_p, \text{offspring affected};\ p, \gamma) \times \prod P(G_p \mid \text{offspring affected};\ p, \gamma) \cdot P(G_c \mid p), \qquad (1)$$
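where G_{c} denotes the genotypes of controls. Nagelkerke et al. [3] then show that γ can be estimated from a single logistic regression with outcome y and two covariates x and z, given by
$$\operatorname{logit}(\Pr(y = 1)) = \alpha + \beta z + \gamma x, \qquad (2)$$
with the data prepared as summarized in the following table.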
Summary of data preparation for the logistic regression of equation (2) with outcome y and covariates x and z

| y | x | z | Comments |
|---|---|---|---|
| 1 | 1 | 0 | TDT, heterozygous parent, allele 2 transmitted |
| 0 | 1 | 0 | TDT, heterozygous parent, allele 1 transmitted |
| 1 | i/2 | 1 | Parent of case, i copies of allele 2 |
| 0 | i/2 | 1 | Control, i copies of allele 2 |
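The row layout in the table can be sketched as a small data-preparation step. This is one reading of the table, not the authors' code: it assumes that every parent of a case contributes a "parent of case" row and that each heterozygous parent additionally contributes one transmission row. The function names and the input encoding (number of copies of allele 2 per person) are hypothetical.

```python
from typing import List, Tuple

Row = Tuple[int, float, int]  # (y, x, z) as in the data-preparation table

def triad_rows(parent_copies: List[int], transmitted: List[int]) -> List[Row]:
    """Rows for one case-parent triad.

    parent_copies: copies of allele 2 (0, 1 or 2) carried by each parent.
    transmitted:   allele (1 or 2) that each parent passed to the affected child.
    """
    rows: List[Row] = []
    for copies, allele in zip(parent_copies, transmitted):
        if copies == 1:
            # Heterozygous parent: informative transmission (TDT rows, z = 0).
            rows.append((1 if allele == 2 else 0, 1.0, 0))
        # Every parent of a case: y = 1, x = i/2, z = 1.
        rows.append((1, copies / 2, 1))
    return rows

def control_rows(control_copies: List[int]) -> List[Row]:
    """Rows for unrelated controls: y = 0, x = i/2, z = 1."""
    return [(0, copies / 2, 1) for copies in control_copies]

# Hypothetical triad: a 1/2 parent who transmits allele 2 and a 2/2 parent.
print(triad_rows([1, 2], [2, 2]))
print(control_rows([0, 1]))
```

The resulting rows can be passed to any logistic regression routine with y as the outcome and x and z as covariates.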
For two affected offspring in a nuclear family, transmissions from the same heterozygous parent to their offspring are no longer independent, conditional on both offspring being affected. To deal with the dependencies caused by multiple affected offspring, we used the GEE (generalized estimating equations) [5] extension of logistic regression, both for the combined and for the separate case-control and TDT analyses.
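To illustrate the idea behind the GEE correction, the sketch below treats the simplest case, a TDT-only analysis, as an intercept-only logistic model with an independence working correlation and a sandwich standard error clustered on family. It is not the geepack analysis used in this paper. It relies on the standard result that, under the multiplicative model, the log-odds that a heterozygous parent transmits allele 2 to an affected child equals log γ. The function name and the toy data are hypothetical.

```python
import math
from collections import defaultdict

def tdt_gee_intercept(transmissions):
    """transmissions: list of (family_id, y), one entry per transmission from a
    heterozygous parent to an affected child; y = 1 if allele 2 was transmitted.
    Returns (log_gamma, naive_se, robust_se). Requires 0 < mean(y) < 1."""
    n = len(transmissions)
    p_hat = sum(y for _, y in transmissions) / n
    log_gamma = math.log(p_hat / (1 - p_hat))   # solves sum(y - p) = 0
    bread = n * p_hat * (1 - p_hat)             # model-based information
    resid_by_family = defaultdict(float)
    for family, y in transmissions:
        resid_by_family[family] += y - p_hat    # score summed within family
    meat = sum(r * r for r in resid_by_family.values())
    naive_se = math.sqrt(1 / bread)             # ignores within-family dependence
    robust_se = math.sqrt(meat) / bread         # sandwich: bread^-1 meat bread^-1
    return log_gamma, naive_se, robust_se

# Hypothetical data: four families with two informative transmissions each.
toy = [("A", 1), ("A", 1), ("B", 1), ("B", 0),
       ("C", 1), ("C", 1), ("D", 0), ("D", 0)]
print(tdt_gee_intercept(toy))
```

When every family contributes a single transmission, the robust and naive standard errors coincide; they diverge only when transmissions within a family are correlated.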
Data used
A preliminary linkage study using microsatellites showed evidence for linkage on chromosome 3, in replicate 1 of the Aipotu nuclear family data, in a region ranging from D03S0123 to D03S0127. Based on these findings, we purchased packages 148 through 153. All SNPs in these packages were used, again for replicate 1 of the Aipotu population. We report only on the last six SNPs from package 153, because these gave the clearest evidence for association based on the separate analyses (case-control and TDT).
As outcome we used the Kofendrerd Personality Disorder (KPD). The 100 nuclear families contained 2 (78%), 3 (16%), 4 (3%), 5 (2%), or 7 (1%) affected offspring, for a total of 233 cases. All fifty independent controls from the same data subset (replicate 1 of the Aipotu population) were also used.
The R package [6] and the geepack library were used for the GEE logistic regression analysis.
Results
Results from the case-control analysis

| SNP | γ | SE | z | P |
|---|---|---|---|---|
| B03T3055 | 0.345 | 0.449 | 0.768 | 0.44 |
| B03T3056 | 2.900 | 0.573 | 5.061 | 4.20 × 10^{-7} |
| B03T3057 | 1.994 | 0.590 | 3.380 | 7.30 × 10^{-4} |
| B03T3058 | 0.233 | 0.479 | 0.486 | 0.63 |
| C03R0281 | 0.146 | 0.446 | 0.327 | 0.74 |
| B03T3060 | 0.699 | 0.688 | 1.016 | 0.31 |
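The z and P columns appear to be Wald statistics, that is, the estimate divided by its standard error with a two-sided normal P value. This is an inference from the reported numbers rather than something stated in the text, and the function name below is hypothetical. A minimal sketch for recomputing them from the γ and SE columns:

```python
import math

def wald_test(estimate, se):
    """Wald z statistic and two-sided P value under a standard normal reference."""
    z = estimate / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

# Estimates and standard errors copied from the case-control table.
for snp, est, se in [("B03T3056", 2.900, 0.573), ("B03T3057", 1.994, 0.590)]:
    z, p_value = wald_test(est, se)
    print(f"{snp}: z = {z:.3f}, P = {p_value:.2g}")
```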
Results from TDT only and from the combined analysis

| SNP | TDT γ | TDT SE | TDT z | TDT P | Combined γ | Combined SE | Combined z | Combined P |
|---|---|---|---|---|---|---|---|---|
| B03T3055 | 0.315 | 0.169 | 1.858 | 0.063 | 0.209 | 0.153 | 1.368 | 0.17 |
| B03T3056 | 1.114 | 0.192 | 5.797 | 6.70 × 10^{-9} | 1.245 | 0.189 | 6.597 | 4.20 × 10^{-11} |
| B03T3057 | 0.535 | 0.163 | 3.288 | 0.001 | 0.62 | 0.156 | 3.962 | 7.40 × 10^{-5} |
| B03T3058 | 0.571 | 0.193 | 2.952 | 0.0032 | 0.467 | 0.162 | 2.89 | 0.0039 |
| C03R0281 | 0.355 | 0.174 | 2.047 | 0.041 | 0.284 | 0.157 | 1.806 | 0.071 |
| B03T3060 | 0.199 | 0.206 | 0.967 | 0.33 | 0.229 | 0.192 | 1.194 | 0.23 |
Discussion
The assumptions underlying our approach are essentially those that underlie either of the two constituent elements of the analysis, namely the TDT and the case-control study. In general the assumptions that underlie the case-control data, such as comparability of cases and controls and absence of population stratification, are far more stringent than those underlying the TDT. One would therefore need to verify the assumptions underlying the case-control part of the study before the two parts can be combined. Work on testing these assumptions, notably the absence of population stratification, has been published [7]. A recent paper by Epstein et al. [8] discusses a formal test of the poolability of the two designs.
It is likely that such hybrid forms of case-control and TDT designs will become more frequent in the future. The method by Nagelkerke et al. [3] is straightforward to implement, and led, in general, to increased precision of the estimate of relative risk, compared to either design separately. Standard errors of the estimates were reduced by about 5 to 10%, compared to a TDT-only design. With a larger number of controls, the increase in precision is likely to be larger.
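The size of the gain in precision can be checked directly from the reported standard errors. The sketch below computes the relative reduction, 1 − SE(combined)/SE(TDT), for each SNP; the numbers are copied from the TDT and combined-analysis table, and the variable names are hypothetical.

```python
# Standard errors copied from the TDT-only and combined-analysis columns.
se_tdt = {"B03T3055": 0.169, "B03T3056": 0.192, "B03T3057": 0.163,
          "B03T3058": 0.193, "C03R0281": 0.174, "B03T3060": 0.206}
se_combined = {"B03T3055": 0.153, "B03T3056": 0.189, "B03T3057": 0.156,
               "B03T3058": 0.162, "C03R0281": 0.157, "B03T3060": 0.192}

# Relative reduction in standard error per SNP.
reductions = {snp: 1 - se_combined[snp] / se_tdt[snp] for snp in se_tdt}
mean_reduction = sum(reductions.values()) / len(reductions)

for snp, r in reductions.items():
    print(f"{snp}: {100 * r:.1f}% smaller SE in the combined analysis")
print(f"mean reduction: {100 * mean_reduction:.1f}%")
```

The per-SNP reductions vary around the quoted range, so the 5 to 10% figure is best read as a typical value rather than a bound.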
Arguably the most important advantage of the present approach is that it can be implemented in any statistical package. Moreover, embedding the analysis in a generalized linear modelling framework has the benefit of diagnostic tools and the possibility of incorporating covariates into the analysis.
Our objective in this paper was very modest: to illustrate a novel method for combining evidence for association from case-control and TDT designs in a single simple analysis. The results presented in this paper are certainly promising in this particular dataset (a single replicate from a single population from the simulated Genetic Analysis Workshop 14 data). We did not determine whether the proposed method is useful or cost-effective in any particular situation. Extensive simulation studies (see Epstein et al. [8] for a power comparison between combined analysis of case-control and TDT and separate analyses showing a gain of power of the combined analysis as compared to either of the separate analyses) will be necessary in order to do that.
Conclusion
Both the case-control and the TDT analyses already showed association of SNPs B03T3056 and B03T3057 with KPD. The TDT design yielded considerably smaller standard errors than the case-control design. Combining the evidence from the case-control and TDT studies yielded a further 5–10% reduction in the standard errors of the effect estimates, compared to the TDT-only design.
Abbreviations
GEE: Generalized estimating equations
KPD: Kofendrerd Personality Disorder
SNP: Single-nucleotide polymorphism
TDT: Transmission disequilibrium test
References
1. Spielman RS, McGinnis RE, Ewens WJ: Transmission test for linkage disequilibrium – the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993, 52: 506-516.
2. Uh HW, Houwing-Duistermaat JJ, Putter H, van Houwelingen JC: How to quantify information loss due to phase ambiguity in haplotype case-control studies. BMC Genet. 6 (Suppl 1): S108. 10.1186/1471-2156-6-S1-S108.
3. Nagelkerke NJD, Kinman TG, Hoebee B, Teunis P: Combining the transmission disequilibrium test and case-control methodology using generalized logistic regression. Eur J Hum Genet. 2004, 12: 964-970. 10.1038/sj.ejhg.5201255.
4. Abel L, Muller-Myhsok B: Maximum-likelihood expression of the transmission/disequilibrium test and power considerations. Am J Hum Genet. 1998, 63: 664-667. 10.1086/301975.
5. Liang KY, Zeger SL: Longitudinal data analysis using generalized linear models. Biometrika. 1986, 73: 13-22. 10.2307/2336267.
6. R Development Core Team: R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing, ISBN 3-900051-00-3
7. Pritchard JK, Rosenberg NA: Use of unlinked genetic markers to detect population stratification in association studies. Am J Hum Genet. 1999, 65: 220-228. 10.1086/302449.
8. Epstein MP, Veal CD, Trembath RC, Barker JN, Li C, Satten GA: Genetic association analysis using data from triads and unrelated subjects. Am J Hum Genet. 2005, 76: 592-608. 10.1086/429225.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.