Volume 6 Supplement 1
Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Analysis of binary traits: testing association in the presence of linkage
 Gudrun Jonasdottir^{1, 2},
 Juni Palmgren^{1, 2} and
 Keith Humphreys^{2}
DOI: 10.1186/1471-2156-6-S1-S92
© Jonasdottir et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Abstract
Most methods for testing association in the presence of linkage, using family-based studies, have been developed for continuous traits. FBAT (family-based association tests) is one of few methods appropriate for discrete outcomes. In this article we describe a new test of association in the presence of linkage for binary traits. We use a gamma random effects model in which association and linkage are modelled as fixed effects and random effects, respectively. We have compared the gamma random effects model to an FBAT and a generalized estimating equation-based alternative, using two regions in the Genetic Analysis Workshop 14 simulated data. One of these regions contained haplotypes associated with disease, and the other did not.
Background
Testing association in a region with confirmed linkage may increase the rate of false positives in family-based studies. In a linked region one expects similarity between related individuals. If unaccounted for, this similarity may be mistaken for association. Different remedies have been suggested, ranging from using a robust variance estimator [1] for the general test statistic FBAT (family-based association tests) [2] to a model-based approach in which the linkage is modelled in the covariance structure [3] (VCM, variance components model). The VCM has been developed for continuous traits, while FBAT can be used with both binary and continuous traits. In this article we concentrate on methods for testing association in the presence of linkage, using binary traits. We compare the program FBAT for binary traits to both the gamma random effects (GRE) method and also a GEE (generalized estimating equation) [4] approach. For the purpose of our comparisons we have used the simulated Genetic Analysis Workshop 14 (GAW14) data. We have compared the three methods' ability to pick up a signal in a region with association, as well as their ability to avoid signalling in a region with no association.
Methods
We consider a random effects model for binary events that is similar in spirit to the multivariate survival model of Zhong and Li [5], in which association and linkage are modelled as fixed effects and random effects, respectively. We use a result for random effects models for binary outcomes described by Conaway [6], who shows that for gamma-distributed random effects the unconditional distribution of the outcome under a log-log link can be written as a sum of easily calculated terms. Analytical tractability is achievable for only a few other combinations of random effects distribution and link function, such as the beta distribution with a log(-log) link [6]. The random effects model in Zhong and Li [5] assigns one random effect to each of the two alleles of the father and one random effect to each of the two alleles of the mother. The notion of an inheritance vector is used to describe the transmitted alleles for all family members jointly. The method presented here works for all sizes of sibships, and may also be easily adapted to extended pedigrees.
GRE model
Let (Y_{i1}, Y_{i2}, ..., Y_{iJ_i}) be the binary trait vector for family i and let j denote offspring (j = 1, 2, ..., J_{i}). We allow for different family sizes J_{i}. We use θ_{m_j} and θ_{p_j} to denote the effects of the alleles transmitted to offspring j, with m_{j} = 1, 2 indexing the maternal alleles and p_{j} = 3, 4 the paternal alleles, respectively. Conditional on the transmitted alleles, we write the probability of the trait for offspring j in family i as P(Y_{ij} = 1 | θ_{m_j}, θ_{p_j}). We consider a model with a log(-log) link of the form
log(-log(P(Y_{ij} = 1 | θ_{m_j}, θ_{p_j}))) = log(θ_{m_j} + θ_{p_j}) + X_{j}β,    (1)
or equivalently
P(Y_{ij} = 1 | θ_{m_j}, θ_{p_j}) = exp(-(θ_{m_j} + θ_{p_j}) exp(X_{j}β)).    (2)
The effects θ of the transmitted alleles act multiplicatively on the offspring trait probability, and the effect of each transmitted allele is multiplied by a term involving the parameter vector β describing the fixed genetic effects. Following Li [7] and Li and Zhong [8] we assume that the maternal and paternal alleles are independent and that each allele contributes an effect to the trait which is random and follows a gamma distribution with scale α/2 and shape λ. The model has a tractable closed form for the joint unconditional trait probabilities for the offspring in a sibship. Let Ψ denote all ordered subsets of {1, 2, ..., J_{i}}, Ψ = {ø, {1}, {2}, {1, 2}, {3}, ..., {1, 2, ..., J_{i}}}. Let π*_{T} denote the joint unconditional probability of Y_{ij} = 1 for all j ∈ T, where T ∈ Ψ. Calculating this probability requires integrating over θ_{1}, θ_{2}, θ_{3} and θ_{4}. There is a tractable solution [6], which gives π*_{T} in closed form (3).
The elements a_{jk} of the vector a_{k} indicate whether allele k has been transmitted to offspring j, j = 1, 2, ..., J_{i}. The probabilities π*_{T} for all T ∈ Ψ can be placed in a vector π*. It has been shown [6] that the unconditional probabilities for all possible outcomes of Y can be written as π = Z^{-1}π*.
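The reason a closed form exists can be sketched with the Laplace transform of the gamma distribution. The derivation below is an illustrative sketch rather than a reproduction of the authors' expression. It assumes that offspring traits are conditionally independent given the allele effects, takes the parametrization literally as stated (shape λ, scale α/2), and introduces the shorthand s_{kT}, which is not part of the original notation.

```latex
\begin{align*}
\pi^{*}_{T}
  &= E\Big[\prod_{j \in T} P(Y_{ij} = 1 \mid \theta_{m_j}, \theta_{p_j})\Big]
   = E\Big[\exp\Big(-\sum_{j \in T} (\theta_{m_j} + \theta_{p_j})\, e^{X_j \beta}\Big)\Big] \\
  &= \prod_{k=1}^{4} E\big[\exp(-\theta_k\, s_{kT})\big],
  \qquad s_{kT} = \sum_{j \in T} a_{jk}\, e^{X_j \beta}, \\
  &= \prod_{k=1}^{4} \Big(1 + \tfrac{\alpha}{2}\, s_{kT}\Big)^{-\lambda}.
\end{align*}
```

The second line uses the independence of the four allele effects, and the third uses E[exp(-sθ)] = (1 + s · scale)^{-shape} for a gamma variable. If the gamma distribution were instead parametrized with shape α/2 and rate λ, the same steps would give a product of terms (1 + s_{kT}/λ)^{-α/2}.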
Matrices for J_{i} = 3 offspring
T        | Z matrix^{b}    | π subscripts^{c}
ø        | 1 1 1 1 1 1 1 1 | 1 1 1
1        | 1 0 1 0 1 0 1 0 | 0 1 1
2        | 1 1 0 0 1 1 0 0 | 1 0 1
1, 2     | 1 0 0 0 1 0 0 0 | 0 0 1
3        | 1 1 1 1 0 0 0 0 | 1 1 0
1, 3     | 1 0 1 0 0 0 0 0 | 0 1 0
2, 3     | 1 1 0 0 0 0 0 0 | 1 0 0
1, 2, 3  | 1 0 0 0 0 0 0 0 | 0 0 0
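The relation between π* and π can be illustrated with a short script. This is a minimal sketch, not the authors' R implementation: the function names are hypothetical, the row and column ordering follows the table (rows indexed by subsets T, columns by outcome patterns), and the inversion is done by inclusion-exclusion, which is equivalent to multiplying by Z^{-1}.

```python
from itertools import product


def subsets_in_table_order(J):
    """Subsets of {1..J} ordered as ø, {1}, {2}, {1,2}, {3}, ... (binary counting)."""
    return [frozenset(j + 1 for j in range(J) if (mask >> j) & 1)
            for mask in range(2 ** J)]


def build_Z(J):
    """Return (subsets, outcomes, Z) with Z[r][c] = 1 when outcome c has
    Y_j = 1 for every offspring j in subset r."""
    subsets = subsets_in_table_order(J)
    # Column k is labelled by the outcome with zeros exactly at the members
    # of the k-th subset, e.g. subset {1} -> outcome (0, 1, 1).
    outcomes = [tuple(0 if j in S else 1 for j in range(1, J + 1))
                for S in subsets]
    Z = [[1 if all(y[j - 1] == 1 for j in T) else 0 for y in outcomes]
         for T in subsets]
    return subsets, outcomes, Z


def pi_from_pi_star(pi_star, J):
    """Invert pi* = Z pi by inclusion-exclusion.

    pi_star maps each subset T (a frozenset) to P(Y_j = 1 for all j in T).
    Returns a dict mapping each outcome tuple to its probability.
    """
    pi = {}
    for y in product((0, 1), repeat=J):
        S = frozenset(j + 1 for j in range(J) if y[j] == 1)
        pi[y] = sum((-1) ** (len(T) - len(S)) * value
                    for T, value in pi_star.items() if S <= T)
    return pi


if __name__ == "__main__":
    subsets, outcomes, Z = build_Z(3)
    for T, row in zip(subsets, Z):
        print(sorted(T), row)
```

As a sanity check, three independent offspring each with trait probability 0.5 have π*_{T} = 0.5^{|T|}, and the function returns 0.125 for each of the eight outcomes.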
We used the statistical software R (version 1.9.1) [9] to implement the likelihood and maximize it with respect to the association parameter β.
We have so far not described how to deal with incompletely observed inheritance vectors. In the context of testing association in the presence of linkage, Zhong and Li [5] suggest using GENEHUNTER to obtain the distribution for inheritance vectors at any arbitrary point along the chromosome. In our single-point analysis we treat all inheritance vectors compatible with the data as equally likely and construct a weighted mean of π_{i}. We return to the choice of weights in the discussion.
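The equal-weight averaging can be sketched as follows. This is an illustration under simplifying assumptions, not the authors' code: genotypes are given as pairs of allele labels at a single marker, parental alleles are numbered 1, 2 (mother) and 3, 4 (father) as in the model description, and all function names are hypothetical.

```python
from itertools import product


def compatible_transmissions(mother, father, child):
    """Return the (m, p) index pairs, m in {1, 2} and p in {3, 4}, whose
    alleles reproduce the child's unordered genotype."""
    target = sorted(child)
    pairs = []
    for m_idx, m_allele in enumerate(mother, start=1):
        for p_idx, p_allele in enumerate(father, start=3):
            if sorted((m_allele, p_allele)) == target:
                pairs.append((m_idx, p_idx))
    return pairs


def inheritance_vectors(mother, father, children):
    """Enumerate all inheritance vectors compatible with the genotypes.
    Each vector holds one (m, p) pair per offspring."""
    per_child = [compatible_transmissions(mother, father, c) for c in children]
    return list(product(*per_child))


def equal_weight_mean(vectors):
    """Element-wise mean of equally long probability vectors, i.e. a
    weighted mean with weight 1/N for each compatible inheritance vector."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


if __name__ == "__main__":
    # Heterozygous mother, homozygous father, two offspring.
    vectors = inheritance_vectors(("A", "a"), ("A", "A"),
                                  [("A", "a"), ("A", "A")])
    print(len(vectors), vectors)
```

In this example the maternal transmission is determined for both offspring but the paternal one is not, so four inheritance vectors are compatible and each would receive weight 1/4 when averaging the corresponding π_{i}.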
FBAT and GEE
We compare the GRE with FBAT (version 1.5.1) [2] and a generalized estimating equation (GEE)-based alternative [4]. For FBAT we assume a linear allele-dose model, and for the GEE-based alternative we assume a linear allele-dose on the logit scale and an exchangeable covariance structure.
We used FBAT option -o to find the optimal weight. We then applied the optimal weight to the phenotype score and used FBAT option -e to test our data. The function gee (in package gee) in R (version 1.9.1) was used for the GEE analysis. The gee package can be found at the R web page [9].
GAW14 simulated data
For details concerning how the simulation was performed see the GAW14 Data Description [10].
All analyses were performed with knowledge of the data simulation process. We chose to analyze the data with respect to trait A. Trait A is known to be associated with haplotypes in region D3, while markers in region D2 are known not to be associated with trait A. For the purpose of our comparison we therefore chose to "purchase" markers in the D3 region (B05T4135–B05T4142) as well as markers from the D2 region (B03T3048–B03T3067). Our aim was to use regions D2 and D3 to gain some insight into the performance of the different methods. More specifically, we were not expecting a signal in region D2, but were hoping for one in region D3.
The Aipotu population (one of four simulated populations) only consists of nuclear families, although these are of different sizes. For simplicity, we chose to concentrate on the Aipotu population and to only include families of maximum size six (i.e., two parents and at most four offspring).
We merged 10 (out of 100) replicates in order to get a sample with reasonable power. This provided us with a total of 481 independent nuclear families. There was no missing data and we did not simulate any.
We selected the markers described above and analyzed each marker separately in a set of single-point analyses. The method we have described can, however, be extended to multiple markers and a multipoint analysis.
Results
Conclusion
In the simulated data, region D2 harbored no locus associated with trait A. All three methods (FBAT, GEE, and GRE) gave a signal for association with marker B03T3056 with a p-value around 0.01. However, taking the multiple testing into account, this p-value does not reach statistical significance. The results from all markers in the region are shown in Figure 1. Across the markers, no one method produced consistently higher or lower p-values than any other method.
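The multiple-testing argument can be made concrete with a small calculation. The sketch below is illustrative only: the text does not say which correction was applied, so a Bonferroni adjustment is assumed here, and the number of D2 markers is inferred from the label range B03T3048–B03T3067 under the assumption that the numbering is consecutive.

```python
def bonferroni(p_value, n_tests):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumption: marker labels B03T3048 ... B03T3067 are numbered consecutively.
n_d2_markers = 3067 - 3048 + 1

# Adjust the nominal p-value of about 0.01 reported for marker B03T3056.
adjusted = bonferroni(0.01, n_d2_markers)

if __name__ == "__main__":
    print(n_d2_markers, adjusted)
```

Under these assumptions a nominal p-value of about 0.01 corresponds to an adjusted value of about 0.2, well above a conventional 0.05 threshold.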
In region D3, association with trait A was simulated at the haplotype level. We still chose to perform single-point analyses with each marker in turn. The GEE and the GRE turned out to be slightly better than FBAT at detecting significant markers.
The GRE model presented here seems to work well, compared to both GEE and FBAT. It would be useful to perform simulation studies to assess validity and power of the three procedures under different genetic models. The GRE model requires more computational time, stemming from the fact that in spite of the closed form in (3) it is time consuming to evaluate and to maximize the likelihood.
A problem with the GRE model is how to handle the missing information on transmission. In our single-point algorithm we propose using a weighted sum (with equal weights) over all compatible inheritance vectors, given parental and offspring genotypes. Following Zhong and Li [5] we compute the distribution over inheritance vectors without attention to phenotype. However, given that linkage is assumed, the probabilities of transmission are not invariant to offspring phenotypes. It would be useful to investigate the impact of using our suboptimal weights on the GAW data, and more generally in comparing the validity and power of the different approaches using simulations under different genetic models.
Abbreviations
 FBAT: Family-based association tests
 GAW14: Genetic Analysis Workshop 14
 GEE: Generalized estimating equation
 GRE: Gamma random effects
 VCM: Variance components model
References
 1. Lake SL, Blacker D, Laird NM: Family-based tests of association in the presence of linkage. Am J Hum Genet. 2000, 67: 1515-1525. 10.1086/316895.
 2. Rabinowitz D, Laird N: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
 3. Fulker DW, Cherny SS, Sham PC, Hewitt JK: Combined linkage and association sib-pair analysis for quantitative traits. Am J Hum Genet. 1999, 64: 259-267. 10.1086/302193.
 4. Liang KY, Zeger SL: Longitudinal data analysis using generalized estimating equations. Biometrika. 1986, 73: 13-22. 10.2307/2336267.
 5. Zhong X, Li H: Score tests of genetic association in the presence of linkage based on the additive genetic gamma frailty model. Biostatistics. 2004, 5: 307-327. 10.1093/biostatistics/5.2.307.
 6. Conaway MR: A random effects model for binary trait. Biometrics. 1990, 46: 317-328. 10.2307/2531437.
 7. Li H: The additive genetic gamma frailty model for linkage analysis. Ann Hum Genet. 1999, 63: 455-468. 10.1046/j.1469-1809.1999.6350455.x.
 8. Li H, Zhong X: Multivariate survival models induced by genetic frailties, with application to linkage analysis. Biostatistics. 2002, 3: 57-75. 10.1093/biostatistics/3.1.57.
 9. The R Project for Statistical Computing. [http://www.r-project.org/]
 10. GAW14 Data Description. [http://www.gaworkshop.org/data.htm]
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.