 Proceedings
 Open Access
Robust trend tests for genetic association in case-control studies using family data
BMC Genetics volume 6, Article number: S107 (2005)
Abstract
We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. However, the test depends on scores determined by the underlying genetic model, so it may lose substantial power when the model is misspecified. Because the mode of inheritance is usually unknown for complex diseases, we have developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power across a class of possible genetic models. The trend tests and robust trend tests were applied to a Genetic Analysis Workshop 14 dataset from the Collaborative Study on the Genetics of Alcoholism.
Background
Testing for linkage disequilibrium or association provides a useful alternative to testing for linkage when complex traits have relatively small genetic effects [1]. Among tests for association between a candidate gene and a disease in a case-control design, the Cochran-Armitage (CA) trend test [2, 3] is preferable to the allele-based test and to Pearson's chi-squared test [4–6]. In such studies, cases and controls are usually independent random samples, and each individual is genotyped at markers in or near candidate genes. For a marker with two alleles, the CA trend test can be used to test for a linear trend between the disease and the number of high-risk alleles at this marker.
Recently, there has been increasing interest in statistical methods that evaluate association between genetic markers and disease status using family-based data [7, 8]. Such methods allow data collected for linkage studies to be used efficiently to test for association. Unlike traditional case-control studies, in which all individuals are unrelated, cases and controls drawn from family data are often correlated because they are biologically related. Consequently, the frequencies of the high-risk alleles at a marker locus may be increased among related individuals. Ignoring this correlation may inflate the false-positive rate (type I error) of the association test relative to a case-control design based on independent samples. Hence, any test of genetic association must account for the correlations among family members. Slager and Schaid [7] extended the original CA trend test to case-control studies with family data, modeling the correlations among related cases or controls as functions of the probabilities that their marker alleles are shared identical by descent (IBD). This method can be applied to complex family structures and yields different correlations for different types of relative pairs, so it is more flexible than methods that assume a common correlation for every pair of relatives within a family. With this correlation adjustment, the resulting trend test of Slager and Schaid [7] has the same form as the original one but uses an appropriate variance formula. Note that this trend test uses different scores depending on the assumed underlying genetic model. In practice, because the genetic model is unknown for most, if not all, complex diseases, applying a trend test with a single set of scores results in a loss of power if the genetic model is misspecified. Therefore, more robust tests have been proposed to protect against model uncertainty [9, 10].
In this paper we study two robust trend tests, the maximum test (MAX) and the maximin efficiency robust test (MERT), in case-control designs applied to family data. These robust tests account for correlated individuals and do not rely on the assumption of any particular genetic model. The performance of the robust trend tests and the extended CA trend test is compared in a simulation study, and the tests are illustrated using a Genetic Analysis Workshop 14 dataset from the Collaborative Study on the Genetics of Alcoholism (COGA).
Methods
The trend tests
Consider data for a case-control study of genetic association as in Table 1. Assume a marker with two alleles, N and M, where N is the normal allele and M is the high-risk allele. Denote the genotypes as $g_0 = NN$, $g_1 = NM$, and $g_2 = MM$. Let the genotype frequencies for cases and controls be $p_j$ and $q_j$, $j = 0, 1, 2$, respectively, with $\sum_{j=0}^{2} p_j = \sum_{j=0}^{2} q_j = 1$. The null hypothesis of no association is then $H_0: p_j = q_j$ for each $j$.
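Given the data, the CA trend test for association [4] between a disease and the marker is $Z_x = U(x)/\{\mathrm{Var}[U(x)]\}^{1/2}$, where $U(x) = n^{-1}\sum_{j=0}^{2} x_j (S r_j - R s_j)$ and $x = (x_0, x_1, x_2)'$ is a set of increasing scores (weights) assigned a priori to the three genotypes $(g_0, g_1, g_2)$ based on the underlying genetic model. Here $r_j$ and $s_j$ are the numbers of cases and controls with genotype $g_j$, $R = \sum_j r_j$ and $S = \sum_j s_j$, $n_j = r_j + s_j$, and $n = R + S$. Because $Z_x$ is unchanged by a linear transformation $x \mapsto a + bx$ with $b > 0$, the scores $(x_0, x_1, x_2)'$ can be reparameterized as $(0, x, 1)'$ with $0 \le x \le 1$. If cases and controls are independent random samples, the counts $(r_0, r_1, r_2)$ and $(s_0, s_1, s_2)$ in Table 1 follow the multinomial distributions $\mathrm{mul}(R; p_0, p_1, p_2)$ and $\mathrm{mul}(S; q_0, q_1, q_2)$, respectively. Under the null hypothesis, it can be shown that $E[U(x)] = 0$ and, with the common genotype frequencies estimated by $n_j/n$,

$$\mathrm{Var}[U(x)] = \frac{RS}{n}\left[\sum_{j=0}^{2} x_j^2\,\frac{n_j}{n} - \left(\sum_{j=0}^{2} x_j\,\frac{n_j}{n}\right)^2\right],$$

and $Z_x$ asymptotically follows the standard normal distribution N(0, 1).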
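The independent-sample statistic can be sketched in a few lines of Python. This is a minimal sketch, assuming the null variance is estimated by plugging the pooled genotype frequencies $n_j/n$ into the formula above; the function name `ca_trend_test` and the counts in the usage lines are hypothetical, not data from this study.

```python
import math


def ca_trend_test(r, s, x):
    """Cochran-Armitage trend statistic Z_x for independent cases and controls.

    r, s: case and control counts for genotypes (g0, g1, g2) = (NN, NM, MM).
    x:    genotype scores (x0, x1, x2), e.g. (0, x, 1) with 0 <= x <= 1.
    """
    R, S = sum(r), sum(s)
    n = R + S
    phi = R / n
    # U(x) = sum_j x_j [(1 - phi) r_j - phi s_j] = (1/n) sum_j x_j (S r_j - R s_j)
    u = sum(xj * ((1 - phi) * rj - phi * sj) for xj, rj, sj in zip(x, r, s))
    # Under H0, p_j = q_j; both are estimated by the pooled frequencies n_j / n
    freq = [(rj + sj) / n for rj, sj in zip(r, s)]
    mean = sum(xj * fj for xj, fj in zip(x, freq))
    second = sum(xj * xj * fj for xj, fj in zip(x, freq))
    var_u = R * S / n * (second - mean ** 2)
    return u / math.sqrt(var_u)


# Hypothetical genotype counts (NN, NM, MM)
cases = (40, 50, 30)
controls = (60, 45, 15)
for label, score in (("recessive", 0.0), ("additive", 0.5), ("dominant", 1.0)):
    print(label, round(ca_trend_test(cases, controls, (0.0, score, 1.0)), 3))
```

Rescaling the scores linearly, for example using (1, 2, 3) instead of (0, 1/2, 1), leaves $Z_x$ unchanged, which is why the scores can be written as $(0, x, 1)'$. Swapping the roles of cases and controls only flips the sign of $Z_x$.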
The null hypothesis $H_0$ is rejected in favor of the alternative that M is the high-risk allele associated with disease when $Z_x > z_{1-\alpha}$, where $z_{1-\alpha}$ is the $100(1-\alpha)$th percentile of N(0, 1). When it is not known which allele is the high-risk allele, $H_0$ is rejected when $|Z_x| > z_{1-\alpha/2}$.
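The one- and two-sided rejection rules translate directly into p-values from the standard normal tail. The sketch below uses only Python's `math.erfc`; the helper names are hypothetical. The loop applies it to the adjusted statistics reported in the Application section, and rounding gives the two-sided p-values quoted there (0.004, 0.043, and 0.69).

```python
import math


def normal_upper_tail(z):
    """1 - Phi(z) for the standard normal, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def trend_p_value(z, two_sided=True):
    """p-value of a trend statistic that is N(0, 1) under H0."""
    if two_sided:
        return 2.0 * normal_upper_tail(abs(z))  # matches rejecting when |Z_x| > z_{1-alpha/2}
    return normal_upper_tail(z)  # matches rejecting when Z_x > z_{1-alpha}


# Adjusted statistics reported in the Application section
for z in (2.89, 2.02, 0.40):
    print(z, round(trend_p_value(z), 3))
```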
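However, for case-control studies drawn from family data, cases and controls within the same family may be biologically related, so Slager and Schaid [7] proposed the following method for estimating the variance that accounts for correlations among related cases or controls. Let $y_i = (y_{i0}, y_{i1}, y_{i2})'$ be the genotype indicator vector for the $i$th case, where $y_{ij} = 1$ if the $i$th case has genotype $g_j$ and $y_{ij} = 0$ otherwise, $i = 1, \ldots, R$. Similarly, let $z_k$, $k = 1, \ldots, S$, denote the genotype indicator vectors for the controls. Then $r = (r_0, r_1, r_2)' = \sum_{i=1}^{R} y_i$ and $s = (s_0, s_1, s_2)' = \sum_{k=1}^{S} z_k$. Furthermore, each $y_i$ and $z_k$ follows the multinomial distribution $\mathrm{mul}(1; p_0, p_1, p_2)$ and $\mathrm{mul}(1; q_0, q_1, q_2)$, respectively. Let $\varphi = R/n$. The test statistic can also be written as $U(x) = x'[(1-\varphi) r - \varphi s]$. Then

$$\mathrm{Var}[U(x)] = x'\Sigma x, \qquad \Sigma = (1-\varphi)^2\,\mathrm{Var}(r) + \varphi^2\,\mathrm{Var}(s) - \varphi(1-\varphi)\,[\mathrm{Cov}(r, s) + \mathrm{Cov}(s, r)],$$

with

$$\mathrm{Var}(r) = \sum_{i=1}^{R}\mathrm{Var}(y_i) + \sum_{i \neq i'}\mathrm{Cov}(y_i, y_{i'}), \qquad \mathrm{Var}(s) = \sum_{k=1}^{S}\mathrm{Var}(z_k) + \sum_{k \neq k'}\mathrm{Cov}(z_k, z_{k'}), \qquad \mathrm{Cov}(r, s) = \sum_{i=1}^{R}\sum_{k=1}^{S}\mathrm{Cov}(y_i, z_k),$$

where the variances and covariances can be calculated from the multinomial distributions and the IBD-sharing probabilities for pairs of related individuals [7]; the covariance terms vanish for unrelated pairs.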
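To make the covariance terms concrete, the sketch below computes $\mathrm{Var}[U(x)]$ for the design used in the simulation study (cases sampled as sib pairs, unrelated controls). It is an illustration under stated assumptions, not the implementation of [7]: genotypes are assumed to be in Hardy-Weinberg equilibrium with frequency `p` for allele M, the joint genotype distribution of a relative pair is built by mixing over its IBD probabilities $(k_0, k_1, k_2)$, and all function names are hypothetical.

```python
def hwe(p):
    """Genotype frequencies (NN, NM, MM) under Hardy-Weinberg; p = frequency of M."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]


def pair_joint(p, k):
    """3x3 joint genotype distribution of a relative pair with IBD probabilities k."""
    g = hwe(p)
    f = (1 - p, p)  # allele frequencies of N (coded 0) and M (coded 1)
    joint = [[k[0] * g[a] * g[b] for b in range(3)] for a in range(3)]  # 0 alleles IBD
    for a in range(2):          # the single allele shared IBD
        for b in range(2):      # first relative's other allele
            for c in range(2):  # second relative's other allele
                joint[a + b][a + c] += k[1] * f[a] * f[b] * f[c]
    for j in range(3):          # 2 alleles IBD: identical genotypes
        joint[j][j] += k[2] * g[j]
    return joint


def var_u(x, p, n_pairs, n_controls, k=(0.25, 0.5, 0.25)):
    """Var[U(x)] = x' Sigma x for n_pairs related case pairs and unrelated controls."""
    R, S = 2 * n_pairs, n_controls
    phi = R / (R + S)
    g = hwe(p)
    joint = pair_joint(p, k)
    # Var(y_i) = diag(g) - g g';  Cov(y_i, y_i') = joint - g g' for a related pair
    v = [[(g[a] if a == b else 0.0) - g[a] * g[b] for b in range(3)] for a in range(3)]
    c = [[joint[a][b] - g[a] * g[b] for b in range(3)] for a in range(3)]
    # Var(r) = R Var(y) + both orderings of each related pair; Var(s) = S Var(z);
    # Cov(r, s) = 0 because the controls are unrelated to the cases
    sigma = [[(1 - phi) ** 2 * (R * v[a][b] + n_pairs * (c[a][b] + c[b][a]))
              + phi ** 2 * S * v[a][b] for b in range(3)] for a in range(3)]
    return sum(x[a] * sigma[a][b] * x[b] for a in range(3) for b in range(3))


x_add = (0.0, 0.5, 1.0)
unrelated = var_u(x_add, 0.3, 80, 160, k=(1.0, 0.0, 0.0))  # treat cases as unrelated
sibs = var_u(x_add, 0.3, 80, 160)                          # sib-pair IBD probabilities
print(round(unrelated, 4), round(sibs, 4))
```

With $k = (1, 0, 0)$ the pair covariances vanish and the result reduces to the independent-sample variance. With sib-pair IBD probabilities the additive-score variance is larger, which is what shrinks the adjusted statistics.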
Robust trend tests when the genetic model is unknown
Because the underlying genetic model is unknown for most complex diseases, we consider two robust trend tests [9, 10], MERT and MAX, for case-control studies in which the cases and controls may be related. For the special case in which cases and controls are independent random samples, these tests have been studied by Freidlin et al. [10].
Suppose we have a family of trend test statistics $Z_i$ corresponding to different genetic models. The first robust test, MERT, is a linear combination of the two statistics in the family that have the minimum correlation $\rho_0$. Denoting these two tests by $\{Z_s, Z_t\}$, MERT is $Z_{MERT} = (Z_s + Z_t)/\{2(1 + \rho_0)\}^{1/2}$, which asymptotically follows a standard normal distribution under $H_0$. The second robust trend test, MAX, is defined as $Z_{MAX} = \max(Z_s, Z_{MERT}, Z_t)$ for a one-sided test and $Z_{MAX} = \max(|Z_s|, |Z_{MERT}|, |Z_t|)$ for a two-sided alternative, where $Z_{MERT}$ serves as the "middle" test because it has equal correlations with $Z_s$ and $Z_t$. MAX is more powerful than MERT when $\rho_0$ is small, and the two tests have similar power when the minimum correlation is relatively large (e.g., $\rho_0 \ge 0.75$) [11].
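Once the extreme pair $(Z_s, Z_t)$ and its correlation $\rho_0$ are known, both robust statistics follow directly from the definitions above. A minimal sketch, with hypothetical function names `mert` and `max_stat`:

```python
import math


def mert(z_s, z_t, rho0):
    """Z_MERT = (Z_s + Z_t) / sqrt(2 (1 + rho0)).

    Under H0, Var(Z_s + Z_t) = 1 + 1 + 2 rho0, so this rescaling gives unit variance.
    """
    return (z_s + z_t) / math.sqrt(2.0 * (1.0 + rho0))


def max_stat(z_s, z_t, rho0, two_sided=True):
    """Z_MAX over the extreme pair and their MERT combination (the 'middle' test)."""
    stats = (z_s, mert(z_s, z_t, rho0), z_t)
    if two_sided:
        return max(abs(z) for z in stats)
    return max(stats)
```

Because MAX takes the largest of three correlated statistics, its null distribution is not N(0, 1), so its critical value and p-value have to be obtained by simulation.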
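For case-control studies drawn from family data, we can derive the correlations between the trend tests defined in the previous section. Let

$$\Sigma = (1-\varphi)^2\,\mathrm{Var}(r) + \varphi^2\,\mathrm{Var}(s) - \varphi(1-\varphi)\,[\mathrm{Cov}(r, s) + \mathrm{Cov}(s, r)]$$

be the variance-covariance matrix of $(1-\varphi) r - \varphi s$, so that $\mathrm{Var}[U(x)] = x'\Sigma x$. Then the correlation between any two trend test statistics is

$$\mathrm{Corr}(Z_{x_a}, Z_{x_b}) = \frac{x_a'\Sigma x_b}{(x_a'\Sigma x_a)^{1/2}\,(x_b'\Sigma x_b)^{1/2}},$$

where $x_a$ and $x_b$ are the score vectors for two different genetic models.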
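The correlation formula needs only the bilinear form $x_a'\Sigma x_b$. In the sketch below, the example $\Sigma$ is an assumption chosen for illustration: unrelated individuals under $H_0$ with Hardy-Weinberg genotype frequencies at $p = 0.3$. The constant $RS/n$ cancels in the correlation. For family data, $\Sigma$ would also contain the covariance terms among relatives. Function names are hypothetical.

```python
import itertools
import math


def quad(a, m, b):
    """Bilinear form a' M b."""
    return sum(a[i] * m[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


def trend_corr(xa, xb, sigma):
    """Corr(Z_xa, Z_xb) = xa' Sigma xb / sqrt(xa' Sigma xa * xb' Sigma xb)."""
    return quad(xa, sigma, xb) / math.sqrt(quad(xa, sigma, xa) * quad(xb, sigma, xb))


# Illustrative Sigma: unrelated individuals under H0, Hardy-Weinberg genotype
# frequencies with p = 0.3 for M; Sigma is proportional to diag(g) - g g'.
p = 0.3
g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]
sigma = [[(g[i] if i == j else 0.0) - g[i] * g[j] for j in range(3)] for i in range(3)]

scores = {"recessive": (0, 0, 1), "additive": (0, 0.5, 1), "dominant": (0, 1, 1)}
for a, b in itertools.combinations(scores, 2):
    print(a, b, round(trend_corr(scores[a], scores[b], sigma), 3))
```

For these inputs the recessive and dominant tests form the least correlated pair, the same pattern seen in the Application section.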
To test for association between a marker and disease status, the optimal scores for the recessive, additive, and dominant models are $x = 0$, 1/2, and 1, respectively, in $x = (0, x, 1)'$ [12]. Based on prior scientific knowledge, other genetic models can also be assumed, leading to different trend tests. The correlation between any two tests can then be calculated to find the pair with the minimum correlation, from which MERT is constructed. For MAX, the critical value and the p-value are obtained by simulation.
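The two steps just described can be sketched as follows. First, pick the extreme pair. Second, estimate the two-sided MAX p-value by Monte Carlo. The simulation draws $(Z_s, Z_t)$ from their asymptotic null distribution, a bivariate normal with correlation $\rho_0$, and computes $Z_{MERT}$ from them. Whether the authors simulated the statistics this way or simulated whole datasets is not stated here, so treat this as one reasonable approach. The function names, `n_sim`, and `seed` are hypothetical. With the correlations and $Z_{MAX} = 2.89$ from the Application section, the estimate should be close to the reported empirical p-value of 0.009, up to Monte Carlo error.

```python
import math
import random


def extreme_pair(corr):
    """Return the pair of models with the minimum correlation, and that correlation."""
    pair = min(corr, key=corr.get)
    return pair, corr[pair]


def max_p_value(z_obs, rho0, n_sim=100_000, seed=2005):
    """Monte Carlo two-sided p-value of Z_MAX = max(|Z_s|, |Z_MERT|, |Z_t|) under H0."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * (1.0 + rho0))
    resid = math.sqrt(1.0 - rho0 ** 2)
    hits = 0
    for _ in range(n_sim):
        z_s = rng.gauss(0.0, 1.0)
        z_t = rho0 * z_s + resid * rng.gauss(0.0, 1.0)  # gives Corr(Z_s, Z_t) = rho0
        z_mert = (z_s + z_t) / scale
        if max(abs(z_s), abs(z_mert), abs(z_t)) >= z_obs:
            hits += 1
    return hits / n_sim


# Pairwise correlations reported in the Application section
corr = {("recessive", "additive"): 0.818,
        ("recessive", "dominant"): 0.334,
        ("additive", "dominant"): 0.813}
pair, rho0 = extreme_pair(corr)
p_max = max_p_value(2.89, rho0)
print(pair, rho0, p_max)
```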
The trend tests with multiple alleles
The trend tests $Z_x$ above can be extended to test for association with a multi-allelic marker in a case-control study [7]. For a marker with K different alleles, there are $m = K(K+1)/2$ possible genotypes, and we can form a case-control table with counts $r_i$ and $s_i$, $i = 1, \ldots, m$, similar to Table 1. The trend statistic is then a $(K-1) \times 1$ vector, $U = U(X) = X'[(1-\varphi) r - \varphi s]$, where X is an $m \times (K-1)$ matrix whose $j$th column, $x_j$, is a score vector for the m genotypes corresponding to the $j$th allele, and $\mathrm{Var}(U) = X'\Sigma X$ can be obtained as in the previous section to adjust for correlations among family members. To test for association with this marker, Slager and Schaid [7] proposed the statistic $U'[\mathrm{Var}(U)]^{-1}U$, which asymptotically follows a chi-squared distribution with $K - 1$ degrees of freedom.
Here, MERT and MAX can be applied as alternatives to this chi-squared test. For the $j$th allele, the $j$th element of U is $U_j = x_j'[(1-\varphi) r - \varphi s]$, with $\sigma_j^2 = \mathrm{Var}(U_j) = x_j'\Sigma x_j$ and $\mathrm{Cov}(U_j, U_l) = x_j'\Sigma x_l$. The trend test for each allele is then $Z_j = U_j/\sigma_j$, $j = 1, \ldots, K-1$, and the correlation between any two of these tests is $\mathrm{Corr}(Z_j, Z_l) = x_j'\Sigma x_l/(\sigma_j\sigma_l)$. Hence, MERT and MAX can be applied to this family of trend tests to test for association with a multi-allelic marker.
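The multi-allelic quantities above can be assembled as in the sketch below. The choice of scores is an assumption: column $j$ of X counts copies of allele $j$, with the last allele as reference. $\Sigma$ is the independent-sample null matrix $(RS/n)[\mathrm{diag}(f) - ff']$ with pooled genotype frequencies $f$. For family data, the relative-pair covariance terms would be added to $\Sigma$ as in the biallelic case. Function names and counts are hypothetical.

```python
import itertools
import math


def genotypes(k_alleles):
    """Unordered genotypes (a, b), a <= b; there are m = K(K + 1)/2 of them."""
    return list(itertools.combinations_with_replacement(range(k_alleles), 2))


def score_matrix(k_alleles):
    """m x (K - 1) matrix X; column j counts copies of allele j (last allele = reference)."""
    return [[gt.count(j) for j in range(k_alleles - 1)] for gt in genotypes(k_alleles)]


def solve(a, b):
    """Solve A v = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(m[i][col]))
        m[col], m[piv] = m[piv], m[col]
        for i in range(n):
            if i != col:
                factor = m[i][col] / m[col][col]
                for c in range(col, n + 1):
                    m[i][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def multiallelic_tests(r, s, k_alleles):
    """Chi-squared statistic, per-allele Z_j, and Corr(Z_j, Z_l) for unrelated samples."""
    X = score_matrix(k_alleles)
    m, d = len(X), k_alleles - 1
    R, S = sum(r), sum(s)
    n = R + S
    phi = R / n
    diff = [(1 - phi) * r[g] - phi * s[g] for g in range(m)]
    U = [sum(X[g][j] * diff[g] for g in range(m)) for j in range(d)]
    # Independent samples under H0: Sigma = (R S / n)(diag(f) - f f'), f pooled frequencies
    f = [(r[g] + s[g]) / n for g in range(m)]
    sigma = [[R * S / n * ((f[g] if g == h else 0.0) - f[g] * f[h]) for h in range(m)]
             for g in range(m)]
    V = [[sum(X[g][j] * sigma[g][h] * X[h][l] for g in range(m) for h in range(m))
          for l in range(d)] for j in range(d)]  # Var(U) = X' Sigma X
    chi2 = sum(u * w for u, w in zip(U, solve(V, U)))  # U' Var(U)^{-1} U, K - 1 df
    z = [U[j] / math.sqrt(V[j][j]) for j in range(d)]
    corr = [[V[j][l] / math.sqrt(V[j][j] * V[l][l]) for l in range(d)] for j in range(d)]
    return chi2, z, corr


# Hypothetical counts for K = 3 alleles, genotypes ordered as genotypes(3)
chi2, z, corr = multiallelic_tests([20, 30, 25, 10, 20, 15], [30, 35, 20, 15, 15, 5], 3)
print(round(chi2, 3), [round(v, 3) for v in z], round(corr[0][1], 3))
```

For K = 2 the chi-squared statistic reduces to the square of the biallelic trend statistic with additive scores. The per-allele $Z_j$ and their correlations are exactly the inputs needed to form MERT and MAX.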
Results
A simulation study
To illustrate the robustness of MERT and MAX, and to compare their performance with the individual trend tests for given models, we simulated case-control datasets and computed the empirical power of all the tests under three genetic models: recessive, additive, and dominant.
The simulations assumed a disease prevalence of K = 0.1 and an allele frequency of p = 0.3, with 20,000 replications. To facilitate the calculation, each case-control dataset included 160 cases, generated as 80 sib pairs drawn from 80 different families, and 160 unrelated controls. It can be shown that, for sib pairs, the probabilities of sharing 0, 1, and 2 alleles IBD are 1/4, 1/2, and 1/4 when parental genotype information is unknown. Assuming these IBD probabilities, the variance of the trend test was adjusted for the correlations among related cases. Let the genotype relative risks be $RR_1 = f_1/f_0$ and $RR_2 = f_2/f_0$, where $f_0$, $f_1$, and $f_2$ are the penetrances of genotypes $g_0$, $g_1$, and $g_2$. The null hypothesis $H_0$ can then equivalently be written as $RR_1 = RR_2 = 1$, and the alternative hypothesis is specified by varying $RR_1$ and $RR_2$.
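Given the prevalence, allele frequency, and relative risks, the penetrances and the genotype distributions in cases and controls follow from Bayes' rule. The sketch below assumes Hardy-Weinberg equilibrium in the population and treats p as the frequency of the high-risk allele M. It gives only the marginal genotype distribution of a single case or control. It does not reproduce the sib-pair sampling of the study, which needs the joint distribution of relatives. The relative risks in the usage line are hypothetical, not the values behind Table 2.

```python
def case_control_freqs(prevalence, p, rr1, rr2):
    """Penetrances and genotype frequencies (NN, NM, MM) in cases and in controls."""
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]  # population genotype frequencies (HWE)
    rr = [1.0, rr1, rr2]
    f0 = prevalence / sum(gi * ri for gi, ri in zip(g, rr))  # from K = sum_j g_j f_j
    f = [f0 * ri for ri in rr]
    if max(f) > 1:
        raise ValueError("relative risks imply a penetrance above 1")
    cases = [gi * fi / prevalence for gi, fi in zip(g, f)]                 # p_j
    controls = [gi * (1 - fi) / (1 - prevalence) for gi, fi in zip(g, f)]  # q_j
    return f, cases, controls


# Hypothetical recessive-type alternative (RR1 = 1, RR2 = 3)
f, p_case, q_control = case_control_freqs(0.1, 0.3, 1.0, 3.0)
print([round(v, 3) for v in f], [round(v, 3) for v in p_case], [round(v, 3) for v in q_control])
```

Setting $RR_1 = RR_2 = 1$ makes $p_j = q_j$, which is the null hypothesis in the form stated above.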
Table 2 displays the empirical power of the trend tests and of the robust tests MERT and MAX. For each model, the relative risks $RR_1$ and $RR_2$ were chosen so that the trend test optimal for that model had about 80% power. When the true model was recessive and the corresponding optimal test $Z_{(x=0)}$ had 80% power, the tests $Z_{(x=1/2)}$ and $Z_{(x=1)}$ had only 62% and 26% power, respectively. Conversely, $Z_{(x=0)}$ was underpowered when the true model was dominant or additive. Compared with these individual trend tests, MERT and MAX had relatively good power under all three models.
Application
The COGA data consist of 1,614 individuals from 143 families, with alcoholism diagnoses and microsatellite and single-nucleotide polymorphism (SNP) marker information. A preliminary genome scan by linkage analysis of the microsatellite data suggested that ADH3 on chromosome 4 may be an alcoholism susceptibility gene. Without adjusting for family structure, a logistic regression with backward selection of SNPs from the Illumina dataset near the ADH genes indicated that SNP marker rs1037475 was a significant predictor. Here we applied the association tests to case-control data for this SNP marker, using the ALDX1 diagnosis of "affected" and "purely unaffected" to define case and control status. Table 3 presents the data, which include cases from 143 families and controls from 111 families.
Figure 1 shows the results of the trend tests for the data in Table 3, with and without adjustment for family-based correlations. For individuals from the same family, IBD-sharing probabilities were calculated with GENEHUNTER [13], and from these the correlations and the adjusted variances of the test statistics were obtained. We then applied two-sided trend tests under the recessive, additive, and dominant models, corresponding to the scores x = 0, 1/2, and 1. The tests showed significant association under the recessive and additive model assumptions ($Z_{(x=0)} = 2.89$, p = 0.004; $Z_{(x=1/2)} = 2.02$, p = 0.043), but not under the dominant model ($Z_{(x=1)} = 0.40$, p = 0.69). Note that adjusting for the correlations among family members increased the standard errors, which yielded smaller test statistics $Z_x$ and hence larger p-values than the unadjusted tests (see Figure 1).
Figure 1 also shows that the trend test results depend on the scores $x = (0, x, 1)'$ chosen for the underlying genetic model. Each statistic $Z_x$ with $0 \le x \le 1$ corresponds to a different model, and statistics above the horizontal dotted line are significant. Because of the uncertainty about the mode of inheritance, different conclusions could be reached, and using any single trend test may result in a substantial loss of power when the model is misspecified. We therefore also applied the two robust tests to these data. For the tests under the recessive, additive, and dominant models, the pairwise correlations were $\mathrm{Corr}(Z_{(x=0)}, Z_{(x=1)}) = 0.334$, $\mathrm{Corr}(Z_{(x=0)}, Z_{(x=1/2)}) = 0.818$, and $\mathrm{Corr}(Z_{(x=1/2)}, Z_{(x=1)}) = 0.813$. Since the recessive and dominant tests have the minimum correlation, $\rho_0 = 0.334$, we obtained $Z_{MERT} = (2.89 + 0.40)/\{2(1 + 0.334)\}^{1/2} = 2.01$ with p = 0.044. From simulations with 1,000,000 replications, the empirical p-value for $Z_{MAX} = 2.89$ was p = 0.009. In this example, because the correlation between the test statistics under the recessive and dominant models is small, MAX appears to be more powerful than MERT for detecting association between disease status and the marker. Both robust trend tests showed significant association between this SNP marker and alcoholism.
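The MERT arithmetic above can be checked directly from the reported statistics and correlation. This short sketch uses only Python's standard library. It reproduces $Z_{MERT} = 2.01$ and its two-sided p-value of 0.044.

```python
import math

z_rec, z_dom = 2.89, 0.40  # adjusted recessive and dominant statistics
rho0 = 0.334               # their correlation, the minimum of the three pairs
z_mert = (z_rec + z_dom) / math.sqrt(2 * (1 + rho0))
p_mert = math.erfc(abs(z_mert) / math.sqrt(2))  # two-sided: 2 * (1 - Phi(|z|))
print(round(z_mert, 2), round(p_mert, 3))
```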
Conclusion
In this paper, we applied trend tests of genetic association to case-control data drawn from the COGA families. Although the conclusions under the recessive, additive, and dominant models were similar with and without adjustment in this example, tests that ignore the correlations among family members can have inflated false-positive rates and are therefore not valid.
We have also studied two robust trend tests, MERT and MAX, for case-control studies with family data. When the genetic model is unknown, these robust tests, which are based on a family of possible genetic models, retain relatively good power across that family and so protect against model misspecification. Although we have focused on examples and models for genetic association, these methods apply more generally to trend tests of association with correlated cases or controls whenever the exposure variable has a natural ordering.
Abbreviations
CA: Cochran-Armitage
COGA: Collaborative Study on the Genetics of Alcoholism
IBD: Identical by descent
MAX: Maximum test
MERT: Maximin efficiency robust test
SNP: Single-nucleotide polymorphism
References
 1.
Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273: 1516-1517. 10.1126/science.273.5281.1516.
 2.
Armitage P: Tests for linear trends in proportions and frequencies. Biometrics. 1955, 11: 375-386. 10.2307/3001775.
 3.
Cochran WG: Some methods for strengthening the common chi-squared tests. Biometrics. 1954, 10: 417-451. 10.2307/3001616.
 4.
Sasieni PD: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. 10.2307/2533494.
 5.
Slager SL, Schaid DJ: Case-control studies of genetic markers: power and sample size approximations for Armitage's test for trend. Hum Hered. 2001, 52: 149-153. 10.1159/000053370.
 6.
Czika W, Weir BS: Properties of the multiallelic trend test. Biometrics. 2004, 60: 69-74. 10.1111/j.0006-341X.2004.00166.x.
 7.
Slager SL, Schaid DJ: Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 1457-1462. 10.1086/320608.
 8.
Rabinowitz D, Laird NM: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
 9.
Gastwirth JL: The use of maximin efficiency robust tests in combining contingency tables and survival analysis. J Am Stat Assoc. 1985, 80: 380-384. 10.2307/2287901.
 10.
Freidlin B, Zheng G, Li Z, Gastwirth JL: Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002, 53: 146-152. 10.1159/000064976.
 11.
Freidlin B, Podgor MJ, Gastwirth JL: Efficiency robust tests for survival or ordered categorical data. Biometrics. 1999, 55: 883-886. 10.1111/j.0006-341X.1999.00264.x.
 12.
Zheng G, Freidlin B, Gastwirth JL: Choice of scores in trend tests for case-control studies of candidate-gene associations. Biometrical J. 2003, 45: 335-348. 10.1002/bimj.200390016.
 13.
Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.
Authors' contributions
XT was involved in the design of the study and the statistical analysis, and drafted the manuscript. JJ, GZ, and JPL participated in its design and performed the statistical analysis. All authors read and approved the final manuscript.
Cite this article
Tian, X., Joo, J., Zheng, G. et al. Robust trend tests for genetic association in case-control studies using family data. BMC Genet 6, S107 (2005) doi:10.1186/1471-2156-6-S1-S107
Keywords
 Genetic Model
 Dominant Model
 Trend Test
 Family Data
 Robust Test