Robust trend tests for genetic association in case-control studies using family data

Tian, Xin; Joo, Jungnam; Zheng, Gang; Lin, Jing-Ping

doi:10.1186/1471-2156-6-S1-S107

Volume 6 Supplement 1

Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism

Proceedings
Open access
Published: 30 December 2005

BMC Genetics volume 6, Article number: S107 (2005)

Abstract

We studied a trend test for genetic association between disease and the number of risk alleles using case-control data. When the data are sampled from families, this trend test can be adjusted to take into account the correlations among family members in complex pedigrees. However, the test depends on the scores based on the underlying genetic model and thus it may have substantial loss of power when the model is misspecified. Since the mode of inheritance will be unknown for complex diseases, we have developed two robust trend tests for case-control studies using family data. These robust tests have relatively good power for a class of possible genetic models. The trend tests and robust trend tests were applied to a dataset of Genetic Analysis Workshop 14 from the Collaborative Study on the Genetics of Alcoholism.

Background

Testing for linkage disequilibrium or association provides a useful alternative to testing linkage for complex traits with relatively small genetic effects [1]. Among the tests for association between a candidate gene and a disease within a case-control design, the Cochran-Armitage (CA) trend test [2, 3] is preferable to the allele-based test and Pearson's chi-squared test [4–6]. In such studies, cases and controls are usually independent random samples. Genotypes of each individual at markers in or near candidate genes are observed. For a marker with two alleles, the CA trend test can be used to test for a linear trend between the disease and the number of high-risk alleles at this marker.

Recently, there has been increasing interest in statistical methods that evaluate association between genetic markers and disease status using family-based data [7, 8]. This would allow data available from linkage studies to be efficiently used to test for association. Unlike traditional case-control studies in which all individuals are unrelated, cases and controls drawn from family data are often correlated because these individuals are biologically related. Consequently, the frequencies of the high-risk alleles at a marker locus will be increased among related individuals. This may affect the false-positive rate (type I error) of the association test, compared to a case-control design based on independent samples. Hence, any test of genetic association must account for the correlations among family members. Slager and Schaid [7] extended the original CA trend test to case-control studies with family data, in which they modeled the correlations among related cases or controls as functions of the probability that their marker alleles are shared identically by descent (IBD). This method can be applied to complex family structures and it obtains different correlations for different types of relative pairs. Thus, it is more flexible than a method assuming a common correlation for each pair of relatives within a family. With this correlation adjustment, the resulting trend test in Slager and Schaid [7] is similar to the original one but uses an appropriate variance formulation. Note that this trend test uses different scores depending on assumptions about the underlying genetic model. In practice, because the genetic model is unknown for most, if not all, complex diseases, applying a trend test with one set of scores would result in loss of power if the genetic model is misspecified. Therefore, more robust tests have been proposed to protect against model uncertainty [9, 10].

In this paper we study two robust trend tests, the maximum test (MAX) and the maximin efficiency robust test (MERT), in case-control designs applied to family data. These two robust tests account for correlated individuals and do not rely on the assumption of any particular genetic model. The performance of the robust trend tests and the extended CA trend test is compared by a simulation study. These tests are illustrated using a Genetic Analysis Workshop 14 dataset from the Collaborative Study on the Genetics of Alcoholism (COGA).

Methods

The trend tests

Consider data for a case-control study of genetic association as in Table 1. Assume a marker with two alleles, N and M, where N is the normal allele and M is the high-risk allele. Denote the genotypes as g₀ = NN, g₁ = NM, and g₂ = MM. Let the genotype frequencies for cases and controls be p_j and q_j, j = 0, 1, 2, respectively, with $\sum_{j} p_j = \sum_{j} q_j = 1$. The null hypothesis of no association is then p_j = q_j for each j.

Table 1 The data in a case-control study

Given the data, the CA trend test for association [4] between a disease and the marker is written as Z_x = U(x)/{Var[U(x)]}^{1/2}, where $U(x) = \sum_{j=0}^{2} x_j (S r_j - R s_j)/n$, R and S are the numbers of cases and controls, n = R + S, and x = (x₀, x₁, x₂)' is a set of increasing scores (weights) assigned to the three genotypes (g₀, g₁, g₂) a priori based on the underlying genetic model. Note that (x₀, x₁, x₂)' can be reparameterized as (0, x, 1)' with 0 ≤ x ≤ 1. If cases and controls are independent random samples, the counts (r₀, r₁, r₂) and (s₀, s₁, s₂) in Table 1 follow the multinomial distributions mul(R; p₀, p₁, p₂) and mul(S; q₀, q₁, q₂), respectively. Under the null hypothesis, it can be shown that $\mathrm{Var}[U(x)] = (RS/n)\,[\sum_{j} x_j^2 p_j - (\sum_{j} x_j p_j)^2]$, and Z_x asymptotically follows a standard normal distribution N(0, 1).

The null hypothesis H₀ is rejected in favor of the alternative that M is the high-risk allele associated with disease when Z_x > z_{1-α}, where z_{1-α} is the 100(1 - α)th percentile of N(0, 1). When it is not certain which allele is high-risk, H₀ is rejected when |Z_x| > z_{1-α/2}.
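As an illustration of the test for independent cases and controls, the following minimal Python sketch computes Z_x and a two-sided p-value. The function name `ca_trend_test` and the example counts are hypothetical, and the common genotype frequencies under H₀ are estimated by pooling cases and controls, which is an assumption of this sketch rather than something stated above.

```python
import math


def ca_trend_test(r, s, x):
    """Cochran-Armitage trend statistic Z_x for independent cases and controls.

    r, s : genotype counts (g0, g1, g2) for cases and controls
    x    : increasing scores (x0, x1, x2), e.g. (0, 0.5, 1)
    Returns (Z_x, two-sided p-value). Assumes the pooled sample contains
    at least two genotypes with different scores, so the variance is positive.
    """
    R, S = sum(r), sum(s)
    n = R + S
    # U(x) = sum_j x_j (S r_j - R s_j) / n
    u = sum(xj * (S * rj - R * sj) for xj, rj, sj in zip(x, r, s)) / n
    # Assumption: under H0 the shared genotype frequencies are estimated
    # from the pooled counts (r_j + s_j) / n.
    p_hat = [(rj + sj) / n for rj, sj in zip(r, s)]
    first_moment = sum(xj * pj for xj, pj in zip(x, p_hat))
    second_moment = sum(xj * xj * pj for xj, pj in zip(x, p_hat))
    # Var[U(x)] = (R S / n) [sum x_j^2 p_j - (sum x_j p_j)^2]
    var_u = R * S / n * (second_moment - first_moment ** 2)
    z = u / math.sqrt(var_u)
    # Two-sided p-value under N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts, scores for an additive model
z, p = ca_trend_test(r=(20, 50, 30), s=(40, 45, 15), x=(0.0, 0.5, 1.0))
print(round(z, 3), round(p, 5))
```

This version is not valid for related individuals; the variance must then be replaced by the family-adjusted form described next.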

However, because cases and controls drawn from the same family may be biologically related, Slager and Schaid [7] proposed the following method for estimating the variance to account for correlations among related cases or controls. Let y_i = (y_{i0}, y_{i1}, y_{i2})' be the genotype indicator vector for the ith case, where y_{ij} = 1 if the ith case has genotype g_j and y_{ij} = 0 otherwise, i = 1, ..., R. Similarly, we use z_j for controls. Then r = (r₀, r₁, r₂)' = $\sum_{i=1}^{R} y_i$ and s = (s₀, s₁, s₂)' = $\sum_{j=1}^{S} z_j$. Furthermore, y_i and z_j follow the multinomial distributions mul(1; p₀, p₁, p₂) and mul(1; q₀, q₁, q₂), respectively. Let φ = R/n. The test statistic U(x) can also be written as U(x) = x'[(1 - φ) r - φ s]. Then,

$$\mathrm{Var}[U(x)] = x'\,[(1-\phi)^2\,\mathrm{Var}(r) + \phi^2\,\mathrm{Var}(s) - 2\phi(1-\phi)\,\mathrm{Cov}(r, s)]\,x,$$

where the variances and covariances can be calculated based on the multinomial distributions and IBD-sharing probabilities for pairs of related individuals [7].
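As a consistency check, the short derivation below sketches how the general variance of U(x) = x'[(1 - φ) r - φ s] reduces to the independent-sample variance when no individuals are related. The notation D_p = diag(p₀, p₁, p₂) and p = (p₀, p₁, p₂)' is introduced here for illustration only; under H₀ the cases and controls share the same frequencies p.

```latex
\begin{aligned}
\operatorname{Var}(r) &= R\,(D_p - pp'), \qquad
\operatorname{Var}(s) = S\,(D_p - pp'), \qquad
\operatorname{Cov}(r, s) = 0, \\
\operatorname{Var}[U(x)]
 &= \bigl[(1-\phi)^2 R + \phi^2 S\bigr]\; x'(D_p - pp')\,x
  = \frac{S^2 R + R^2 S}{n^2}\; x'(D_p - pp')\,x \\
 &= \frac{RS}{n}\Bigl[\sum_{j} x_j^2 p_j - \Bigl(\sum_{j} x_j p_j\Bigr)^{2}\Bigr],
 \qquad \phi = R/n,\; 1-\phi = S/n .
\end{aligned}
```

With related cases or controls, the off-diagonal covariance terms between individuals no longer vanish, which is why the adjusted variance differs from this expression.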

Robust trend tests when the genetic model is unknown

Because the underlying genetic model is unknown for most complex diseases, we consider two robust trend tests [9, 10], MERT and MAX, for case-control studies in which the cases and controls may be related. Note that for the special case in which cases and controls are independent random samples, these tests have been studied by Freidlin et al. [10].

Suppose we have a family of trend test statistics Z_i corresponding to different genetic models. The first robust test, MERT, can be written as a linear combination of the two test statistics with minimum correlation ρ₀. Denoting these two tests as {Z_s, Z_t}, MERT is written as Z_MERT = (Z_s + Z_t)/{2(1 + ρ₀)}^{1/2}, which asymptotically follows a standard normal distribution. The second robust trend test, MAX, can be defined as Z_MAX = max(Z_s, Z_MERT, Z_t) for a one-sided test, and Z_MAX = max(|Z_s|, |Z_MERT|, |Z_t|) for a two-sided alternative, where Z_MERT is chosen as the "middle" test because it has equal correlations with Z_s and Z_t. MAX is more powerful than MERT when ρ₀ is small, and the two tests have similar power when the minimum correlation is relatively large (e.g., ρ₀ ≥ 0.75) [11].
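The two definitions translate directly into code. The sketch below is illustrative only: the function names `mert` and `max_stat` are hypothetical, and the inputs are assumed to be already-computed trend statistics Z_s and Z_t together with their correlation ρ₀. The example values are the ones reported in the Application section of this paper.

```python
import math


def mert(z_s, z_t, rho0):
    """Z_MERT = (Z_s + Z_t) / sqrt(2 (1 + rho0)) for the minimum-correlation pair."""
    return (z_s + z_t) / math.sqrt(2.0 * (1.0 + rho0))


def max_stat(z_s, z_t, rho0, two_sided=True):
    """Z_MAX over (Z_s, Z_MERT, Z_t); absolute values are used for a two-sided test."""
    stats = (z_s, mert(z_s, z_t, rho0), z_t)
    if two_sided:
        return max(abs(z) for z in stats)
    return max(stats)


# Recessive and dominant statistics and their correlation from the Application
print(round(mert(2.89, 0.40, 0.334), 2))   # about 2.01
print(max_stat(2.89, 0.40, 0.334))         # 2.89
```

Unlike Z_MERT, Z_MAX is not asymptotically standard normal, so its p-value cannot be read from a normal table.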

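For case-control studies drawn from family data, we can derive the correlations for the trend tests defined in the previous section. Let the variance-covariance matrix

$$\Sigma = \mathrm{Var}[(1-\phi)\,r - \phi\,s],$$

so that Var[U(x)] = x'Σx. Then the correlation between any two test statistics can be obtained as

$$\mathrm{Corr}(Z_{x_0}, Z_{x_1}) = \frac{x_0'\,\Sigma\,x_1}{\{(x_0'\,\Sigma\,x_0)(x_1'\,\Sigma\,x_1)\}^{1/2}},$$

where x₀ and x₁ are two sets of scores used for two different genetic models.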

To test for association between a marker and disease status, the optimal scores for the recessive, additive, and dominant models are x = 0, 1/2, and 1 in the score vector (0, x, 1)' [12]. Based on prior scientific knowledge, other genetic models can also be assumed, leading to different trend tests. The correlation of any two tests can then be calculated to determine the pair of tests with minimum correlation, so that the MERT test can be performed. To apply the MAX test, the critical value and the p-value are obtained from simulation.
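The search for the minimum-correlation pair can be sketched as follows. This is a simplified illustration: it assumes independent subjects, for which the variance-covariance matrix of the trend statistics is proportional to diag(p) - pp' (the constant cancels in the correlation), whereas for family data the matrix would come from the IBD-adjusted variances and covariances. The function names and the genotype frequencies (Hardy-Weinberg proportions with allele frequency 0.3) are hypothetical.

```python
import math
from itertools import combinations


def multinomial_sigma(p):
    """diag(p) - p p', proportional to Sigma when all subjects are independent."""
    k = len(p)
    return [[(p[i] if i == j else 0.0) - p[i] * p[j] for j in range(k)]
            for i in range(k)]


def bilinear(a, sigma, b):
    """Compute a' Sigma b."""
    return sum(a[i] * sigma[i][j] * b[j]
               for i in range(len(a)) for j in range(len(b)))


def trend_corr(x_a, x_b, sigma):
    """Correlation between the trend tests that use score vectors x_a and x_b."""
    return bilinear(x_a, sigma, x_b) / math.sqrt(
        bilinear(x_a, sigma, x_a) * bilinear(x_b, sigma, x_b))


def min_corr_pair(scores, sigma):
    """Return (correlation, model_a, model_b) for the least-correlated pair."""
    return min((trend_corr(scores[a], scores[b], sigma), a, b)
               for a, b in combinations(scores, 2))


# Score vectors (0, x, 1)' with x = 0, 1/2, 1
scores = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}
sigma = multinomial_sigma((0.49, 0.42, 0.09))  # illustrative frequencies
rho0, model_s, model_t = min_corr_pair(scores, sigma)
print(model_s, model_t, round(rho0, 3))
```

In this illustrative setting the recessive and dominant tests form the least-correlated pair, so they would play the roles of Z_s and Z_t in MERT.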

The trend tests with multiple alleles

The above trend tests Z_x can be extended to test the association with a multiallelic marker in a case-control study [7]. For a marker with K different alleles, there are m = K(K + 1)/2 possible genotypes and we can obtain a case-control table with r_i and s_i, i = 1, ..., m, similar to Table 1. The trend test statistic can be written as a (K - 1) × 1 vector, U = U(X) = X'[(1 - φ) r - φ s], where X is an m × (K - 1) matrix with the jth column, x_j, as a score vector for the m genotypes corresponding to the jth allele, and Var(U) = X'ΣX can be obtained similarly as in the previous section to adjust for correlations among family members. To test the association with this marker, Slager and Schaid [7] proposed to use the statistic U'[Var(U)]^{-1}U, as it asymptotically follows a chi-squared distribution with (K - 1) degrees of freedom.

Here, we can apply MERT and MAX as alternatives to this chi-squared test. Corresponding to the jth allele, the jth element of U is U_j = x_j'[(1 - φ) r - φ s], and we have σ_j² = Var(U_j) = x_j'Σx_j and Cov(U_j, U_k) = x_j'Σx_k. Then the trend test for each allele is Z_j = U_j/σ_j, j = 1, ..., (K - 1), and the correlation for any two tests can be obtained. Hence, for the family of trend tests, MERT and MAX can be used to test for association with a multiallelic marker.
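To make the dimensions concrete, the sketch below enumerates the m = K(K + 1)/2 genotypes and builds an m × (K - 1) score matrix X. The scoring rule used here (the number of copies of allele j in each genotype, with the last allele as reference) is an assumption chosen for illustration, as are the function names; the text above does not fix a particular scoring.

```python
from itertools import combinations_with_replacement


def genotype_score_matrix(K):
    """Enumerate unordered genotypes for K alleles and an allele-count score matrix.

    Assumption: column j scores each genotype by its number of copies of
    allele j (j = 0, ..., K-2); the last allele serves as the reference.
    """
    genotypes = list(combinations_with_replacement(range(K), 2))
    X = [[g.count(j) for j in range(K - 1)] for g in genotypes]
    return genotypes, X


def trend_vector(X, r, s):
    """U = X'[(1 - phi) r - phi s] with phi = R / n."""
    R, S = sum(r), sum(s)
    phi = R / (R + S)
    d = [(1.0 - phi) * ri - phi * si for ri, si in zip(r, s)]
    return [sum(X[i][j] * d[i] for i in range(len(d)))
            for j in range(len(X[0]))]


genotypes, X = genotype_score_matrix(3)
for g, row in zip(genotypes, X):
    print(g, row)
```

Each element of the resulting vector U is then standardized by its own adjusted standard deviation to give the per-allele trend tests Z_j.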

Results

A simulation study

To illustrate the robustness of the statistics MERT and MAX, and to compare their performance with individual trend tests for given models, we simulated case-control datasets and computed the empirical powers of all the tests under three genetic models: the recessive, additive, and dominant models.

The simulations were based on the assumptions that the disease prevalence K = 0.1 and the allele frequency p = 0.3, with 20,000 replications. To facilitate the calculation, each case-control dataset included 160 cases generated as 80 sib-pairs drawn from 80 different families, and 160 controls as unrelated random samples. It can be shown that the probabilities of 0, 1, and 2 alleles shared IBD are 1/4, 1/2, and 1/4 for the sib-pairs when the parents' genotype information is unknown. Assuming these IBD probabilities, the variance of the trend test was adjusted for the correlations among related cases. Let the genotype relative risks be RR₁ = f₁/f₀ and RR₂ = f₂/f₀, where f₀, f₁, and f₂ are the penetrances for genotypes g₀, g₁, and g₂. Thus, equivalently, the null hypothesis H₀ can be written as RR₁ = RR₂ = 1. The alternative hypothesis can be specified by varying RR₁ and RR₂.
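The link between the prevalence, the genotype relative risks, and the genotype frequencies in cases and controls can be sketched as follows. This is an illustrative calculation, not the authors' simulation code: it assumes Hardy-Weinberg proportions in the population (not stated above), the function name is hypothetical, and the relative risks in the example are arbitrary.

```python
def case_control_genotype_freqs(prevalence, allele_freq, rr1, rr2):
    """Penetrances and genotype frequencies in cases and controls.

    Assumption: population genotype frequencies follow Hardy-Weinberg
    proportions for the high-risk allele frequency `allele_freq`.
    """
    p = allele_freq
    g = [(1 - p) ** 2, 2 * p * (1 - p), p ** 2]        # P(g0), P(g1), P(g2)
    # prevalence = sum_j P(g_j) f_j with f1 = RR1 * f0 and f2 = RR2 * f0
    f0 = prevalence / (g[0] + rr1 * g[1] + rr2 * g[2])
    f = [f0, rr1 * f0, rr2 * f0]
    if max(f) > 1.0:
        raise ValueError("relative risks imply a penetrance above 1")
    # Bayes' rule: P(g_j | case) and P(g_j | control)
    cases = [gj * fj / prevalence for gj, fj in zip(g, f)]
    controls = [gj * (1 - fj) / (1 - prevalence) for gj, fj in zip(g, f)]
    return f, cases, controls


# Prevalence and allele frequency from the simulation settings;
# RR1 = 1 with RR2 > 1 is a recessive-type alternative (RR2 chosen arbitrarily).
f, p_case, q_control = case_control_genotype_freqs(0.1, 0.3, rr1=1.0, rr2=2.0)
print([round(v, 4) for v in p_case])
print([round(v, 4) for v in q_control])
```

Under RR₁ = RR₂ = 1 the case and control frequencies coincide, which is the null hypothesis stated above.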

Table 2 displays the empirical powers of the trend tests and the robust tests, MERT and MAX. The relative risks RR₁ and RR₂ were chosen so that a particular trend test had about 80% power for each given model. When the true underlying model was recessive inheritance and the corresponding optimal test Z_{(x = 0)} had power of 80%, the tests Z_{(x = 1/2)} and Z_{(x = 1)} only had power of 62% and 26%, respectively. However, the test Z_{(x = 0)} was underpowered when the true model was dominant or additive. Compared to these trend tests, the MERT and MAX tests had relatively good power for all three models.

Table 2 Empirical powers of trend tests and robust trend tests

Application

The COGA data consist of 1,614 individuals from 143 families, with alcoholism diagnosis, microsatellite, and single-nucleotide polymorphism (SNP) marker information. The preliminary genome scan by linkage analysis using the microsatellite data suggested that ADH3 on chromosome 4 may be an alcoholism susceptibility gene. Without adjusting for family structure, a logistic regression with backward selection of SNPs from the Illumina dataset near the ADH genes indicated that SNP marker rs1037475 was a significant predictor. Here we applied the association tests to case-control data, using the ALDX1 diagnosis of "affected" and "purely unaffected" status to define case status, together with the genotypes for this SNP marker. Table 3 presents the data, including cases from 143 families and controls from 111 families.

Table 3 A case-control dataset from the COGA study

Results of trend tests for the data in Table 3 with or without adjusting for the family-based correlations are shown in Figure 1. For individuals from the same family, the IBD allele-sharing probabilities were calculated using the software GENEHUNTER [13], and the correlations and the adjusted variances of the test statistics were obtained. We then applied the two-sided trend tests under the recessive, additive, and dominant models, corresponding to the scores x = 0, 1/2, and 1. The tests showed significant association under both the recessive and additive model assumptions (Z_{(x = 0)} = 2.89, p = 0.004; Z_{(x = 1/2)} = 2.02, p = 0.043), but failed to show any significant result assuming a dominant model (Z_{(x = 1)} = 0.40, p = 0.69). Note that after adjusting for the correlations among family members, standard errors were larger, resulting in smaller test statistics Z_x and thus larger p-values compared to the tests without adjusting for the correlations (see Figure 1).

Figure 1 also shows that the trend test results depend on the scores x = (0, x, 1) for the underlying genetic models. The trend tests Z_x with 0 ≤ x ≤ 1 correspond to different models, where the statistics Z_x above the horizontal dotted line are significant. Because of the uncertainty about the mode of inheritance, different conclusions could be reached, and using any single trend test may result in significant loss of power when the model is misspecified. Therefore, we also applied the two robust tests to these data. Given the tests for the recessive, additive, and dominant models, the pairwise correlations were calculated as Corr(Z_{(x = 0)}, Z_{(x = 1)}) = 0.334, Corr(Z_{(x = 0)}, Z_{(x = 1/2)}) = 0.818, and Corr(Z_{(x = 1/2)}, Z_{(x = 1)}) = 0.813. Then we obtained Z_MERT = (2.89 + 0.40)/{2(1 + 0.334)}^{1/2} = 2.01 with p-value = 0.044. By simulations with 1,000,000 replications, the empirical p-value for Z_MAX = 2.89 was p = 0.009. In this example, because the correlation between the test statistics under the recessive and dominant models is small, MAX appears to be more powerful than MERT to detect associations between disease status and a marker. Both robust trend tests showed significant association between this SNP marker and alcoholism.
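The reported numbers can be approximately reproduced with a small Monte Carlo sketch. The approach below is an assumption, not necessarily the authors' simulation procedure: under H₀ it treats (Z_s, Z_t) as bivariate standard normal with correlation ρ₀ = 0.334, derives Z_MERT from them, and estimates the two-sided p-value of the observed Z_MAX. The function names, the seed, and the number of replications are illustrative.

```python
import math
import random


def two_sided_normal_p(z):
    """P(|Z| > |z|) for Z ~ N(0, 1)."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def max_pvalue_mc(z_max_obs, rho0, n_rep=200_000, seed=1):
    """Monte Carlo p-value of the two-sided MAX statistic.

    Assumption: under H0, (Z_s, Z_t) is bivariate standard normal with
    correlation rho0, and Z_MERT = (Z_s + Z_t) / sqrt(2 (1 + rho0)).
    """
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * (1.0 + rho0))
    resid = math.sqrt(1.0 - rho0 ** 2)
    hits = 0
    for _ in range(n_rep):
        a = rng.gauss(0.0, 1.0)
        b = rng.gauss(0.0, 1.0)
        z_s = a
        z_t = rho0 * a + resid * b       # Corr(z_s, z_t) = rho0
        z_m = (z_s + z_t) / scale
        if max(abs(z_s), abs(z_m), abs(z_t)) >= z_max_obs:
            hits += 1
    return hits / n_rep


z_mert = (2.89 + 0.40) / math.sqrt(2.0 * (1.0 + 0.334))
p_mert = two_sided_normal_p(z_mert)
p_max = max_pvalue_mc(2.89, 0.334)
print(round(z_mert, 2), round(p_mert, 3), p_max)
```

Because the estimate is simulation-based, the MAX p-value varies slightly with the seed and the number of replications.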

Conclusion

In this paper, we applied trend tests of genetic association to case-control data drawn from the COGA families. Although the conclusions about significance under the recessive, additive, and dominant models were similar with and without adjustment in this example, tests that ignore the correlations among family members can have inflated false-positive rates and are therefore not valid.

We have also studied two robust trend tests, MERT and MAX, for case-control studies with family data. When the genetic model is unknown, these robust tests, based on a family of possible genetic models, tend to give better protection against model misspecification. Although we have focused on examples and models for genetic association, these results hold generally for trend tests of association with correlated cases or controls when the exposure variables have some natural ordering.

Abbreviations

CA: Cochran-Armitage
COGA: Collaborative Study on the Genetics of Alcoholism
IBD: Identical by descent
MAX: Maximum test
MERT: Maximin efficiency robust test
SNP: Single-nucleotide polymorphism

References

1. Risch N, Merikangas K: The future of genetic studies of complex human diseases. Science. 1996, 273: 1516-1517. 10.1126/science.273.5281.1516.
2. Armitage P: Tests for linear trends in proportions and frequencies. Biometrics. 1955, 11: 375-386. 10.2307/3001775.
3. Cochran WG: Some methods for strengthening the common chi-squared tests. Biometrics. 1954, 10: 417-451. 10.2307/3001616.
4. Sasieni PD: From genotypes to genes: doubling the sample size. Biometrics. 1997, 53: 1253-1261. 10.2307/2533494.
5. Slager SL, Schaid DJ: Case-control studies of genetic markers: power and sample size approximations for Armitage's test for trend. Hum Hered. 2001, 52: 149-153. 10.1159/000053370.
6. Czika W, Weir BS: Properties of the multiallelic trend test. Biometrics. 2004, 60: 69-74. 10.1111/j.0006-341X.2004.00166.x.
7. Slager SL, Schaid DJ: Evaluation of candidate genes in case-control studies: a statistical method to account for related subjects. Am J Hum Genet. 2001, 68: 1457-1462. 10.1086/320608.
8. Rabinowitz D, Laird NM: A unified approach to adjusting association tests for population admixture with arbitrary pedigree structure and arbitrary missing marker information. Hum Hered. 2000, 50: 211-223. 10.1159/000022918.
9. Gastwirth JL: The use of maximin efficiency robust tests in combining contingency tables and survival analysis. J Am Stat Assoc. 1985, 80: 380-384. 10.2307/2287901.
10. Freidlin B, Zheng G, Li Z, Gastwirth JL: Trend tests for case-control studies of genetic markers: power, sample size and robustness. Hum Hered. 2002, 53: 146-152. 10.1159/000064976.
11. Freidlin B, Podgor MJ, Gastwirth JL: Efficiency robust tests for survival or ordered categorical data. Biometrics. 1999, 55: 883-886. 10.1111/j.0006-341X.1999.00264.x.
12. Zheng G, Freidlin B, Gastwirth JL: Choice of scores in trend tests for case-control studies of candidate-gene associations. Biometrical J. 2003, 45: 335-348. 10.1002/bimj.200390016.
13. Kruglyak L, Daly MJ, Reeve-Daly MP, Lander ES: Parametric and nonparametric linkage analysis: a unified multipoint approach. Am J Hum Genet. 1996, 58: 1347-1363.

Author information

Authors and Affiliations

Office of Biostatistics Research, National Heart, Lung and Blood Institute, 6701 Rockledge Dr., Bethesda, Maryland, 20892, USA
Xin Tian, Jungnam Joo, Gang Zheng & Jing-Ping Lin

Corresponding author

Correspondence to Xin Tian.

Additional information

Authors' contributions

XT was involved in the design of the study and statistical analysis, and drafted the manuscript. JJ, GZ, and JPL participated in its design and performed the statistical analysis. All authors read and approved the final manuscript.

Rights and permissions

Open Access This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( https://creativecommons.org/licenses/by/2.0 ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Cite this article

Tian, X., Joo, J., Zheng, G. et al. Robust trend tests for genetic association in case-control studies using family data. BMC Genet 6 (Suppl 1), S107 (2005). https://doi.org/10.1186/1471-2156-6-S1-S107
