 Methodology article
 Open Access
A powerful score-based test statistic for detecting gene-gene co-association
BMC Genetics volume 17, Article number: 31 (2016)
Abstract
Background
The genetic variants identified by genome-wide association studies (GWAS) can account for only a small proportion of the total heritability of complex diseases. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the “missing heritability” problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, owing not only to the traditional interaction under nearly independent conditions but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we propose a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association.
Results
Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including the single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU) and partial least squares path modeling (PLSPM), and the delta-square (δ ^{2}) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice.
Conclusions
SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.
Background
Genome-wide association studies (GWAS) have successfully identified numerous loci associated with complex diseases or traits [1–3]. Despite high expectations, it is widely recognized that the genetic variants identified by GWAS can account for only a small proportion of the total heritability of complex diseases, which is referred to as the “missing heritability” problem [4–6]. Possible explanations for this problem include the existence of gene-gene joint effects, the contribution of rare variation, underestimation of the effects of the alleles identified, the possibility that inherited epigenetic factors lead to resemblance between relatives, and possible overestimation of the heritability of the complex disease or trait of interest [4–7]. It is highly desirable to develop more efficient statistical strategies to extract more information from high-throughput data. One key but inadequately addressed issue is the joint effects of two genes, which contain the main effects and their co-association.
In previous studies, our group proposed the concept of gene-gene co-association, which refers to the extent to which the joint effects of two genes differ from the main effects of each gene [8–11]. The distinction between gene-gene co-association and interaction has been theoretically clarified from the causal diagram perspective [9], and various simulations have been conducted to confirm its soundness, especially for two highly correlated genes. Specifically, taking 2 SNPs as an example (Fig. 1), suppose the main effects of SNP1 and SNP2 are β _{1} and β _{2} respectively and the correlation coefficient between them is r. The total effects of SNP1 and SNP2 are denoted as β _{1} + β _{2} + β _{3} + r(β _{1} + β _{2}), and the term β _{3} + r(β _{1} + β _{2}) represents the co-association, of which the traditional interaction β _{3} is only one part [9]. In effect, gene-gene co-association additionally captures the joint effects attributed to the correlation, r(β _{1} + β _{2}), which have usually been neglected in the traditional regression model. Generally, genes tend to work collaboratively within a specific pathway or network that is associated with a certain disease [12–15], and disease-associated interacting loci will often be highly correlated (single nucleotide polymorphisms (SNPs) in linkage disequilibrium (LD)) [16]. In this context, gene-gene co-association should be more appropriate for addressing the missing heritability problem. In addition, testing the co-association of two genes can, to some extent, guide us in learning and constructing genetic network structures. It is therefore of great significance to develop methods for detecting gene-gene co-association.
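As a worked illustration of the decomposition above, the values β _{1} = log(1.3), β _{2} = log(1.5), β _{3} = 0 and r = 0.5 are chosen here purely for illustration (natural logarithms are assumed). Even with no interaction term, the correlation alone contributes a non-zero co-association:

```latex
\begin{aligned}
\text{co-association} &= \beta_3 + r(\beta_1 + \beta_2)
  = 0 + 0.5\,\bigl(\log 1.3 + \log 1.5\bigr)
  \approx 0.5 \times 0.668 \approx 0.334,\\
\text{total effects} &= \beta_1 + \beta_2 + \beta_3 + r(\beta_1 + \beta_2)
  \approx 0.668 + 0.334 = 1.002 .
\end{aligned}
```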
Recently, several methods have been proposed to test gene-gene co-association, such as statistics based on the SNP-level Fisher r-to-z transformation [9], canonical correlation analysis (CCU) [8], kernel canonical correlation analysis (KCCU) [11] and partial least squares path modeling (PLSPM) [10]. The SNP-level Fisher r-to-z transformation-based statistics, though they have acceptable false positive rates and computational burden, fail to fully utilize the LD information between markers and the true causal SNPs in a gene or region, leading to lower statistical power. Furthermore, a single SNP can hardly represent the total effect of a whole gene on a disease. It is therefore appealing to construct gene- or region-based statistics to detect gene-gene co-association, such as the CCU, KCCU and PLSPM-based statistics. However, the CCU statistic [8] captures only linear correlation, which may be inappropriate for genomic data containing nonlinear structure, and it utilizes only the first canonical correlation coefficient, which may underestimate the gene-gene co-association. Although the KCCU statistic [11], as the nonlinear version of CCU, can detect nonlinear information, it remains uncertain how to set the kernel function and its parameters appropriately for each data set, which can lead to undesirable performance, and the bootstrap test it relies on imposes a high computational burden. Similarly, the PLSPM-based statistic [10] can deal with high multicollinearity between SNPs, but it is also time-consuming because it employs a random permutation test. Therefore, developing powerful and efficient gene-based methods to test gene-gene co-association is highly desirable.
In the present study, we aimed to develop a powerful score-based test statistic to identify co-association at the gene or region level, which essentially captures the effect of the covariance matrix between two genes on disease. Various simulation studies were conducted to assess its type I error rate and power, in comparison with the commonly used single SNP-based logistic regression model (SNP-LRT) [17–19], the principal component analysis (PCA)-based logistic regression model (PCA-LRT) [20], the delta-square (δ ^{2}) statistic [16], the CCU statistic [8], the KCCU statistic [11] and the PLSPM-based statistic [10]. Finally, the proposed score-based statistic (SBS) was applied to a rheumatoid arthritis (RA) data set from GAW16 Problem 1. Both the simulations and the real data analysis indicate that the proposed statistic performs better than the other existing methods.
Methods
Score-based statistic
We denote Y _{ i } as the observed binary trait outcome of individual i (i = 1, 2, …, n) in the GWAS data set and let the genotype data be (X _{11}, X _{12}, …, X _{1k }, …, X _{1K }) for gene A with K SNPs and (X _{21}, X _{22}, …, X _{2j }, …, X _{2J }) for gene B with J SNPs. For the k ^{th} locus of gene A and the j ^{th} locus of gene B, we first define the variability score for each sample as \( {u}_{kji}=\left({X}_{1ki}-{\overline{X}}_{1k}\right)\left({X}_{2ji}-{\overline{X}}_{2j}\right) \), where \( {\overline{X}}_{1k} \) and \( {\overline{X}}_{2j} \) indicate the mean level of the k ^{th} locus of gene A and the j ^{th} locus of gene B respectively. Then, the score-based statistic for their co-association effect is defined as \( {u}_{kj}={\displaystyle {\sum}_{i=1}^n\left({Y}_i-\overline{Y}\right)}\left({X}_{1ki}-{\overline{X}}_{1k}\right)\left({X}_{2ji}-{\overline{X}}_{2j}\right) \), where \( \overline{Y} \) is the sample mean of disease status. Furthermore, the score vector of length K*J is defined as U = (u _{11}, u _{12}, …, u _{1J }, u _{21}, …, u _{2J }, …, u _{ kj }, …, u _{ K1}, …, u _{ KJ }), and the covariance matrix of the score vector, denoted Σ, can be easily obtained.
Finally, the new score-based statistic for detecting gene-gene co-association is constructed as SBS = UΣ ^{− 1} U ^{T}, which follows a chi-square distribution with K*J degrees of freedom (χ _{ K ∗ J } ^{2} ) under the null hypothesis that there is no co-association between the two genes.
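The construction above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the explicit form of the covariance matrix Σ is not reproduced in this text, so the sketch assumes the usual null-variance estimate for a score test with a binary outcome, Ȳ(1 − Ȳ) Σ_i (u_i − ū)(u_i − ū)^T. The function name `sbs_test` and the use of a pseudo-inverse (to tolerate a near-singular Σ) are likewise assumptions.

```python
import numpy as np
from scipy.stats import chi2


def sbs_test(y, gene_a, gene_b):
    """Sketch of the score-based statistic SBS = U Sigma^{-1} U^T.

    y      : binary outcomes, length n
    gene_a : n x K genotype matrix for gene A
    gene_b : n x J genotype matrix for gene B
    Returns (statistic, degrees of freedom, p-value).
    """
    y = np.asarray(y, dtype=float)
    xa = np.asarray(gene_a, dtype=float)
    xb = np.asarray(gene_b, dtype=float)
    n, K = xa.shape
    J = xb.shape[1]

    # Centre each SNP at its sample mean.
    xa_c = xa - xa.mean(axis=0)
    xb_c = xb - xb.mean(axis=0)

    # Per-sample variability scores u_{kji}, ordered u_11, ..., u_1J, u_21, ...
    u = (xa_c[:, :, None] * xb_c[:, None, :]).reshape(n, K * J)

    # Score vector U with entries u_kj = sum_i (Y_i - Ybar) * u_{kji}.
    y_bar = y.mean()
    U = (y - y_bar) @ u

    # ASSUMPTION: null covariance estimate of U (not taken from the source text).
    u_c = u - u.mean(axis=0)
    sigma = y_bar * (1.0 - y_bar) * (u_c.T @ u_c)

    # Pseudo-inverse instead of a plain inverse, in case sigma is near-singular.
    stat = float(U @ np.linalg.pinv(sigma) @ U)
    df = K * J
    return stat, df, float(chi2.sf(stat, df))
```

Calling `sbs_test(y, gene_a, gene_b)` on a case-control sample returns the statistic together with its chi-square p-value on K*J degrees of freedom, with no permutation or bootstrap step.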
Data simulation
Simulation studies were conducted to assess the type I error rate and power of the SBS in comparison with other methods for testing gene-gene co-association. We simulated three co-association scenarios: Type I co-association (under a nearly independent condition between gene A and gene B, i.e. the traditional interaction β _{3}), Type II co-association (caused only by the correlation between gene A and gene B, i.e. r(β _{1} + β _{2})) and Type III co-association (caused by both the correlation and the independent term A × B between gene A and gene B, i.e. β _{3} + r(β _{1} + β _{2})). The null hypothesis for all three simulation scenarios is the absence of co-association between the two genes. Reference phased haplotype data were downloaded from the HapMap website (http://hapmap.ncbi.nlm.nih.gov/) [21]. Subsequently, a large CEU population of 100,000 individuals was generated by gs2.0 [22, 23] under the additive genetic model. In all simulations, the causal SNPs were removed before assessing the performance of the SBS. For each parameter setting, N individuals were sampled randomly from the population of 100,000, and 1000 simulations were repeated at a significance level of 0.05.
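The empirical type I error rate and power in such a design are both rejection proportions over the replicates. A minimal sketch, assuming the p-values of the replicates (1000 in the setting above) have already been collected; the function name `rejection_rate` is hypothetical:

```python
import numpy as np


def rejection_rate(p_values, alpha=0.05):
    """Proportion of replicates rejecting the null at level alpha.

    Applied to replicates simulated under the null this estimates the
    type I error rate; under an alternative it estimates the power.
    Also returns the Monte Carlo standard error of the proportion.
    """
    p = np.asarray(p_values, dtype=float)
    rate = float(np.mean(p < alpha))
    se = float(np.sqrt(rate * (1.0 - rate) / p.size))
    return rate, se
```

With 1000 replicates and a true rate of 0.05, the Monte Carlo standard error is about 0.007, which gives a rough sense of how far an estimated type I error rate may drift from the nominal level by chance.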
For scenario 1 (Type I co-association), we chose 7 SNPs at Chr17:1650000215…1650011216 and 7 SNPs at Chr18:1700258917…1700276475. The case-control statuses were generated from the logistic regression model Logit(P) = β _{0} + β _{1} × SNP _{1} + β _{2} × SNP _{2} + β _{3} × (SNP _{1} × SNP _{2}), where SNP _{1} and SNP _{2} are the causal SNPs, correlated with coefficient r; the 1^{st} SNP of gene A and the 5^{th} SNP of gene B were defined as the causal SNPs. Three main-effect patterns were set to make the simulations more practical: two marginal effects (β _{1} = log(1.3), β _{2} = log(1.5)), one marginal effect (β _{1} = 0, β _{2} = log(1.5)) and no marginal effects (β _{1} = β _{2} = 0). The type I error rate was evaluated under r = 0 and β _{3} = 0 for various sample sizes N (N/2 cases and N/2 controls, N = 400, …, 2000), and power was evaluated with the interaction odds ratio exp(β _{3}) ranging from 1.1 to 1.9 in steps of 0.2 (i.e. β _{3} from log(1.1) to log(1.9)) under a fixed sample size of 1200. In addition, we fixed the interaction odds ratio and main effects to assess the performance of the SBS under different sample sizes.
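The generation of case-control status from this logistic model can be sketched as below. This is an illustration under simplifying assumptions, not the pipeline used in the study: the causal genotypes are passed in as plain arrays rather than produced by gs2.0 from HapMap haplotypes, the intercept β _{0} is not stated in the text and must be supplied by the caller, and the function name `simulate_case_control` is hypothetical.

```python
import numpy as np


def simulate_case_control(snp1, snp2, beta0, beta1, beta2, beta3, n_total, rng):
    """Draw disease status from the logistic model, then sample N/2 cases
    and N/2 controls from the population.

    snp1, snp2 : causal genotypes (0/1/2) for every individual in the population
    Returns (indices of the sampled individuals, their 0/1 status).
    """
    snp1 = np.asarray(snp1, dtype=float)
    snp2 = np.asarray(snp2, dtype=float)

    # Logit(P) = b0 + b1*SNP1 + b2*SNP2 + b3*(SNP1*SNP2)
    logit = beta0 + beta1 * snp1 + beta2 * snp2 + beta3 * snp1 * snp2
    prob = 1.0 / (1.0 + np.exp(-logit))
    status = rng.random(snp1.shape[0]) < prob

    # Balanced sampling without replacement; this raises an error if the
    # population holds fewer than n_total // 2 cases or controls.
    half = n_total // 2
    cases = rng.choice(np.flatnonzero(status), size=half, replace=False)
    controls = rng.choice(np.flatnonzero(~status), size=half, replace=False)
    idx = np.concatenate([cases, controls])
    return idx, status[idx].astype(int)
```

For a type I error run in scenario 1 one would pass `beta3 = 0` with uncorrelated causal SNPs; the returned indices can then be used to subset the non-causal SNPs of each gene before testing.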
For scenario 2 (Type II co-association), we chose 7 SNPs at Chr22:2126161008…2126164539 and 7 SNPs at Chr22:2126166075…2126177318. In this situation, the case-control statuses were generated from the logistic regression model Logit(P) = β _{0} + β _{1} × SNP _{1} + β _{2} × SNP _{2}. Different values of r were specified to evaluate the type I error rate (β _{1} = β _{2} = β _{3} = 0, r = 0.1, 0.2, 0.3, 0.4, 0.5, 0.9) and the power under fixed main effects (β _{1} = 0, β _{2} = log(1.5) and β _{1} = log(1.3), β _{2} = log(1.5)) for the two causal SNPs with a given sample size of 1200. To evaluate the performance under different MAFs of the causal SNP pairs, different correlation structures between the two causal SNPs were chosen from the two regions.
For scenario 3 (Type III co-association), we selected the same gene regions as in scenario 2. The case-control statuses were generated from the model Logit(P) = β _{0} + β _{1} × SNP _{1} + β _{2} × SNP _{2} + β _{3} × (SNP _{1} × SNP _{2}). Two situations were considered: the interaction odds ratio exp(β _{3}) ranged from 1.1 to 1.9 in steps of 0.2 under fixed r, and r was set from 0.1 to 0.5 in steps of 0.1 under fixed β _{3}. All simulations were conducted with a sample size of 1200 and different main-effect patterns (β _{1} = β _{2} = 0; β _{1} = 0, β _{2} = log(1.5); and β _{1} = log(1.3), β _{2} = log(1.5)).
For the single SNP-based logistic regression model, we considered each pairwise interaction separately and selected the most significant one (the smallest P-value). Significance levels were assessed using permutations to adjust for multiple testing [10].
Applications
The SBS was also applied to a GWAS from the North American Rheumatoid Arthritis (RA) Consortium containing 868 RA cases and 1194 controls [24]; all datasets used are publicly available [25, 26]. We chose four genes (VEGFA, PADI4, C5, ITGAV) to detect gene-gene co-association with RA susceptibility, involving four, six, eight and eight SNPs respectively. The other six methods mentioned above were also used to detect co-association contributing to RA, and the computation time of each method was recorded in R 3.1.0 on a desktop computer (Intel Core 2 with 3.00 GHz CPU using 4 GB of RAM).
Results
Simulation
Tables 1 and 2 show the type I error rates of the seven methods for different sample sizes in various scenarios (β _{1} = 0, β _{2} = log(1.5) and β _{1} = log(1.3), β _{2} = log(1.5)) under β _{3} = r = 0, while Table 3 shows the type I error rates under β _{1} = β _{2} = β _{3} = 0, r ≠ 0 with a sample size of 1200. The type I error rates of all methods are within the acceptable range and move closer to the nominal level of 0.05 as the sample size increases. Similar results were obtained for the case β _{1} = β _{2} = 0 (Additional file 1).
The power of the seven methods for type I co-association is shown in Fig. 2a under various interaction effects when β _{1} = log(1.3), β _{2} = log(1.5) with a sample size of 1200. The power of most methods increases monotonically as the interaction effect increases, and the SBS shows relatively higher power than the others. Similar power trends as a function of sample size also emerged under fixed marginal effects (β _{1} = log(1.3), β _{2} = log(1.5)) and a fixed interaction effect (β _{3} = log(1.5)) (Additional file 2).
For type II co-association, the power of the seven methods is shown in Fig. 2b. With main-effect odds ratios of 1.3 and 1.5 for the two genes (β _{1} = log(1.3), β _{2} = log(1.5)) and an interaction odds ratio of 1 (β _{3} = 0), the SBS shows relatively higher power than the other methods regardless of the MAFs of the two causal SNPs. Furthermore, under β _{3} = 0, Additional file 3: Figure S2 illustrates the power when the sum of the main effects of the two causal SNPs is fixed at log(2.8). The proposed SBS shows the highest power and all methods show the same trends, indicating that type II gene-gene co-association can indeed be caused by correlation alone, i.e. r(β _{1} + β _{2}).
Figure 2c and d show the power for type III co-association. Figure 2c shows the results under various interaction odds ratios with the correlation coefficient at 0.3 and a sample size of 1200. The power of the seven methods increases monotonically as the interaction odds ratio increases, and the SBS outperforms all the other methods. Figure 2d shows the results under various causal SNP pairs with β _{3} = log(1.3) and a sample size of 1200. The SBS always keeps the highest power, though the power of all the methods varies heavily under different MAFs. Compared with the other methods, the proposed SBS is well suited to detecting gene-gene co-association under high correlations.
With only one main effect (β _{1} = 0, β _{2} = log(1.5)), a similar pattern appeared (Fig. 3), except that the power was slightly lower than with two main effects. The results under β _{1} = 0, β _{2} = 0 further confirmed this (Additional file 4).
Application
Table 4 shows the results of the gene-gene co-association analysis by all seven methods for the 868 RA cases and 1194 controls. The proposed SBS, the CCU statistic and the KCCU statistic all suggest that the co-association of VEGFA-PADI4 and of C5-PADI4 with RA susceptibility is significant at the nominal level of 0.05, whereas no significance was found by the other methods. With regard to computation time, taking VEGFA-PADI4 as an example, the SBS took 1.02 s, CCU 3.72 s, the single SNP-based logistic regression model 99.6 s, the PCA-based logistic regression model 0.6 s, the δ ^{2} statistic 6.18 s and PLSPM 26.76 s, while KCCU took up to 42 h on the same desktop computer (Intel Core 2 with 3.00 GHz CPU using 4 GB of RAM).
Discussion
The existence of gene-gene joint effects, which contain the main effects and their co-association, is one of the possible explanations for the “missing heritability” problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, owing not only to the traditional interaction under nearly independent conditions but also to the correlation between genes. It is customarily placed in the framework of gene-gene interaction and identified by adding a product term to the traditional regression model. However, most diseases are caused by multiple genes acting together through pathways or networks in which genes (or SNPs) are often correlated rather than independent. The implicit independence assumption of the regression model is rarely satisfied, and the effects attributed to the correlation have usually been neglected. In addition, when constructing an a priori topological structure for genetic networks that contribute to diseases of interest, it seems more reasonable to test whether a significant relationship exists between any two nodes by detecting gene-gene co-association rather than traditional interaction. Thus, it is crucial to develop powerful methods to detect gene-gene co-association.
In this paper, we have proposed a powerful score-based statistic for testing gene-gene co-association at the gene or region level. One appealing property is that it has a rigorous asymptotic distribution under the null hypothesis, which makes it computationally efficient because no permutation or bootstrap techniques are needed. Our group had previously developed several methods to detect gene-gene co-association, such as the Fisher r-to-z transformation-based statistics and the CCU, KCCU and PLSPM-based statistics. One common disadvantage of these methods is their high computational burden. Furthermore, several simulations were conducted to confirm the stability and advantage of the proposed score-based statistic over other existing methods under various co-association scenarios. For type I co-association, the power of the proposed score-based statistic was close to that of the PCA-based logistic regression model under smaller interaction odds ratios, but as the interaction odds ratio increased, its power rose much faster than that of the other methods. In addition, under type II and type III co-association, some methods (e.g. the CCU statistic) did not work at all, since they could not capture the correlation information between causal SNPs. In this context, the proposed score-based statistic still outperformed the others. Though the proposed score-based statistic performed slightly worse than the PLSPM-based statistic in some situations, its power remained higher than that of PLSPM in more realistic situations in which the causal SNP pairs were more strongly correlated. In the real data analysis, the proposed score-based statistic detected the co-association of VEGFA-PADI4 and C5-PADI4, which has been reported earlier [8, 11], and its computation time was smaller than that of most methods, though slightly larger than that of the PCA-based logistic regression. This further confirmed its practicability.
In addition, we compared the proposed score-based statistic with the least absolute shrinkage and selection operator (LASSO), a classical shrinkage-based method [27]. All the simulation results indicated that the proposed score-based statistic performed better than LASSO. The calculation of the P-value for LASSO requires some explanation. The P-value in LASSO is the proportion of product terms, among all SNP pairs, whose coefficients are not equal to 0. For instance, suppose there are 7 SNPs in each gene. We first removed the causal SNP pair to deal with the indirect association, which left 6 × 6 = 36 product terms of SNP pairs; these were put into the LASSO regression model simultaneously. We recorded the number of coefficients not equal to 0 as m, and m/36 was taken as the P-value. Finally, the power was calculated by averaging the P-values from the 1000 simulations. The R package lars was used for LASSO in the simulations. The corresponding results are given in Additional file 1: Table S2 and Table S3 and in Additional file 5: Figure S4.
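The m/36 quantity described above can be sketched as follows. The sketch takes the fitted coefficients of the product terms as given, from whichever LASSO implementation is used, and only performs the counting step; the function name `lasso_selection_fraction` and the `tol` argument are hypothetical.

```python
import numpy as np


def lasso_selection_fraction(product_term_coefs, tol=0.0):
    """Fraction m / (number of product terms) of non-zero LASSO coefficients.

    product_term_coefs : fitted coefficients of the SNP-pair product terms
    tol                : magnitudes at or below tol are treated as zero
    """
    coefs = np.asarray(product_term_coefs, dtype=float)
    m = int(np.count_nonzero(np.abs(coefs) > tol))
    return m / coefs.size
```

For 36 product terms of which three receive non-zero coefficients, the function returns 3/36.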
Since the proposed method is based on the classical score test, it can easily be extended to analyze gene-gene co-association for continuous traits, for which the score statistics can similarly be calculated from the likelihood function. It is also important to guard against possible heterogeneity caused by other covariates (e.g. age, gender, smoking status). One possible solution is the Mantel-Haenszel method, which may suffer from small-sample problems when the number of covariates is large. Another possible way is to calculate the conditional score statistics given the covariates.
One limitation of the proposed score-based statistic is that it considers all possible SNP pairs from the two genes, so it may fail to rigorously follow the chi-square distribution if the number of SNPs is large. At present, it is difficult to recommend an appropriate number of SNPs, since the performance of the proposed statistic depends on the sample size, the underlying gene structures and the co-association effects. If the number of SNPs is too large, one possible solution is to adopt nonparametric methods such as a permutation test; another is to first determine the tag SNPs of each gene to reduce the number of SNPs and then apply the proposed statistic to detect gene-gene co-association. One natural and commonly used approach to tag SNP selection is based on linkage disequilibrium (LD), where tag SNPs can usually be captured based on two-marker (pairwise) or multi-marker measures of LD [28]. In practice, LD and haplotype block analyses can be carried out with the Haploview software [29]. Many other methods have also been proposed recently, including the weighted tag SNP-set analytical method [30], the CLONTagger method [31], the diSNP selection method [32] and the FastTagger method [33]. Meanwhile, extending the proposed statistic to a genome-wide scale will inevitably yield very noisy covariance matrices and raise multiple testing problems, which should be considered in future work.
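To make the pairwise-LD idea concrete, a simple greedy tagging procedure can be sketched as below. It is only an illustration of selecting tag SNPs by a pairwise r² threshold and is not the algorithm of Haploview or of any of the methods cited above; the function name `greedy_tag_snps`, the default threshold of 0.8 and the use of genotype (rather than haplotype) correlation are assumptions, and every SNP is assumed to be polymorphic in the sample.

```python
import numpy as np


def greedy_tag_snps(genotypes, r2_threshold=0.8):
    """Greedy pairwise tagging: repeatedly pick the SNP that captures the
    largest number of still-untagged SNPs at r^2 >= r2_threshold.

    genotypes : n x p matrix of genotype counts (no constant columns)
    Returns the column indices of the selected tag SNPs.
    """
    g = np.asarray(genotypes, dtype=float)
    r2 = np.corrcoef(g, rowvar=False) ** 2  # squared genotype correlation

    untagged = set(range(r2.shape[0]))
    tags = []
    while untagged:
        # Candidate capturing the most untagged SNPs (lowest index on ties).
        best = max(
            sorted(untagged),
            key=lambda s: sum(r2[s, t] >= r2_threshold for t in untagged),
        )
        tags.append(best)
        captured = {t for t in untagged if r2[best, t] >= r2_threshold}
        captured.add(best)  # a tag always captures itself
        untagged -= captured
    return tags
```

The selected columns of each gene can then be passed to the score-based statistic, which reduces the K*J degrees of freedom to the product of the two tag-set sizes.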
Conclusions
The proposed score-based statistic is a powerful and efficient gene-based method for detecting gene-gene co-association compared with the CCU, KCCU and PLSPM-based statistics, the δ ^{2} statistic, and the single SNP-based and PCA-based logistic regression tests.
Availability of supporting data
The GWAS data of North American Rheumatoid Arthritis Consortium were downloaded from the Genetic Analysis Workshop (http://www.gaworkshop.org/) with application in advance.
Abbreviations
SBS: Score-based statistic
GWAS: Genome-wide association study
SNPs: Single nucleotide polymorphisms
CCU: Statistic based on canonical correlations
KCCU: Statistic based on kernel canonical correlation analysis
PLSPM: Partial least squares path modeling
PCA: Principal component analysis
LASSO: The least absolute shrinkage and selection operator
LD: Linkage disequilibrium
RA: Rheumatoid arthritis
References
1. Kettunen J, Tukiainen T, Sarin AP, Ortega-Alonso A, Tikkanen E, Lyytikainen LP, et al. Genome-wide association study identifies multiple loci influencing human serum metabolite levels. Nat Genet. 2012;44(3):269–76.
2. Chasman DI, Schurks M, Anttila V, de Vries B, Schminke U, Launer LJ, et al. Genome-wide association study reveals three susceptibility loci for common migraine in the general population. Nat Genet. 2011;43(7):695–8.
3. Goode EL, Chenevix-Trench G, Song H, Ramus SJ, Notaridou M, Lawrenson K, et al. A genome-wide association study identifies susceptibility loci for ovarian cancer at 2q31 and 8q24. Nat Genet. 2010;42(10):874–9.
4. Gibson G. Hints of hidden heritability in GWAS. Nat Genet. 2010;42(7):558–60.
5. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA, Hunter DJ, et al. Finding the missing heritability of complex diseases. Nature. 2009;461(7265):747–53.
6. Eichler EE, Flint J, Gibson G, Kong A, Leal SM, Moore JH, et al. Missing heritability and strategies for finding the underlying causes of complex disease. Nat Rev Genet. 2010;11(6):446–50.
7. Gibson G. Rare and common variants: twenty arguments. Nat Rev Genet. 2011;13(2):135–45.
8. Peng Q, Zhao J, Xue F. A gene-based method for detecting gene-gene co-association in a case-control association study. Eur J Hum Genet. 2010;18(5):582–7.
9. Yuan Z, Liu H, Zhang X, Li F, Zhao J, Zhang F, et al. From interaction to co-association: a Fisher r-to-z transformation-based simple statistic for real world genome-wide association study. PLoS One. 2013;8(7):e70774.
10. Zhang X, Yang X, Yuan Z, Liu Y, Li F, Peng B, et al. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design. PLoS One. 2013;8(4):e62129.
11. Yuan Z, Gao Q, He Y, Zhang X, Li F, Zhao J, et al. Detection for gene-gene co-association via kernel canonical correlation analysis. BMC Genet. 2012;13:83.
12. Oti M, Brunner HG. The modular nature of genetic diseases. Clin Genet. 2007;71(1):1–11.
13. Li Y, Agarwal P. A pathway-based view of human diseases and disease relationships. PLoS One. 2009;4(2):e4346.
14. Barabasi AL, Oltvai ZN. Network biology: understanding the cell’s functional organization. Nat Rev Genet. 2004;5(2):101–13.
15. Zhang X, Xue F, Liu H, Zhu D, Peng B, Wiemels JL, et al. Integrative Bayesian variable selection with gene-based informative priors for genome-wide association studies. BMC Genet. 2014;15(1):130.
16. Rajapakse I, Perlman MD, Martin PJ, Hansen JA, Kooperberg C. Multivariate detection of gene-gene interactions. Genet Epidemiol. 2012;36(6):622–30.
17. Marchini J, Donnelly P, Cardon LR. Genome-wide strategies for detecting multiple loci that influence complex diseases. Nat Genet. 2005;37(4):413–7.
18. Arkin Y, Rahmani E, Kleber ME, Laaksonen R, Marz W, Halperin E. EPIQ-efficient detection of SNP-SNP epistatic interactions for quantitative traits. Bioinformatics. 2014;30(12):i19–25.
19. Cordell HJ. Detecting gene-gene interactions that underlie human diseases. Nat Rev Genet. 2009;10(6):392–404.
20. Wang K, Abbott D. A principal components regression approach to multilocus genetic association studies. Genet Epidemiol. 2008;32(2):108–18.
21. International HapMap Project. http://hapmap.ncbi.nlm.nih.gov/. Accessed 10 Mar 2015.
22. Li J, Chen Y. Generating samples for association studies based on HapMap data. BMC Bioinformatics. 2008;9:44.
23. Chen Y, Li J. Generation of synthetic data and experimental designs in evaluating interactions for association studies. J Bioinform Comput Biol. 2012;10(1):1240005.
24. Plenge RM, Seielstad M, Padyukov L, Lee AT, Remmers EF, Ding B, et al. TRAF1-C5 as a risk locus for rheumatoid arthritis: a genomewide study. N Engl J Med. 2007;357(12):1199–209.
25. Zhao J. Genetic Analysis Workshop. 2006. http://www.gaworkshop.org/. Accessed 10 Mar 2015.
26. Amos CI, Chen WV, Seldin MF, Remmers EF, Taylor KE, Criswell LA, et al. Data for Genetic Analysis Workshop 16 Problem 1, association analysis of rheumatoid arthritis data. BMC Proceedings. 2009;3(7):1–4.
27. Tibshirani R. Regression shrinkage and selection via the Lasso. J R Statist Soc B. 1996;58(1):267–88.
28. de Bakker PI, Yelensky R, Pe'er I, Gabriel SB, Daly MJ, Altshuler D. Efficiency and power in genetic association studies. Nat Genet. 2005;37(11):1217–23.
29. Barrett JC, Fry B, Maller J, Daly MJ. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21(2):263–5.
30. Yan B, Wang S, Jia H, Liu X, Wang X. An efficient weighted tag SNP-set analytical method in genome-wide association studies. BMC Genet. 2015;16:25.
31. Ilhan I, Tezel G. How to select tag SNPs in genetic association studies? The CLONTagger method with parameter optimization. OMICS. 2013;17(7):368–83.
32. Wu C, Cui Y. Boosting signals in gene-based association studies via efficient SNP selection. Brief Bioinform. 2014;15(2):279–91.
33. Liu G, Wang Y, Wong L. FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium. BMC Bioinformatics. 2010;11:66.
Acknowledgements
This work was supported by grants from National Natural Science Foundation of China (grant number 31200994, 81273177 and 81373100). We thank GAW16 and the North American Rheumatoid Arthritis Consortium for the RA data. We would like to thank anonymous reviewers and academic editor for providing us with constructive comments and suggestions to improve the quality of the paper and also wish to acknowledge our colleagues for their invaluable work.
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
ZSY, FZX, XSW and YXL conceptualized the study; JX, JDJ, XSZ and HKL acquired and analyzed the data; JX, ZSY and XSZ prepared the manuscript and contributed to its revision. All authors read and approved the final manuscript.
Jing Xu and Zhongshang Yuan contributed equally to this work.
Additional files
Additional file 1: Table S1.
The type I error rates of the seven methods without correlation and interaction under (β _{1} = 0, β _{2} = 0, β _{3} = 0). Table S2. The type I error rates of the SBS and LASSO without correlation and interaction under (β _{1} = log(1.3), β _{2} = log(1.5)). Table S3. The type I error rates of the SBS and LASSO without main effects and interaction (β _{1} = 0, β _{2} = 0, β _{3} = 0). (DOCX 24 kb)
Additional file 2: Figure S1.
The power of the seven methods under different sample sizes with two main effects and a fixed interaction effect (β _{1} = log(1.3), β _{2} = log(1.5), β _{3} = log(1.5)) for type I co-association. (PDF 3 kb)
Additional file 3: Figure S2.
The power of the seven methods when the sum of the main effects of the two causal SNPs was fixed at log(2.8), with the interaction effect at β _{3} = 0 and the correlation at 0.5, for type II co-association. (PDF 3 kb)
Additional file 4: Figure S3.
The power of the seven methods under different co-association levels with no main effect (β _{1} = 0, β _{2} = 0). Note: figure a for Type I co-association with different interaction effects; figure b for Type II co-association with different causal SNP pairs; figure c for Type III co-association given fixed correlation 0.3 and different interaction effects; figure d for Type III co-association given fixed interaction effect β _{3} = log(1.3) and different causal SNP pairs. (PDF 7 kb)
Additional file 5: Figure S4.
The power of the SBS and LASSO under different co-association levels with two main effects (β _{1} = log(1.3), β _{2} = log(1.5)). Note: figure a for Type I co-association with different interaction effects; figure b for Type II co-association with different causal SNP pairs; figure c for Type III co-association given fixed correlation 0.3 and different interaction effects; figure d for Type III co-association given fixed interaction effect β _{3} = log(1.3) and different causal SNP pairs. (PDF 3 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Gene-gene co-association
 Score-based
 Gene-based