A non-parametric approach for detecting gene-gene interactions associated with age-at-onset outcomes

Background Cox-regression-based methods have been commonly used for the analyses of survival outcomes, such as age-at-disease-onset. These methods generally assume the hazard functions are proportional among various risk groups. However, such an assumption may not be valid in genetic association studies, especially when complex interactions are involved. In addition, genetic association studies commonly adopt case-control designs. Direct use of Cox regression to case-control data may yield biased estimators and incorrect statistical inference. Results We propose a non-parametric approach, the weighted Nelson-Aalen (WNA) approach, for detecting genetic variants that are associated with age-dependent outcomes. The proposed approach can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Moreover, it does not rely on any assumptions of the disease inheritance models, and is able to capture high-order gene-gene interactions. Through simulations, we show the proposed approach outperforms Cox-regression-based methods in various scenarios. We also conduct an empirical study of progression of nicotine dependence by applying the WNA approach to three independent datasets from the Study of Addiction: Genetics and Environment. In the initial dataset, two SNPs, rs6570989 and rs2930357, located in genes GRIK2 and CSMD1, are found to be significantly associated with the progression of nicotine dependence (ND). The joint association is further replicated in two independent datasets. Further analysis suggests that these two genes may interact and be associated with the progression of ND. Conclusions As demonstrated by the simulation studies and real data analysis, the proposed approach provides an efficient tool for detecting genetic interactions associated with age-at-onset outcomes.


Background
For most common complex diseases, if not all, the currently identified genetic loci only explain a small percentage of the disease heritability [1,2]. The search for genetic variants underlying complex diseases remains to be a major goal and challenge for the coming decades. While genetic variants, such as rare variants and structure variation, may contribute to the remaining heritability, part of the missing heritability could be explained by the interplay of genetic variants through complicated mechanisms [3,4]. The study of gene-gene interaction may reveal novel findings explaining missing heritability, and shed light on the underlying etiological and pathophysiological processes that result in complex diseases. Although substantial efforts are given to identification of gene-gene (G-G) interactions related to binary disease outcomes, less attention has been given to G-G interaction research on other clinical features, such as ages at disease onset, which may more closely reflect the dynamic process of disease development [5]. A number of studies show that the intensity of natural selection on a gene declines with the age at which it is expressed [6][7][8]. In addition, practical evidence suggests that many complex diseases, such as Alzheimer's diseases and bipolar diseases, have separate pools of genetic variants that contribute to the early-onset and late-onset cases, indicating distinctive biological pathways involved in the disease development process [9,10]. Though G-G interactions are ubiquitous in various biological pathways related with the disease development process [3], identification of these G-G interactions presents continuing challenges.
Cox-regression is a powerful tool for the genetic analysis of age-at-disease-onset outcomes. It has been used in genetic association studies for detecting single genetic variant associated with disease progression [11][12][13]. To better characterize the association in a candidate gene/ region, Cox-regression has also been extended to handle multiple loci via haplotype analysis [14][15][16]. However, Cox-regression is less suitable for the analysis of a large number of genetic variants and possible G-G interactions, because of the rapidly increasing number of parameters. In addition, Cox regression assumes the hazard rates are proportional among various risk groups. Such an assumption may be questionable in genetic studies, especially when complex interactions are involved. Further, most of the current genetic association studies adopt case-control designs, by which cases are usually over-sampled from the source population. Applying Cox regression to casecontrol studies may raise another issue of biased estimation of hazard ratios [17,18]. To address this issue, Nan et al. proposed to analyze age-at-disease-onset for case-control studies using a modified case-cohort approach (MCC) [19]. It was shown that this approach had very small bias when the disease prevalence was low. However, the bias would increase with the disease prevalence, which may limit its application to studies of common diseases.
As a non-parametric alternative to Cox regression, Nelson-Aalen estimator has been widely used to analyze survival outcomes. It was first introduced by Nelson and was later on rediscovered by Aalen, who derived the estimator using the modern counting process techniques [20,21]. It was shown to have a number of nice properties, such as requiring less assumptions and better small-sample-size performance than other standard approaches [22]. Considering these advantages, we propose a weighted Nelson-Aalen (WNA) approach for detecting genetic variants associated with age-at-disease-onset, considering possible interactions. The proposed approach searches for the best combination of loci forwardly, and tests their joint association with the age-at-disease-onset. Our approach has the following advantages. First, it is a multi-locus approach that is applicable to a set of genetic variants (i.e. from a number of candidate genes or a genetic pathway) with the consideration of high-order interaction. Second, it is a non-parametric approach, which does not make any assumption of the disease models. Third, it can be directly applied to prospective cohort studies, and can be easily extended for population-based case-control studies. Through simulations, we compare the performance of the proposed approach with Coxregression-based approaches under both perspective cohort and case-control study designs. We further illustrate our approach with a real data application to smokers' progression to nicotine dependence (ND), using three independent datasets from the Study of Addiction: Genetics and Environment (SAGE).

Methods
Suppose we have a study population of n subjects, which is a sub-cohort from the source population. We denote T i , δ i and X i as the observed age, the disease status and the genetic markers for subject i, respectively. Let where T O i and T S i are the age-at-disease-onset and age-at-survey for subject i. Then T O i is either observed or right-censored, and we assume the cause of censoring is independent with either age-at-disease-onset or any genetic variants. Further denote is an indicator function. Without loss of generality, we assume K genetic markers, X i = (X i1 , X i2 ......, X iK ), are single nucleotide polymorphisms (SNP) with three possible genotypes, X ij ∈ {AA, Aa, aa}, 1 ≤ j ≤ K. Our hypothesis is that the K SNPs and their possible interactions may be associated with the age-at-disease-onset outcome. In the following, we introduce the WNA approach for perspective cohort studies, and then extend it for population-based case-control studies.
Weighted nelson-Aalen for prospective cohort studies a. Association testing of multiple genotype groups with age-at-disease-onset Assume k disease-susceptibility SNPs are associated with ages of disease onset, by forming L G-G groups, G 1 , G 2 ,......,G L , each representing a different hazard rate for the disease onset. Given these G-G groups, we can partition all subjects into L groups, S 1 , S 2 ......,S L , where S 1 = {i|X i = G l }, 1 ≤ l ≤ L. Suppose the onsets of a disease are observed at M distinct ages, t 1 < t 2 < ...... < t M . Let Y (t) be the number of subjects who are at risk at age t, and N(t) be the number of subjects who have disease-onset by age t. The cumulative hazard function H and survival function S are estimated as, In a similar manner, we can also define the groupspecific cumulative hazard functions H l (t), 1 ≤ l ≤ L, based on the subjects within group S l . To examine the joint association of k SNPs with the age-at-disease-onset, we test the following hypothesis: H 0 : H 1 (t) = H 2 (t)=...... = H L (t) for all t ≤ τ versus H A : at least one H l (t) is different for some t ≤ τ.
Here τ is the largest observed age of onset in the study.
Let Y l (t) be the number of subjects in group S l who are at risk at age t; N l (t) be the number of subjects in group S l who have the disease-onset by age t. The test statistic can be formed based on Z = (Z 1 , Z 2 ,...... Z L ), where In Equation (2), W (t) is a weight function for the ages at disease onset. A variety of weight functions have been proposed in the literature. For example, W (t) = 1 for any t, would lead to the widely used log-rank test [23]; W (t) = Y(t) would lead to Mann-Whitney-Wilcoxon test and the generalization of Kruskal-Wallis test [24,25]. In our study, we suggest using the weight function, which has a form similar to the Kaplan-Meier estimator in the entire study population, and gives the most weight to early disease onset. This weight function was first proposed by Peto et al. [26], and was also suggested in a series of articles [27,28]. The variance of Z l and the covariance between Z l and Z l ′ can then be calculated as, The components of Z = (Z 1 , Z 2 ,......, Z L ) are linearly dependent because X L l¼1 Z l ¼ 0. Therefore, the test statistics can be calculated based on any L − 1 components of Z, such as Z 1 , Z 2 ,......, Z L-1 . The test statistic can be formed as, Δ WNA ¼ Z 1 ; Z 2 ; ::::::; Z L−1 ð Þ X −1 Z 1 ; Z 2 ; ::::::; where Σ is the variance-covariance matrix for (Z 1 , Z 2 ,......, Z L-1 ). Under the null hypothesis of no association, the above test statistic has asymptotically a Chi-square distribution with L − 1 degrees of freedom. The theoretical details of the test can be found elsewhere [29].
b. Selection of multi-SNP combinations by recursive partitioning In genetic association studies with a set of SNPs, it is expected that only a small subset of SNPs are associated with the disease. To determine the disease-susceptibility SNPs and the associated G-G groups, we adopt a recursive partitioning algorithm. The algorithm starts with a null model by treating all study samples as one group.
In each of the following steps, it gradually selects diseasesusceptibility markers, and then divides samples into different G-G groups. In step one, we search among all available SNPs for a single SNP that can best divide samples into two groups. Assuming the minor allele a is the risk allele of j-th SNP, we consider three partitioning strategies: 3) Heterozygote effect: For each partitioning strategy, we calculate Δ WNA and its corresponding p-value. We repeat the partitioning process for all SNPs, and choose the most significant group partitioning for step one, denoted it as S In step two, a second SNP j′ is considered to further partition the existing two groups into four G-G groups, Again, the group partitioning that most significantly related to the age-at-disease-onset is selected in the step two. In a similar fashion, the diseasesusceptibility SNPs can be selected forwardly into the model to partition samples into different G-G groups. Ten-fold cross-validation (CV) is then used to determine the most parsimonious model with an optimal number of G-G groups. In this procedure, all the subjects are randomly divided into 10 subsets. Then 9 of the 10 subsets are used as the training set, while the remaining one is used as the testing set. The process is repeated 10 times to make sure all subsets have served as a testing set. In each testing set, a test statistic is calculated based on the G-G groups selected from the corresponding training set. The final model with an optimal number of G-G groups is the chosen to be the one that attains the highest significance level of the averaged testing statistic from 10 testing sets. After the final model is determined, an overall test statistic based on entire samples, Δ WNA , is obtained, and is used to evaluate the association of the selected G-G groups with the age-at-disease-onset. In order to account for the inflated Type I error due to selection of G-G groups, a permutation test is used to assess the significance level. In the permutation process, the outcomes of individuals (i.e. the age-of-onset and censoring status outcomes) in each permutation replicate are simultaneously permuted. The forward selection algorithm is then applied to the permuted data to choose the best G-G group and calculate Δ WNA . By repeating the same process on a large number of permutation replicates (e.g. 1,000), the empirical null distribution of Δ WNA is generated and an empirical p-value can be obtained. In a replication study where the G-G group is pre-determined, the asymptotic test based on the Chi-square distribution can be used.

Modification for population-based case-control studies
Most of existing and ongoing genetic association studies adopt case-control design, where controls are not matched to the cases by age. To facilitate the survival analysis of genetic data from case-control studies, we also propose a modified WNA for case-control data. Suppose the study includes n subjects with n 1 cases and n 0 controls (n = n 1 + n 0 ). Assuming the disease has a prevalence of ρ in the general population, we modify the hazard function by adjusting the number of subjects at risk, Y (t): Correspondingly, we modify the group-wise hazard function by adjusting the number of subjects in group S l , where f l denotes the frequency of genotype G l among controls, and X L l¼1 f l ¼ 1 . With this adjustment, we expect to retrieve a pseudo-cohort population with a number of unobserved controls, who are expected to be at risk throughout the study.

Simulation studies
In the simulation studies, we evaluated the performance of the proposed WNA approach and compared it with those of Cox-regression-based approaches. Two series of simulations were conducted for perspective cohort studies and case-control studies, respectively. In the simulations, we assumed a subject's age-at-survey, T S i , followed a normal distribution, N(60,10 2 ), and its age-at-disease-onset, T O i , might follow various disease models described below. Each subject had an observed age , and a censoring status, δ i , determined by Two causal SNPs with an interaction effect were simulated for each disease scenario. We also assumed each SNP had two alleles, A and a, and the minor allele a had a frequency of 0.3 leading to an early onset of the disease. In addition, eight noise SNPs were also simulated with minor allele frequencies sampled from a uniform distribution, Unif[0.1, 0.5]. For each disease model, we simulated one million subjects as the source population, and the disease prevalence was calculated by ρ ¼ The parameters were chosen to ensure the disease prevalence was within the range of [0.15, 0.25] in the source population. For prospective cohort studies, 1,000 subjects were selected as the study population, while 500 cases (δ i = 1) and 500 controls (δ i = 0) were selected for case-control studies.

Disease model 1
The underlying disease model was simulated to mimic an ideal scenario where the proportional hazard (PH) assumption was satisfied and the hazard ratio was linear. In the model, we assumed the hazard function h i (t) for an individual i had a semi-parametric form, In the simulation, we specified the baseline hazard h 0 (t) by a Weibull distribution with a shape parameter θ = 3 and a scale parameter λ = 100.

Disease model 2
The underlying disease model was simulated to mimic a scenario where the PH assumption was met, but the hazard ratio was non-linear. For this model, we assumed the hazard function for an individual i had the form: In such a model, we assumed individuals with risk alleles at both loci had a high risk of disease than the remaining individuals.

Disease model 3
This disease model mimicked a scenario where the PH assumption was violated. In this model, we assumed the age-at-disease-onset for a subject with a G-G combination of (x i1 , x i2 ) followed a Weibull distribution W(λ,θ xi1 , xi12 ), where the scale parameter λ was fixed at 100 and the shape parameter varied by the G-G combinations of two causal SNPs. In such a model, we assumed the hazard functions increased over time for all G-G combinations (i.e.,θ > 1).

Disease model 4
This disease model also assumed the PH assumption was violated. Different from Disease Model 3, we assumed the hazard functions may remained constant or decreased overtime for certain G-G combinations (i.e. θ ≤ 1), mimicking an early onset disease scenario.

Simulation results
Simulations were conducted to compare WNA with the conventional Cox regression (COX) approach under the perspective cohort studies. Additional simulations were also perform to evaluate WNA with the modified case cohort (MCC) approach proposed by Nan et al (Nan and Lin, 2008) under case-control studies. To be consistent with WNA, we also adopted a forward selection strategy for the COX and MCC, and used the Akaike information criterion (AIC) as the criteria for model selection. For each disease model, we started the analysis with two causal SNPs, and gradually added noise SNPs into the analysis. The simulation was repeated for 1,000 times for each disease model. In each replicate, a final model was selected by each approach, and was then evaluated on an independent sub-cohort of 1,000 subjects from the source population. Because an independent dataset was used for validation in each replicate, the asymptotic test based on the Chi-square distribution was used. The Type I error and power were thus calculated as the probability for the selected final model to have a p-value less than 0.05 on the independent sub-cohort of 1,000 subjects.
We first evaluated the Type I errors for both approaches. In this case, we evaluated a null model with ten SNPs, simulated independently from the age-atdisease-onset outcome. As shown in Table 1, the Type I errors were well controlled for COX/MCC approach, and reasonably controlled for WNA. We then evaluated the power and the sensitivity (specificity) of both approaches. The sensitivity was calculated as the probability of selecting a causal SNP in the final model, while the specificity was calculated as the probability of not selecting a noise SNP in the final model. The results for perspective cohort studies were summarized in Table 2. Under the Disease Model 1, while COX had a higher power than WNA (i.e., 0.702 vs. 0.738) when only causal SNPs were considered, its power decreased much faster than that of WNA as the number of noise SNPs increased. When the noise SNPs reaches 8, WNA attained a higher power than COX (i.e., 0.567 vs. 0.427). When the hazard ratio was non-linear (Disease Model 2) or PH assumption was violated (Disease Models 3 and 4), WNA had a consistently higher power than COX. WNA showed the most advantages over COX under Disease Model 4 when the hazard functions do not follow monotonic patterns. In all scenarios, COX tended to have a higher sensitivity, but a lower specificity than WNA. This indicated COX tended to include more noise SNPs in the model than WNA, which partially explained its relatively lower power with the increasing number of noise SNPs. Compared to COX, WNA had relatively more robust performance, especially when the PH assumption was violated.
The simulation results for case-control studies were summarized in Table 3. WNA attained a higher power than MCC under all simulated models, which could be explained by the biased estimation of MCC under common disease scenarios. The power of MCC also decreased more rapidly as the number of noise SNPs increased, compared to the power of COX in cohort studies. Similar with COX, MCC tended to have a higher sensitivity, but a lower specificity than WNA.
For case-control studies, we assumed that the disease prevalence was accurately estimated. However, in practical, the prevalence of a disease might be estimated inaccurately. Therefore, additional simulation was conducted to evaluate the performance of WNA when disease prevalence was estimated inaccurately. For simplicity, we used the Disease Model 2 as an example, and considered both accurate and inaccurately estimated disease prevalence values, including ρ, ρ ± 5%, ρ ± 10%. The results were summarized in Table 4. The power of WNA increased slightly as the disease prevalence decreased. However, the type I errors also increased as the disease prevalence decreased, and would be inflated when disease prevalence was under-estimated. Further, the underestimation of disease prevalence appeared to increase the sensitivity of SNP selection, but reduced the specificity. By adjusting the disease prevalence, we expect to retrieve a pseudo-cohort population with a number of unobserved controls, who are expected to be at risk throughout the study. Under-estimating the disease prevalence would artificially increase the number of controls in the study, and thus cause bias in the follow-up studies with an independent sub-cohort data from the source population.
Application to study the progression to nicotine dependence (ND) Previous studies have indicated that the progression to ND could be influenced by the interplay of genetic variants [30,31]. Detecting G-G interactions contributing to the development of ND would help to understand the   [32]. The SAGE included standardized diagnostic assessments of ND by Diagnostic and Statistical Manual of Mental Disorders (DSM) IV, and its assessment plans for age-at-onset variables were also guided by standardized interview protocols and assessments, as described in prior SAGE publications [33,34]. We considered two age-at-onset variables, age-at-onset of ND and age-at-initiation of tobacco uses, and defined progression to ND as their difference. The study subjects under the investigation were limited to those who ever smoked cigarettes daily for a month or more. For non-ND subjects, age-at-survey was used as right-censoring values for age-at-onset of ND. After removing the subjects with missing outcomes, there were 706, 727, and 1,232 subjects in FSCD, COGA and COGEND, respectively. From the literature, we selected 150 SNPs across 64 candidate genes that have been reported for potential association with ND. Among those 150 SNPs, genotypes for 124 SNPs were available in the SAGE dataset, while genotypes for the remaining 26 SNPs were imputed by using PLINK [35]. The HapMap phase III founders of the CEU and ASW populations were used in the imputation as the reference panels for the white and black subjects [36]. We applied WNA to FSCD for an initial G-G interaction search and then replicated the initial findings in COGA and COGEND. While applying WNA, we fixed the disease prevalence at ρ = 0.24, which was estimated according to national survey [37]. Two SNPs, rs6570989 (A/G) and rs2930357 (C/T), located in gene GRIK2 and CSMD1, were identified in the initial search to be jointly associated with progression of ND with a nominal p-value of 9.68e-13. Permutation test was then conducted to estimate the empirical p-value, accounting for overestimation due to the model selection. The empirical p-value obtained from permutation test indicated a significant association (i.e., p-value < 0.001). Further validation of the finding in COGA (p-value = 0.034) and COGEND (p-value = 7.85e-04) showed the association remained significant at 5% level (Table 5). Survival curves in Figure 1 showed that the effect of rs6570989 was modified by the genotypes of rs2930357 in FSCD, which indicated a possible G-G interaction between two SNPs (Figure 1 A1-A2). Similar patterns were also observed in COGA (Figure 1 B1-B2) and COGEND (Figure 1 C1-C2). To account for the possible bias estimation of disease prevalence, we further examined the joint association for the identified two SNPs with the disease prevalence rates of 0.19 and 0.29 (i.e. 0.24 ± 0.05). The results showed that the significance level decreased as the disease prevalence increased (Table 6), but all joint association remained at least marginally significant with the disease prevalence of 0.29 (p-values were 4.08e-12, 0.054 and 2.00e-03 in FSCA, COGA and COGEND, respectively).
We also applied the Cox regression method to the same datasets. Only pair-wise interactions among SNPs were considered in the selection. The final model was determined by forward selection to minimize the AIC value. In FSCD, Cox regression with forward selection picked up nine SNPs involving a complicated model with a total number of 45 parameters. This association could not be replicated in either COGA (p-value = 0.703) or COGEND (p-value = 0.218).

Discussion
Complex diseases, manifesting with various clinical features, are believed to be caused by the joint action of multiple genetic variants through distinctive biological pathways. If two genes are jointly involved in producing the variability of a disease feature, whether additively or not, biological interaction between them is involved [38]. Although there is growing interest in detecting genetic variants that characterize disease progression, relatively few approaches have been proposed to evaluate the interaction among multiple genetic variants. In this article, we have proposed a non-parametric approach, referred to as WNA, for testing the joint association of multiple genetic variants with the age-at-onset outcomes, taking possible G-G interactions into account. The approach can be applied to both prospective cohort studies and case-control studies. Through simulations and an empirical study, we have shown that our approach had a comparable or better performance than Cox regression under various scenarios. The outperformance of WNA over Cox regression can be explained by the following reasons: 1) WNA does not assume any patterns of the hazard functions among G-G groups, which makes it more robust under various disease scenarios. While we expect Cox regression to have a better performance than WNA when the underlying disease model is known, in reality our understanding of the mode of inheritance for complex diseases is very limited. In such a case, non-parametric approaches, such as WNA, would have more advantages for the search of gene-gene interactions. 2) When a set of SNPs are involved, Cox regression tends to select highly complex genetic models with a large number of parameters. Unlike Cox regression, WNA is a non-parametric approach, and does not assume any parametric model for the selected G-G combinations. When the assumptions are violated, WNA likely captures the underlying G-G combinations, which can be replicated in independent studies. 3) Given the disease prevalence in the source population, WNA can be easily extended to studies with case-control designs.
For case-controls studies, it has been suggested that Cox regression should be used with great caution [17,18], due to the biased estimation of the effect size and incorrect statistical inference.
In the simulation, one of our aims is to examine robustness of the proposed WNA approach under case-control studies of common diseases, where the COX/MCC approach has biased estimates. Therefore, we have evaluated the performance of two approaches with an independent sub-cohort dataset from the source population, mimicking an independent follow-up study in a real application. The results have shown WNA approach had an improved power by adjusting for the disease prevalence in casecontrol studies, but in the meantime, under-estimating disease prevalence may lead to an inflated type I error for WNA. Caution should be taken in specifying the disease prevalence in real applications.
Another important consideration is the choice of weight function in WNA. The weight function in this study gives the most weight to departures of hazard functions at early ages. Alternatively, we could also adopt a general class of weight functions based on the  Kaplan-Meier estimator [28]. The weight function has the form, whereŜ : ð Þ is the Kaplan-Meier estimator of the survival function. The values of p and q should be chosen appropriately according to the hypothesis of interest. For instance, when p = 0, q > 0, the above weight gives more consideration to departures of hazard functions at late ages.
In the real data application, we identified two SNPs, rs6570989 and rs2930357, jointly associated with ND. In a recent GWAS of 3497 Dutch subjects, both of these two SNP were found to be significantly associated with current smoking [39]. These two SNPs are located in gene GRIK2 and CSMD1, respectively. Both GRIK2 and CSMD1 have been suggested to be functionally related with ND. GRIK2 belongs to the kainate family of glutamate receptors, which are actively involved in a variety of neurophysiologic processes [40,41]. GRIK2 has also been reported to be associated with smoking cessation [42]. Gene CSMD1 was shown to be highly expressed in the central nerve system [43], and to be related to smoking cessation [44]. A number of studies have also suggested that early smoking initiation and the development of nicotine dependence are associated with greater difficulty to quit smoking [45][46][47]. Nonetheless, relatively few studies have been conducted to evaluate ND age-at-onset outcomes, and our knowledge regarding the genetic contribution to the progression of ND is still lacking. While it is biologically plausible that the two identified genes may have a joint association contributed to the progression of ND, further studies are required to replicate this result.
We are aware that the proposed approach has some limitations. First, the test statistic of WNA follows an asymptotic Chi-square distribution when evaluating common genetic variants. However, if a genetic variant has a very low minor allele frequency, it may form certain G-G groups with a small number of subjects. In such a case, the asymptotic property of the test may not hold [48,49]. Therefore, for the rare variants, we suggest that an exact test be used to evaluate the significance [50]. Second, the proposed approach used a forward selection strategy, and we expect the power to decrease if none of the genetic variants has any marginal effect. In this specific case, exhaustive selection will be needed to detect a G-G interaction, but at a much higher computation cost. Third, the proposed approach is implemented in R with model selection, cross-validation and permutation procedures. It is less computationally efficient than applying a Cox regression model available in R. On a dual core 3.20GHz desktop, the average computation time for applying the WNA approach and Cox regression were 23.2 second and 0.26 second, respectively. On replication datasets when the optimal G-G model was predetermined, the computation time for applying the WNA approach was significantly reduced to 0.028 second, which was comparable to those for Cox regression (0.031 second).
One major advantage of the proposed WNA approach is its capability of handling multiple genetic variants with the consideration of possible high-order G-G interactions [51]. It is worthwhile to note that WNA is a nonparametric approach developed for both cohort studies and case-control studies, which differs from other approaches, such as the kernel-machine based approach [52]. Further, we limited the application of WNA approach to population-based case-control studies in which the cases and controls were not matched by age. If controls are matched to cases and are randomly selected from all those at risk at the age-of-onset of the cases, Cox regression can estimate the effect size of by a conditional likelihood method without bias [53,54].

Conclusions
We have proposed a statistical approach for detecting genetic interactions associated with age-at-onset outcomes. The approach is able to capture high-order gene-gene interactions, and can be applied to both prospective cohort studies and case-control studies. Through simulations, we showed that the new approach had comparable or better performance than the conventional Cox-regression-based methods. The empirical data applications to nicotine dependence also identified two genes, GRIK2 and CSMD1, joint associated with the progression of nicotine dependence. In addition to conventional statistical approaches for survival outcomes, the new approach provides an alternative way to model genetic interactions related to survival outcomes.