Estimating effects of rare haplotypes on failure time using a penalized Cox proportional hazards regression model
 Olga W Souverein^{1},
 Aeilko H Zwinderman^{1},
 J Wouter Jukema^{2} and
 Michael WT Tanck^{1}
DOI: 10.1186/1471-2156-9-9
© Souverein et al; licensee BioMed Central Ltd. 2008
Received: 27 April 2007
Accepted: 25 January 2008
Published: 25 January 2008
Abstract
Background
This paper describes a likelihood approach to model the relation between failure time and haplotypes in studies with unrelated individuals where haplotype phase is unknown, while dealing with the problem of unstable estimates due to rare haplotypes by considering a penalized log-likelihood.
Results
The Cox model presented here incorporates the uncertainty related to the unknown phase of multiple heterozygous individuals as weights. Estimation is performed with an EM algorithm. In the E-step the weights are estimated; in the M-step the parameters are estimated by maximizing the expectation of the joint log-likelihood, and the baseline hazard function and haplotype frequencies are calculated. These steps are iterated until the parameter estimates converge. Two penalty functions are considered, namely the ridge penalty and a difference penalty, which is based on the assumption that similar haplotypes show similar effects.
Simulations were conducted to investigate properties of the method, and the association between IL10 haplotypes and risk of target vessel revascularization was investigated in 2653 patients from the GENDER study.
Conclusion
Results from simulations and real data show that the penalized log-likelihood approach produces valid results, indicating that this method is of interest when studying the association between rare haplotypes and failure time in studies of unrelated individuals.
Background
In recent years there has been great interest in associating haplotypes with complex disease phenotypes, and many statistical models have been described. These models are complicated by individuals who are heterozygous at two or more single nucleotide polymorphisms (SNPs), because their haplotypes cannot be determined with certainty. Consider for instance two SNPs with alleles A or a, and B or b. Individuals who are heterozygous for both SNPs have genotypes Aa and Bb, and they inherited either haplotype AB from one parent and ab from the other, or haplotypes Ab and aB. Hence, it is unknown whether these individuals have haplotype pair AB/ab or haplotype pair Ab/aB. This uncertainty complicates statistical inference, and if the number of biallelic SNPs is large, or when allele frequencies of the SNPs are close to 50%, many individuals will be heterozygous at multiple SNPs.
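The phase ambiguity described above can be made concrete with a short sketch. It assumes a coding chosen here purely for illustration (not taken from the paper): genotypes are given as the number of copies of the second allele at each SNP (0, 1 or 2), and haplotypes as tuples of 0/1 alleles. The hypothetical helper `compatible_pairs` enumerates the haplotype pairs consistent with an unphased genotype.

```python
from itertools import product

def compatible_pairs(genotype):
    """Return the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with the count (0, 1 or 2) of the second allele per SNP.
    Each haplotype is a tuple of 0/1 alleles (illustrative coding).
    """
    het = [k for k, g in enumerate(genotype) if g == 1]
    # At homozygous SNPs both haplotypes carry the same allele.
    base = [1 if g == 2 else 0 for g in genotype]
    pairs = set()
    # Try every assignment of alleles at the heterozygous SNPs.
    for phase in product((0, 1), repeat=len(het)):
        h1, h2 = list(base), list(base)
        for k, allele in zip(het, phase):
            h1[k], h2[k] = allele, 1 - allele
        # Sorting makes (h1, h2) and (h2, h1) the same unordered pair.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)

double_het = compatible_pairs((1, 1))      # genotype Aa, Bb
triple_het = compatible_pairs((1, 1, 1))
```

For the double heterozygote the function returns two pairs, corresponding to AB/ab and Ab/aB. More generally, an individual who is heterozygous at k ≥ 1 SNPs has 2^{k-1} compatible pairs, which is why the ambiguity grows quickly with the number of SNPs.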
Most of the current methods focus on continuous or dichotomous outcome data [1–9], while only a few can be applied in cohort studies [10, 11]. Another concern is related to the presence of rare haplotypes, which is a very common problem in genetic association studies. In the present paper we adopt the suggestion of Tanck et al. [12] to use a weighted penalized likelihood method to estimate the association between a phenotype and the set of haplotypes, which may include rare haplotypes. We consider a model for the relation between a failure time T measured in N unrelated individuals and the haplotypes of these individuals formed by m SNPs measured in a single gene. Previously, Lin [10] has described a similar method for haplotype analysis in cohort studies, but this method did not include a penalty function for dealing with the unstable estimates of rare haplotypes.
In the sequel we first describe the kind of data that we analyze, then our statistical model, and then the algorithm to estimate the parameters of the model. We use simulated data to illustrate some characteristics of the estimators, and finally we analyze real data from the GENDER study on cardiovascular disease [13].
Results
Algorithm
Data and model
We consider a sample of i = 1,..., N unrelated individuals with failure or censoring time t_{ i }. The indicator d_{ i }is used to indicate whether t_{ i }is an event time (d_{ i }= 1), or a censoring time (d_{ i }= 0). Let g_{ i }be a vector of m biallelic SNPs measured in individual i. With m biallelic SNPs there are j = 1,..., Nhap = 2^{ m }different haplotypes possible with population frequencies p_{1},..., p_{ j },..., p_{ Nhap }.
Suppose all haplotypes were observed in all patients; these could then be represented by the vector x_{ i }of length Nhap, where x_{ ij }equals 0, 1, or 2, depending on the number of haplotypes of type j observed in patient i. (Notice that ∑_{ j }x_{ ij }= 2, meaning that only Nhap − 1 contrasts are identifiable.) The conditional hazard function for failure at t_{ i }given x_{ i }can then be specified as
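ln(h(t_{ i } | x_{ i })) = ln(h_{0}(t_{ i })) + β'x_{ i }, (1)
where h_{0}(t) is the unspecified baseline hazard function. The corresponding survival function is
S(t_{ i } | x_{ i }) = exp(−H_{0}(t_{ i }) exp(β'x_{ i })), (2)
where H_{0}(t) is the unspecified baseline cumulative hazard function.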
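When haplotype phase is unknown, the survival function given the observed genotypes g_{ i }is a mixture over the compatible haplotype pairs,
S(t_{ i } | g_{ i }) = ∑_{ q }w_{ iq }S_{ q }(t_{ i } | x_{ iq }), (3)
where S_{ q }(t_{ i } | x_{ iq }) is the survival function specified as in equation (2), x_{ iq }is the q^{ th }haplotype pair that is possible given genotypes g_{ i }, and w_{ iq }is the probability that individual i has haplotype pair q given genotypes g_{ i }: w_{ iq }= Pr(X_{ i }= x_{ iq } | g_{ i }).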
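Assuming Hardy-Weinberg equilibrium, the weight for the haplotype pair q = (h, r) is
w_{ iq }= p_{ h }p_{ r }/ ∑_{ h' }∑_{ r' }d_{ h'r'i }p_{ h' }p_{ r' }, (4)
where the summation takes place over all haplotype pairs that are compatible with genotypes g_{ i }, and d_{ hri }is an indicator function that equals 1 when the haplotype pair (h, r) is compatible with g_{ i }and 0 otherwise.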
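As an illustration of how the weights w_{ iq }could be computed, the sketch below assumes Hardy-Weinberg proportions, so that the probability of a haplotype pair is proportional to the product of its two haplotype frequencies. The function name `prior_weights` and the example frequencies are hypothetical and serve only to show the normalization over compatible pairs.

```python
def prior_weights(pairs, freq):
    """Probability of each compatible haplotype pair given the genotype.

    pairs: list of unordered haplotype pairs (h, r) compatible with the genotype.
    freq:  dict mapping haplotype label -> population frequency.
    Assumes Hardy-Weinberg proportions (an assumption of this sketch).
    """
    raw = []
    for h, r in pairs:
        # An unordered pair with h != r can arise in two parental orders.
        multiplicity = 1.0 if h == r else 2.0
        raw.append(multiplicity * freq[h] * freq[r])
    total = sum(raw)
    # Normalize so the weights of the compatible pairs sum to one.
    return [value / total for value in raw]

# Hypothetical frequencies for the two-SNP example with alleles A/a and B/b.
freq = {"AB": 0.4, "ab": 0.3, "Ab": 0.2, "aB": 0.1}
weights = prior_weights([("AB", "ab"), ("Ab", "aB")], freq)
```

With these made-up frequencies the double heterozygote is assigned weight 6/7 for AB/ab and 1/7 for Ab/aB. An individual with only one compatible pair always receives weight 1.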
The cumulative baseline hazard function is treated as a step function with jumps at the times τ_{ j }: λ_{ j }is the jump in the cumulative baseline hazard function at time τ_{ j }, and I(τ_{ i }> τ_{ j }) is an indicator function.
Notice that if z_{ i }= 1 for all i, that is, when all haplotypes are observed, then ${\tilde{w}}_{iq{t}_{i}}$ = 1, and equation (7) reduces to the usual Breslow estimator of H_{0}(t), and equation (8) to the usual Cox estimator of β.
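The reduction to the Breslow estimator can be sketched in code. The function below is an assumed form of a weighted Breslow-type estimator, in which each individual's relative risk is replaced by its weighted average over the compatible haplotype pairs; it is an illustration, and the paper's own estimator may differ in detail. The name `weighted_breslow` and the data layout are hypothetical.

```python
import math

def weighted_breslow(times, events, weights, X, beta):
    """Jumps of the cumulative baseline hazard at each distinct event time.

    times[i], events[i]: follow-up time and event indicator of individual i.
    weights[i][q]:       weight of haplotype pair q for individual i.
    X[i][q]:             covariate vector (haplotype counts) for that pair.
    """
    # Weighted relative risk of each individual, averaged over haplotype pairs.
    risk = []
    for w_i, x_i in zip(weights, X):
        risk.append(sum(w * math.exp(sum(b * x for b, x in zip(beta, xq)))
                        for w, xq in zip(w_i, x_i)))
    jumps = {}
    event_times = sorted({t for t, d in zip(times, events) if d == 1})
    for tau in event_times:
        n_events = sum(1 for t, d in zip(times, events) if d == 1 and t == tau)
        # Individuals still under observation at tau form the risk set.
        at_risk = sum(r for t, r in zip(times, risk) if t >= tau)
        jumps[tau] = n_events / at_risk
    return jumps

# One haplotype pair per individual with weight 1 and beta = 0.
jumps = weighted_breslow(
    times=[1.0, 2.0, 3.0], events=[1, 0, 1],
    weights=[[1.0], [1.0], [1.0]],
    X=[[[0.0]], [[0.0]], [[0.0]]], beta=[0.0],
)
```

In this toy case every weight equals 1 and β = 0, so the jumps are simply the number of events divided by the number at risk (1/3 at time 1 and 1 at time 3), which is the behaviour expected of the usual Breslow estimator.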
Unfortunately, the weights ${\tilde{w}}_{iq{t}_{i}}$ depend on both H_{0}(t) and β.
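This dependence can be illustrated with a sketch of an E-step update. It assumes that the updated weight of haplotype pair q is proportional to its prior weight w_{ iq }times the likelihood contribution of the observed (t_{ i }, d_{ i }) under that pair. This is the standard posterior for a mixture model and is stated here as an assumption, not as the paper's exact expression. Under model (1), the baseline hazard h_{0}(t_{ i }) is common to all pairs and cancels, so only H_{0}(t_{ i }) and β enter. The name `posterior_weights` is hypothetical.

```python
import math

def posterior_weights(prior, X, beta, d, H0_t):
    """Update haplotype-pair weights for one individual.

    prior: prior weights w_iq of the compatible haplotype pairs.
    X:     covariate vector x_iq for each pair.
    beta:  current regression coefficients.
    d:     event indicator (1 = event, 0 = censored).
    H0_t:  current cumulative baseline hazard at the individual's time.
    """
    # Linear predictor beta'x for each compatible pair.
    eta = [sum(b * x for b, x in zip(beta, xq)) for xq in X]
    # Prior times [exp(eta)]^d * exp(-H0 * exp(eta)); h0(t) cancels on normalizing.
    raw = [w * math.exp(d * e - H0_t * math.exp(e)) for w, e in zip(prior, eta)]
    total = sum(raw)
    return [r / total for r in raw]

# With beta = 0 the data carry no information on phase: weights are unchanged.
unchanged = posterior_weights([0.3, 0.7], [[1, 1], [2, 0]], [0.0, 0.0], 1, 0.5)
# With a risk-increasing haplotype and an early event, its pair gains weight.
shifted = posterior_weights([0.5, 0.5], [[1], [0]], [1.0], 1, 0.1)
```

Because the updated weights need H_{0}(t) and β, and the estimates of H_{0}(t) and β in turn need the weights, the two sets of quantities have to be updated alternately, which motivates an iterative scheme such as EM.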
Estimation algorithm
Since we are mainly interested in the uncertainty of β, we only used that part of the Hessian that pertains to β. Notice that the first term of (14) equals the Hessian matrix that is used in the M-step of the EM algorithm.
Penalized loglikelihood
Mutations in general tend to be rare, and so are the haplotypes that carry them. Furthermore, when ten loci are considered there are 2^{10} = 1024 different haplotypes possible, many of which will have low frequency in samples of up to thousands of individuals. If haplotypes have low frequency, their associated hazard ratio estimates will be unstable. We used a penalized log-likelihood method to obtain more stable parameter estimates. Specifically, we optimized the penalized log-likelihood ℓ^{ p }, defined as $\ell^{p} = \ln L^{Breslow} - \frac{\lambda}{2} Pen(\beta)$, where Pen(β) is the penalty function and λ is the penalty parameter.
As penalty functions we considered the well-known ridge penalty function ($Pen(\beta) = \sum_{a} \beta_{a}^{2}$), and a difference penalty function (Pen(β) = ∑_{ a }∑_{ b }a_{ ab }(β_{ a }− β_{ b })^{2}, a > b), where a_{ ab }is a fixed and known value representing the similarity of haplotypes a and b. We quantified the similarity between haplotypes (a_{ ab }) by counting the number of shared alleles, which, with m loci, varies between zero and m − 1.
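The two penalty functions are simple to compute, as the following sketch shows. It assumes haplotypes are labelled by strings of 0/1 alleles (as in the simulation tables) and that the similarity a_{ ab }is the number of positions at which two haplotypes carry the same allele. The function names and the example values of β are hypothetical.

```python
def shared_alleles(hap_a, hap_b):
    """Similarity a_ab: number of loci at which two haplotypes agree."""
    return sum(1 for x, y in zip(hap_a, hap_b) if x == y)

def ridge_penalty(beta):
    """Sum of squared haplotype effects."""
    return sum(b * b for b in beta.values())

def difference_penalty(beta):
    """Similarity-weighted squared differences over all haplotype pairs."""
    haps = sorted(beta)
    total = 0.0
    for i, a in enumerate(haps):
        for b in haps[:i]:  # each unordered pair is visited once
            total += shared_alleles(a, b) * (beta[a] - beta[b]) ** 2
    return total

def penalized_loglik(loglik, beta, lam, penalty):
    """Penalized log-likelihood: loglik - (lam / 2) * Pen(beta)."""
    return loglik - 0.5 * lam * penalty(beta)

# Made-up effects for three haplotypes of three loci.
beta = {"000": 0.0, "001": 1.0, "111": 0.5}
ridge = ridge_penalty(beta)
diff = difference_penalty(beta)
value = penalized_loglik(-10.0, beta, lam=2.0, penalty=ridge_penalty)
```

In this example the ridge penalty is 1.25, while the difference penalty is 2.25 and is dominated by haplotypes 000 and 001, which share two alleles but have effects that differ by 1. Haplotypes 000 and 111 share no alleles, so their difference is not penalized at all.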
where U_{ i }(β^{ λ }) is the contribution of individual i to the first-order derivative of the unpenalized log-likelihood, and H^{ λ }(β^{ λ }) is the matrix of second-order derivatives of the penalized log-likelihood evaluated at β^{ λ }, H_{0}(t), and p. Notice that the last term of (15) is equal to the third term of (14).
Although the penalized likelihood estimates of β are somewhat biased, and it is therefore somewhat unclear how to interpret standard errors, we nevertheless assessed the stability of the penalized estimates by a parametric bootstrap procedure. We took 200 bootstrap samples, estimated λ in each sample by optimizing the cross-validated likelihood (CVL), and derived standard errors from the distribution of the associated penalized estimates of β, p, and H_{0}(t). The number of bootstrap samples was based on results from simulations in which the number of bootstrap samples varied from 10 to 1000. These simulations showed that SE estimates were relatively stable after 100 to 200 bootstrap samples.
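The general shape of a parametric bootstrap for standard errors can be sketched as follows. Refitting the full penalized haplotype model is beyond a short example, so the sketch uses a deliberately simple stand-in model (an exponential rate estimated by maximum likelihood). The names `simulate`, `fit` and `parametric_bootstrap_se` are hypothetical, and only the loop structure is meant to carry over.

```python
import random
import statistics

def simulate(rate, n, rng):
    """Draw n survival times from the fitted (stand-in) exponential model."""
    return [rng.expovariate(rate) for _ in range(n)]

def fit(sample):
    """Maximum likelihood estimate of the exponential rate."""
    return len(sample) / sum(sample)

def parametric_bootstrap_se(theta_hat, n, n_boot=200, seed=1):
    """Standard error as the spread of estimates over bootstrap samples."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        # Simulate a new data set from the fitted model, then refit.
        estimates.append(fit(simulate(theta_hat, n, rng)))
    return statistics.stdev(estimates)

se = parametric_bootstrap_se(theta_hat=2.0, n=200)
```

In the procedure described above, the refitting step would additionally re-estimate λ by optimizing CVL in every bootstrap sample, so the resulting standard errors also reflect uncertainty about the penalty parameter.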
The EM algorithm presented in this paper was programmed in MATLAB^{®} R 7.0 (The MathWorks, Natick, MA, USA) as well as in a set of R functions and is freely available upon request from the corresponding author.
Testing
To illustrate some characteristics of our approach, simulations were carried out. In each replicate, a data set of 200 (simulations 1 and 2) or 2000 (simulation 3) individuals was created in whom 3 loci were measured. We simulated the 8 haplotypes (x_{1},..., x_{8}) to have frequencies of: p_{000} = 0.62, p_{001} = 0.05, p_{010} = 0.02, p_{011} = 0.005, p_{100} = 0.02, p_{101} = 0.003, p_{110} = 0.002 and p_{111} = 0.28. Given the haplotypes drawn for a specific individual i, the survival time S was drawn from the exponential distribution with log(intensity) equal to ∑_{ j }β_{ j }x_{ ij }. A censoring time C was drawn independently from a lognormal distribution such that in about 25% of all individuals C < S, in which case the survival time was censored at C. In each replicate, the haplotype effects were estimated using three models: 1) unpenalized (similar to [10] and [11]), 2) ridge penalized and 3) difference penalized. The statistical properties were evaluated using three measures, namely the mean bias of the parameter estimates, the mean SE and the coverage probability, which is defined as the probability that the 95% confidence interval of the parameter estimate contains the true value of the parameter.
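A data-generating process of this kind can be sketched as below. The haplotype frequencies are those listed above; everything else is an assumption of the sketch: the two haplotypes of an individual are drawn independently, the lognormal parameters `cens_mu` and `cens_sigma` are placeholders that would have to be tuned to reach roughly 25% censoring, and the function name `simulate_cohort` is hypothetical.

```python
import math
import random

# Haplotype frequencies from the simulation design described above.
FREQ = {"000": 0.62, "001": 0.05, "010": 0.02, "011": 0.005,
        "100": 0.02, "101": 0.003, "110": 0.002, "111": 0.28}

def simulate_cohort(n, beta, rng, cens_mu=0.5, cens_sigma=1.0):
    """Simulate (time, event indicator, unphased genotype) for n individuals.

    beta: dict haplotype -> log hazard ratio (missing haplotypes have effect 0).
    cens_mu, cens_sigma: placeholder lognormal censoring parameters.
    """
    haps = list(FREQ)
    probs = [FREQ[h] for h in haps]
    cohort = []
    for _ in range(n):
        # Draw the two haplotypes independently (random mating).
        pair = rng.choices(haps, weights=probs, k=2)
        eta = sum(beta.get(h, 0.0) for h in pair)
        survival = rng.expovariate(math.exp(eta))       # intensity exp(eta)
        censoring = rng.lognormvariate(cens_mu, cens_sigma)
        # Only the unphased genotype (allele counts per locus) is kept.
        genotype = tuple(int(a) + int(b) for a, b in zip(pair[0], pair[1]))
        event = 1 if survival <= censoring else 0
        cohort.append((min(survival, censoring), event, genotype))
    return cohort

# Effects of 1.10 for haplotypes 001 and 101, 200 individuals per replicate.
cohort = simulate_cohort(200, {"001": 1.10, "101": 1.10}, random.Random(2008))
```

Discarding the haplotype pair and keeping only the allele counts mimics the situation in real data, where phase is unknown for individuals who are heterozygous at more than one locus.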
Furthermore, for each haplotype the percentage of replicates that identified the haplotype as being significantly associated with the outcome (i.e., power or Type I error rate) was calculated. The significance level used to calculate the power and the Type I error rate was set to α = 0.05. In addition, the effect on the effect estimates of omitting rare haplotypes (011, 101 and 110), or of fixing their effect to that of the similar haplotype 111, was investigated in the unpenalized models only. For simulations 1 and 2, 500 replicates were carried out, whereas 100 replicates were carried out in simulation 3.
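The evaluation measures can be computed from the per-replicate estimates as in the sketch below. It assumes Wald-type 95% confidence intervals and tests (estimate ± 1.96 SE), which the text does not state explicitly; the function name `summarize_replicates` and the example numbers are hypothetical.

```python
def summarize_replicates(estimates, ses, true_beta, z=1.96):
    """Mean bias, mean SE, coverage and rejection rate over replicates.

    estimates, ses: per-replicate estimate and standard error of one effect.
    true_beta:      the value used to generate the data.
    z:              normal quantile for a two-sided 5% level (Wald assumption).
    """
    n = len(estimates)
    mean_bias = sum(e - true_beta for e in estimates) / n
    mean_se = sum(ses) / n
    # Coverage: the interval estimate +/- z * SE contains the true value.
    covered = sum(1 for e, s in zip(estimates, ses)
                  if abs(e - true_beta) <= z * s)
    # Rejection: the Wald test of beta = 0 is significant at the 5% level.
    rejected = sum(1 for e, s in zip(estimates, ses) if abs(e) > z * s)
    return {"mean_bias": mean_bias, "mean_se": mean_se,
            "coverage": covered / n, "rejection_rate": rejected / n}

# Four made-up replicates for an effect with true value 0.69.
summary = summarize_replicates([0.5, 0.9, 1.3, 0.1], [0.2, 0.2, 0.2, 0.2], 0.69)
```

The rejection rate is interpreted as power when the true effect is non-zero and as the Type I error rate when it is zero.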
Table 1. Numbers of individuals with the various genotypes on three loci in a simulation of 200 individuals
Locus 1  | Locus 2  | Locus 3: wild type^{ a }  | Locus 3: heterozygote  | Locus 3: homozygote 
wild type  | wild type  | 83  | 13  | 1 
wild type  | heterozygote  | 6  | 2  | 0 
wild type  | homozygote  | 0  | 0  | 0 
heterozygote  | wild type  | 6  | 0  | 0 
heterozygote  | heterozygote  | 0  | 63  | 3 
heterozygote  | homozygote  | 0  | 4  | 2 
homozygote  | wild type  | 0  | 0  | 0 
homozygote  | heterozygote  | 0  | 3  | 1 
homozygote  | homozygote  | 0  | 0  | 13 
Table 2. Realized β^{ a }, mean bias, mean standard error (SE), coverage probability and proportion of the effects that had a p-value < 0.05 in the simulation where haplotypes with a rare allele at locus 2 had a modeled effect (β) of 0.69. Each replicate contained 200 individuals.
No penalty  

Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
010  0.020  0.741  0.014  0.44  0.95  0.43 
011  0.005  0.593  0.663  2971.81  0.95  0.21 
110  0.002  0.000  0.971  14514.13  0.96  0.10 
111  0.280  0.713  0.018  0.09  0.79  1.00 
001  0.050  0.012  0.050  0.28  0.91  0.09 
100  0.020  0.038  0.010  33.23  0.95  0.05 
101  0.003  0.000  2.212  7674.81  0.90  0.10 
Ridge penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
010  0.020  0.741  0.135  0.73  0.85  0.18 
011  0.005  0.593  0.359  1.91  0.74  0.03 
110  0.002  0.000  0.557  1.00  0.39  0.19 
111  0.280  0.713  0.010  0.11  0.83  1.00 
001  0.050  0.012  0.037  0.24  0.87  0.13 
100  0.020  0.038  0.070  1.08  1.00  0.00 
101  0.003  0.000  0.649  2.16  0.89  0.11 
Difference penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
010  0.020  0.741  0.008  0.62  0.98  0.35 
011  0.005  0.593  0.132  1.39  1.00  0.03 
110  0.002  0.000  0.272  1.00  1.00  0.15 
111  0.280  0.713  0.008  0.12  0.89  1.00 
001  0.050  0.012  0.122  0.24  0.79  0.21 
100  0.020  0.038  0.131  0.88  0.93  0.07 
101  0.003  0.000  0.301  1.68  0.91  0.09 
Table 3. Realized β^{ a }, mean bias, mean standard error (SE), coverage probability and proportion of the effects that had a p-value < 0.05 in the simulation where haplotypes 001 and 101 had a modeled effect (β) of 1.10. Each replicate contained 200 individuals.
No penalty  

Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  0.050  1.151  0.067  0.25  0.90  0.98 
101  0.003  0.817  0.868  8132.96  0.97  0.25 
010  0.020  0.031  0.079  0.48  0.90  0.10 
011  0.005  0.000  1.363  3514.27  0.92  0.08 
100  0.020  0.025  0.026  0.49  0.94  0.06 
110  0.002  0.000  1.082  13327.13  0.95  0.05 
111  0.280  0.005  0.012  0.10  0.80  0.20 
Ridge penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  0.050  1.151  0.012  0.24  0.84  0.99 
101  0.003  0.817  0.568  1.77  0.57  0.08 
010  0.020  0.031  0.057  1.56  1.00  0.00 
011  0.005  0.000  0.322  2.91  0.98  0.02 
100  0.020  0.025  0.022  1.65  1.00  0.00 
110  0.002  0.000  0.338  1.85  0.86  0.14 
111  0.280  0.005  0.005  0.11  0.84  0.16 
Difference penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  | 0.050  | 1.151  | 0.003  | 0.25  | 0.88  | 0.98 
101  | 0.003  | 0.817  | 0.394  | 1.67  | 0.87  | 0.01 
010  | 0.020  | 0.031  | 0.111  | 1.43  | 1.00  | 0.00 
011  | 0.005  | 0.000  | 0.049  | 2.48  | 0.99  | 0.01 
100  | 0.020  | 0.025  | 0.078  | 1.53  | 1.00  | 0.00 
110  | 0.002  | 0.000  | 0.139  | 2.08  | 1.00  | 0.00 
111  | 0.280  | 0.005  | 0.019  | 0.12  | 0.86  | 0.14 
For both simulations, removal of the rare haplotypes 011, 110 and 101 from the model leads to a small reduction in the bias of the remaining haplotypes (e.g. from 0.048 to 0.038 and from 0.019 to 0.005 for haplotypes 010 and 111, respectively). Depending on the modeled effects of the omitted haplotypes, the bias in the estimate of haplotype 111 decreases (simulation 1: from 0.019 to 0.006) or increases (simulation 2: from 0.005 to 0.011) a little.
Table 4. Realized β^{ a }, mean bias, mean standard error (SE), coverage probability and proportion of the effects that had a p-value < 0.05 in the simulation where haplotypes 001 and 101 had a modeled effect (β) of 1.10. Each replicate contained 2000 individuals.
No penalty  

Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  0.050  1.094  0.005  0.08  0.93  1.00 
101  0.003  1.155  0.037  0.38  0.93  0.77 
010  0.020  0.016  0.007  0.14  0.98  0.02 
011  0.005  0.023  0.036  0.30  0.98  0.02 
100  0.020  0.006  0.015  0.14  1.00  0.00 
110  0.002  0.084  0.075  1.14  0.95  0.05 
111  0.280  0.007  0.007  0.03  0.82  0.18 
Ridge penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  0.050  1.094  0.009  0.07  0.91  1.00 
101  0.003  1.155  0.085  0.61  0.91  0.62 
010  0.020  0.016  0.007  0.013  0.96  0.04 
011  0.005  0.023  0.046  0.23  0.96  0.04 
100  0.020  0.006  0.014  0.12  0.98  0.02 
110  0.002  0.084  0.067  0.40  0.66  0.34 
111  0.280  0.007  0.008  0.03  0.81  0.19 
Difference penalty  
Haplotype  Frequency  Realized^{ a }  Mean bias  Mean SE  Coverage  P < 0.05^{ b } 
001  | 0.050  | 1.094  | 0.011  | 0.07  | 0.88  | 1.00 
101  | 0.003  | 1.155  | 0.066  | 0.54  | 0.88  | 0.71 
010  | 0.020  | 0.016  | 0.004  | 0.12  | 0.96  | 0.04 
011  | 0.005  | 0.023  | 0.071  | 0.22  | 0.90  | 0.10 
100  | 0.020  | 0.006  | 0.017  | 0.12  | 0.98  | 0.02 
110  | 0.002  | 0.084  | 0.069  | 0.37  | 0.57  | 0.43 
111  | 0.280  | 0.007  | 0.009  | 0.03  | 0.82  | 0.18 
Implementation
In the GENDER study [13], 3146 patients with cardiovascular disease who were treated with percutaneous transluminal coronary angioplasty (PTCA) with or without stents were followed for at least twelve months for the occurrence of clinical restenosis and revascularization of the vessel that was originally treated with PTCA. Inflammatory processes are involved in such target vessel revascularization (TVR), and the level of inflammatory response is controlled by several genes, among which possibly the IL10 gene. In 2653 patients we determined the variants of four SNPs in this gene (IL10 592G > T, IL10 2849G > A, IL10 1082G > A, and IL10 4251A > G), and evaluated their association with TVR risk. Overall, there were 252 TVR events, and TVR risk was 9% at nine months and 10.5% at twelve months. Rare allele frequencies were 28%, 49%, 27%, and 24% for the four SNPs, respectively. All four markers were in linkage disequilibrium (P < 0.001), and Hardy-Weinberg equilibrium was not rejected for any of the markers (P > 0.0125; significance level (Bonferroni) corrected for multiple testing).
Univariately, in a Cox model assuming codominant effects, IL10 2849G > A, IL10 1082G > A, and IL10 4251A > G were significantly associated with TVR risk with hazard ratios (HR) 1.21 (95% CI: 1.00–1.46), 1.20 (1.01–1.42), 1.20 (0.99–1.45), respectively. HR of IL10 592C > A was 0.87 (0.70–1.07). In a Cox model with all SNPs and all two-way interactions, we found significant interactions between SNPs IL10 2849G > A and IL10 1082G > A (P = 0.003), and between IL10 1082G > A and IL10 4251A > G (P = 0.001). Higher-order interactions were not significant. The prognostic index of this model (Xβ) varied between -2.6 and +1.6, and had 23 different values corresponding to the 23 different genotype combinations that were observed.
Discussion
In the present study we present a method to model the relation between failure time and (rare) haplotypes in unrelated individuals. The simulations presented in this study show that the estimated haplotype effects, especially those of the rare haplotypes, are closer to the true values when a penalty is introduced into the model.
Furthermore, the simulations show that the penalized log-likelihood approach that is used to deal with the unstable estimates of rare haplotypes can indeed shrink the estimates and their 95% confidence intervals to 'acceptable' values. Simulations also show that the cross-validated standard errors of the more common haplotypes can be increased compared to their unpenalized standard errors due to uncertainty with respect to the penalty parameter λ. The power to detect a true haplotype effect is, in general, reduced in the penalized models compared to the non-penalized model, the reduction being more pronounced for the less frequent haplotypes. This reduced power is due to the shrinkage of the estimates. With respect to the type I error probabilities, the effect of introducing a penalty depends on the penalty applied and the true haplotype effects. For the ridge penalized models, the type I error probabilities are similar to those observed in the non-penalized models. The non-penalized estimates of the haplotypes without a modeled effect are already close to the true value of zero, and the ridge penalty shrinks the effects further towards zero. For the difference penalty, the type I error probability appears to be increased for some haplotypes. This deviation is related to the extent to which the assumption (similar haplotypes, similar effects) is met. In the first simulation (Table 2), the mean bias of haplotypes 001 and 100 is increased in the direction of β > 0. In this scenario, haplotypes 010, 011, 110 and 111 all had a modeled effect, and the difference penalty pulls the estimates for haplotypes 001 and 100 towards the effects of these haplotypes. In the second simulation, the majority of the haplotypes had no modeled effect and the effects of the rare haplotypes could be directed towards β = 0.
In replicates containing 2000 individuals (Table 4), the reduction in SE is still present, but the gain is relatively small compared to the simulations with only 200 individuals per replicate. This is in line with expectations, since rare haplotypes are 'less rare' in larger samples, thus enabling a more precise estimation of their effect even without a penalty. Based on the characteristics of the models displayed in the simulations and the real data, the penalized log-likelihood method mostly serves the purpose of estimating the effects of rare haplotypes more accurately.
The method described in this paper is flexible, allowing for adjustment for (environmental) covariates as well as haplotype-environment interactions. Although we focus on haplotypes consisting of a certain number of biallelic SNPs, the method is also capable of handling loci with more than two alleles. Furthermore, the method can be easily extended to deal with missing genotype data, since this will simply increase the number of possible haplotype pairs that are compatible with the observed genotype. The weights w_{ iq }in our method are calculated under the assumption of Hardy-Weinberg equilibrium. Although we did not check the robustness of the method to violations of this assumption, Lin [10] has shown that his method, which is similar to our unpenalized method, is robust to violations of the Hardy-Weinberg assumption.
As an alternative estimation method we considered the partial likelihood. Unfortunately, estimation of β and H_{0}(t) is not separated, and therefore there is no reason to prefer this partial likelihood approach over the EM algorithm outlined in the present manuscript. Compared to the method described by Tregouet et al [11], the present EM method assumes a piecewise constant hazard, which seems less restrictive than the assumption behind their method using partial likelihood.
We use a penalty function to increase precision of estimates of rare haplotypes. Other strategies for managing unstable estimates of rare haplotypes include excluding the rare haplotypes from the variable list, pooling the rare haplotypes into one category, or pooling the rare haplotypes with common haplotypes that are very similar. The first approach implicitly groups the rare haplotypes with the reference category and the second and third approach lead to pooled categories that are sometimes hard to interpret. Nevertheless, these last two methods seem to increase power [16]. However, the three strategies mentioned above do not result in (individual) effect estimates of rare haplotypes, whereas the penalized models do.
Conclusion
The method presented in this paper can be applied to estimate haplotype effects in cohort studies when haplotype phase is unknown. The joint estimation of haplotype effects and haplotype frequencies together with the penalty function provides a good way of estimating effects of rare haplotypes, which is a common problem in these studies.
Abbreviations
SNP(s): single nucleotide polymorphism(s)
SE: standard error
PTCA: percutaneous transluminal coronary angioplasty
TVR: target vessel revascularization
IL10: interleukin 10
Declarations
Acknowledgements
M.W.T. Tanck was financially supported by the Netherlands Heart Foundation grant no 2000.125.
References
1. Durrant C, Zondervan KT, Cardon LR, Hunt S, Deloukas P, Morris AP: Linkage disequilibrium mapping via cladistic analysis of single-nucleotide polymorphism haplotypes. Am J Hum Genet. 2004, 75: 35-43. 10.1086/422174.
2. Epstein MP, Satten GA: Inference on haplotype effects in case-control studies using unphased genotype data. Am J Hum Genet. 2003, 73: 1316-1329. 10.1086/380204.
3. Satten GA, Epstein MP: Comparison of prospective and retrospective methods for haplotype inference in case-control studies. Genet Epidemiol. 2004, 27: 192-201. 10.1002/gepi.20020.
4. Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA: Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet. 2002, 70: 425-434. 10.1086/338688.
5. Seltman H, Roeder K, Devlin B: Evolutionary-based association analysis using haplotype data. Genet Epidemiol. 2003, 25: 48-58. 10.1002/gepi.10246.
6. Sham PC, Rijsdijk FV, Knight J, Makoff A, North B, Curtis D: Haplotype association analysis of discrete and continuous traits using mixture of regression models. Behav Genet. 2004, 34: 207-214. 10.1023/B:BEGE.0000013734.39266.a3.
7. Stram DO, Leigh PC, Bretsky P, Freedman M, Hirschhorn JN, Altshuler D, Kolonel LN, Henderson BE, Thomas DC: Modeling and EM estimation of haplotype-specific relative risks from genotype data for a case-control study of unrelated individuals. Hum Hered. 2003, 55: 179-190. 10.1159/000073202.
8. Zaykin DV, Westfall PH, Young SS, Karnoub MA, Wagner MJ, Ehm MG: Testing association of statistically inferred haplotypes with discrete and continuous traits in samples of unrelated individuals. Hum Hered. 2002, 53: 79-91. 10.1159/000057986.
9. Zhao LP, Li SS, Khalid N: A method for the assessment of disease associations with single-nucleotide polymorphism haplotypes and environmental variables in case-control studies. Am J Hum Genet. 2003, 72: 1231-1250. 10.1086/375140.
10. Lin DY: Haplotype-based association analysis in cohort studies of unrelated individuals. Genet Epidemiol. 2004, 26: 255-264. 10.1002/gepi.10317.
11. Tregouet DA, Tiret L: Cox proportional hazards survival regression in haplotype-based association analysis using the Stochastic-EM algorithm. Eur J Hum Genet. 2004, 12: 971-974. 10.1038/sj.ejhg.5201238.
12. Tanck MW, Klerkx AH, Jukema JW, Knijff PD, Kastelein JJ, Zwinderman AH: Estimation of multilocus haplotype effects using weighted penalised log-likelihood: analysis of five sequence variations at the cholesteryl ester transfer protein gene locus. Ann Hum Genet. 2003, 67: 175-184. 10.1046/j.1469-1809.2003.00021.x.
13. Agema WR, Monraats PS, Zwinderman AH, de Winter RJ, Tio RA, Doevendans PA, Waltenberger J, de Maat MP, Frants RR, Atsma DE, van der Laarse A, van der Wall EE, Jukema JW: Current PTCA practice and clinical outcomes in The Netherlands: the real world in the pre-drug-eluting stent era. Eur Heart J. 2004, 25: 1163-1170. 10.1016/j.ehj.2004.05.006.
14. Augustin T: An exact corrected log-likelihood function for Cox's proportional hazards model under measurement error and some extensions. Scandinavian Journal of Statistics. 2004, 31: 43-50. 10.1111/j.1467-9469.2004.00371.x.
15. Verweij PJM, van Houwelingen HC: Penalized likelihood in Cox regression. Stat Med. 1994, 13: 2427-2436. 10.1002/sim.4780132307.
16. Jannot AS, Essioux L, Clerget-Darpoux F: Association in multifactorial traits: how to deal with rare observations? Hum Hered. 2004, 58: 73-81. 10.1159/000083028.
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.