Including endophenotypes as covariates in variance component heritability and linkage analysis

The purpose of these analyses was to determine whether adjusting for covariates helped or hindered genetic analyses, specifically heritability and linkage analyses. To study this question, two types of covariate models were applied to the simulated Genetic Analysis Workshop 14 dataset, in which the true gene locations are known. All four populations of one replicate were combined for the analyses. The first model included the typical covariates of sex and cohort (population), and the second included the typical covariates and also the related endophenotypes that are thought to be associated with the trait (phenotypes A, B, C, D, E, F, G, H, I, J, K, and L). A final best-fit model produced in the heritability analyses was used for linkage. Disease genes D1, D3, and D4 were localized using models with and without the covariates. The inclusion of covariates did not appear to have any consistent advantage or disadvantage for the different phenotypes with regard to gene localization or false-positive rate.


Background
The analysis of complex traits is itself challenging. Often the phenotype is not well defined or well measured, and voluminous information is collected that relates to the trait. While this increased phenotyping can be helpful in analyses, there still remains a question as to the best methods for incorporating additional information into the genetic analyses: "How do we analyze all the data together?" The Genetic Analysis Workshop 14 (GAW14) dataset was simulated to reflect realistic issues in study design and data collection. Data were gathered from several different research groups using different ascertainment schemes and different affection criteria. The data are therefore heterogeneous, reflecting reality. The purpose of these analyses was to determine whether the inclusion of typical covariates (sex and population or cohort) and endophenotype traits (phenotypes A, B, C, D, E, F, G, H, I, J, K, and L) improved the genetic analyses of Kofendrerd Personality Disorder (KPD) and the 12 endophenotypes.

Methods
To imitate a realistic situation, only one replicate (23) of the simulated GAW14 dataset was used. All families from all four populations were included, even though each population had a different ascertainment scheme. The affection status for KPD and all 12 phenotypes (A, B, C, D, E, F, G, H, I, J, K, and L) were studied. Each phenotype was analyzed with three types of variance component models: 1) without any covariates included in the model, 2) with sex and population (if they were significant; otherwise this model was not performed), and 3) with the significant variables among sex, population, and the other endophenotype traits. Each covariate was tested independently for significance, and only significant covariates (p < 0.05) were included in the final model.
Heritability and linkage analyses were performed using variance component analyses or random effects models [1][2][3][4] as implemented in the computer program SOLAR [4]. The variance component method [2] decomposes the phenotypic variation ($\Omega$) into measured (candidate gene) genetic effects ($\Pi\sigma_m^2$), unmeasured genetic effects ($2\phi\sigma_g^2$), and other effects ($I\sigma_e^2$): $\Omega = \Pi\sigma_m^2 + 2\phi\sigma_g^2 + I\sigma_e^2$, where $\sigma_m^2$ is the additive genetic variance due to the major locus, and $\Pi$ is a matrix of elements that provide the probability that individuals $i$ and $j$ are identical-by-descent (IBD) at a trait locus that is linked to a genetic marker locus. $\Pi$ is a function of the estimated IBD matrix of the genetic marker itself and a matrix of the correlations between the proportion of genes IBD at the marker and at the trait. $\sigma_g^2$ is the genetic variance due to residual additive genetic factors, $\phi$ is the kinship matrix, $\sigma_e^2$ is the variance due to individual-specific environmental effects, and $I$ is an identity matrix. The dichotomous variables were analyzed by modeling the discrete affection status trait with a threshold model [5], in which the latent liability is assumed to have an underlying multivariate normal distribution. Covariates can be added to the model, and their effects are estimated simultaneously with the variance components by maximum likelihood techniques. Likelihood ratio tests were performed to test for heritability and locus effects; for linkage, the likelihood of the model is compared to that of a restricted model with no linkage. Twice the difference in log likelihood of the variance component models yields a test statistic that is asymptotically distributed as a 1/2:1/2 mixture of a $\chi^2$ variable with 1 degree of freedom and a point mass at zero. Two-point linkage analyses were performed using all of the genome scan markers. LOD scores > 3.3 were considered significant for linkage from the genome scans [6].
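The covariance decomposition and the likelihood ratio test described above can be illustrated with a minimal sketch. This is not SOLAR code or its interface: the function names (omega, lod_from_loglik, mixture_p_value), the sib-pair IBD and kinship values, and the variance components are all hypothetical and chosen only for illustration. The sketch assumes natural-log likelihoods and a single tested variance component, so the chi-square part of the mixture has 1 degree of freedom.

```python
import math

def omega(pi_hat, phi, var_m, var_g, var_e):
    """Expected phenotypic covariance: Pi*var_m + 2*phi*var_g + I*var_e.

    pi_hat : estimated IBD-sharing matrix at the tested locus
    phi    : kinship matrix (0.5 on the diagonal for non-inbred individuals)
    """
    n = len(phi)
    return [
        [
            pi_hat[i][j] * var_m
            + 2.0 * phi[i][j] * var_g
            + (var_e if i == j else 0.0)  # identity matrix term
            for j in range(n)
        ]
        for i in range(n)
    ]

def lod_from_loglik(ll_linkage, ll_no_linkage):
    """LOD score from natural-log likelihoods of the two nested models."""
    return (ll_linkage - ll_no_linkage) / math.log(10.0)

def mixture_p_value(lrt):
    """P-value under a 1/2:1/2 mixture of chi-square(1 df) and a point mass at 0."""
    if lrt <= 0.0:
        return 1.0
    # Survival function of chi-square with 1 df is erfc(sqrt(x / 2)).
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))

if __name__ == "__main__":
    # Hypothetical full-sib pair: kinship 0.25, estimated IBD sharing 0.5.
    pi_hat = [[1.0, 0.5], [0.5, 1.0]]
    phi = [[0.5, 0.25], [0.25, 0.5]]
    print(omega(pi_hat, phi, var_m=0.2, var_g=0.3, var_e=0.5))

    # Likelihood ratio statistic corresponding to the LOD threshold of 3.3.
    lrt = 2.0 * math.log(10.0) * 3.3
    print(mixture_p_value(lrt))
```

In the sketch, the LOD score is the likelihood ratio statistic divided by 2 ln(10), which is how a LOD threshold can be mapped to a pointwise p-value under the mixture distribution.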

Results and Discussion
The inclusion of the endophenotypes as covariates into the variance component models influenced some of the heritability estimates and linkage analyses.

Heritability
Not all of the phenotypic traits were heritable (Table 1). KPD was not heritable when analyzed without covariates, or when analyzed with just sex and population as covariates. When the associated phenotypes were tested as covariates, seven of the 12 were significant. When these significant phenotypes were included in the model, the KPD heritability estimate went to 1.0 and became highly significant (p < 0.000001). It appears that there may be a significant genetic component to KPD that is not explained by the associated phenotypes; however, the heritability parameter is at a boundary, which may indicate instability.
Five of the phenotypes appear to have a significant genetic component either with or without covariates: phenotypes A, B, D, K, and L. Interestingly, phenotypes C and G appeared heritable when analyzed without any covariates, but the heritability is no longer significant when the associated phenotypes are included in the model. Surprisingly, population was not as much of a confounder as expected; it was only significant for phenotype D. Population was associated with KPD when only population and sex were included in the model. Sex was significant for KPD, phenotype I, and phenotype L.

Linkage
The inclusion of covariates in the linkage analyses did not produce consistent results among the phenotype variables as to the localization of genes (Tables 2 and 3). Disease genes D1, D3, and D4 would have been localized in both scenarios, and neither scenario gave evidence at the other locations.
Forty loci were significant for KPD with the inclusion of covariates; however, none of them was the marker closest to a disease gene. Disease genes D1, D3, and D4 were localized for phenotypes A, B, and K with or without the inclusion of covariates. Genes were localized for phenotypes C, D, G, and L only with models that did not include the covariates.
Heritability was no longer significant for phenotype C or G when covariates were in the model. The use of endophenotypes as covariates may have regressed out the major gene effects. Therefore, any genes revealed in the analyses without the covariates should reflect effects shared with the other phenotypes. Consistent with this, phenotypes C and G showed linkage to D3 and D4, genes identified in analyses of other phenotypes.
Several false positives were found in the search for disease genes, including a false-positive region quite distal to any disease genes. The inclusion of covariates did not consistently alter the false-positive rates.

Conclusion
Simulated disease genes D1, D3, and D4 were localized by variance component linkage analyses to the endophenotypes of the simulated KPD trait. Adjusting models for