Multilocus and interaction-based genome scan for alcoholism risk factors in Caucasian Americans: the COGA study

In this paper, we applied the nonparametric linkage regression approach to the Caucasian genome scan data from the Collaborative Study on the Genetics of Alcoholism to search for regions of the genome that exhibit evidence for linkage to putative alcoholism-predisposing genes. The multipoint single-locus model identified four regions of the genome with LOD scores greater than one. These regions were on 7p near D7S1790 (LOD = 1.31), on 7q near D7S1870 (LOD = 1.15) and near D7S1799 (LOD = 1.13), and on 21q near D21S1440 and D21S1446 (LOD = 1.78). Jointly modeling these loci provided stronger evidence for linkage in each of these regions (LOD = 1.58 on 7q11, LOD = 1.61 on 11q23, and LOD = 1.95 on 21q22). The evidence for linkage tended to increase among pedigrees with earlier mean age of onset at 8q23 (p = 0.0016), 14q21 (p = 0.0079), and 18p12 (p = 0.0021) and among pedigrees with later mean age of onset at 4q35 (p = 0.0067) and 9p22 (p = 0.0008).


Background
The Collaborative Study on the Genetics of Alcoholism (COGA) is a study designed to identify the genetic influences on alcoholism. Although alcoholism itself and the corresponding risk factors are heritable, they are strongly believed to be complex genetic traits. Thus, in the search for genes that influence these traits, we expect significant genetic heterogeneity, gene × gene interactions, and gene × environment interactions. Statistical methods that have the flexibility to simultaneously consider multiple loci and environmental factors are potentially valuable tools in the search for putative disease-predisposing loci. The purpose of this paper is to examine the evidence for linkage using multilocus nonparametric linkage regression modeling and to explore whether the evidence for linkage varies by the age of onset of alcoholism [1,2].

Methods
The genotyped sample provided by COGA to the Genetic Analysis Workshop 14 consists of 102 Caucasian pedigrees (1,078 individuals) and 41 non-Caucasian pedigrees (526 individuals). Given the limited number of pedigrees of non-Caucasian ethnicity, this paper focuses on the self-reported Caucasian sample genotyped on 315 microsatellite markers and 15,406 autosomal single-nucleotide polymorphisms (SNPs). The alcohol dependence diagnosis required that an individual meet the criteria for both DSM-III-R alcohol dependence and Feighner definite alcoholism. This yielded four affection status classifications: 1) unaffected, 2) never drank, 3) unaffected with some symptoms, and 4) affected. The analyses reported here focus on affection status 4 only.
The initial genome scan linkage analyses were computed using the nonparametric linkage (NPL) (pairs) and NPL (all) statistics under 1) the exponential allele-sharing model [3] and 2) a conditional logistic regression parameterization denoted NPL regression [1,2]. This regression-based approach provides a one degree of freedom test of the evidence for linkage at a locus, conditional on the evidence for linkage at the other loci in the model. Model building was performed using step-wise regression techniques. To test for an interaction between two loci, we included the two loci and their statistical interaction in the model and computed the one degree of freedom test of the interaction coefficient. In addition, we tested for interactions between the degree of sharing (identity by descent (IBD)) at a locus and the pedigree-specific mean age at alcoholism diagnosis. The reported p-values should be interpreted as point-wise p-values and were not adjusted for the number of comparisons across the genome. All analyses are based on multipoint IBD estimates.
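As a rough illustration of how a one degree of freedom test relates to the LOD scale used in this paper, the sketch below converts a LOD score to the equivalent likelihood-ratio chi-square statistic (chi-square = 2 ln(10) × LOD) and then to a point-wise p-value. The function names are hypothetical, and the one-sided option is an assumption about how a constrained sharing parameter might be tested; neither is taken from the NPL regression software.

```python
import math

LN10 = math.log(10.0)


def lod_to_chisq(lod):
    """Convert a LOD score (log10 likelihood ratio) to a likelihood-ratio chi-square."""
    return 2.0 * LN10 * lod


def chisq1_pvalue(chisq, one_sided=False):
    """Point-wise p-value for a 1-df chi-square statistic.

    Uses P(chi2_1 > x) = erfc(sqrt(x / 2)).
    one_sided=True halves the p-value; this is an assumption that applies
    only when the tested parameter is constrained to one direction.
    """
    chisq = max(chisq, 0.0)
    p = math.erfc(math.sqrt(chisq / 2.0))
    return 0.5 * p if one_sided else p


def lod_to_pvalue(lod, one_sided=False):
    """Convenience wrapper: LOD score -> point-wise p-value (1 df)."""
    return chisq1_pvalue(lod_to_chisq(lod), one_sided=one_sided)
```

Such a p-value is point-wise only; like the p-values in this paper, it carries no adjustment for the number of positions tested across the genome.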
Ordered subset analyses (OSAs) [4] were computed to investigate the influence of a pedigree's mean age at alcoholism diagnosis on the evidence for linkage. Analyses were conducted ranking the mean family age of onset in ascending order and then in descending order. Linkage analyses were computed on contiguous subsets of pedigrees based on the mean age of onset ranking. The statistical significance of the change in the LOD score was evaluated by a permutation test under the null hypothesis that the ranking of the covariate is independent of the LOD score of the family on the target chromosome. Thus, the families were randomly permuted with respect to the covariate ranking, and the analysis proceeded as above for each permutation of these data. The resulting empirical distribution of the change in the LOD score yielded a chromosome-wide p-value (∆p). MERLIN [5] was also used to perform a genome scan on the SNP data, and NPL single-locus and multilocus regression was subsequently applied to the SNP data. Because of computation time, only the chromosomes that showed linkage with the microsatellites were run through MERLIN for linkage analysis. Cox et al. [6] examined the decay of linkage disequilibrium (LD) across the genome and found little evidence that adjacent markers exhibited significant LD, thus validating the use of the SNP data for linkage analysis. This matters because, in the absence of parental genotype data, LD between markers can inflate the type I error rate in linkage analysis. The allele frequencies in the MERLIN analyses were computed in MERLIN using founders. As above, multipoint IBD estimates were computed and the NPL regression analysis was based on the NPL (pairs) and NPL (all) statistics.
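The ordered subset procedure and its permutation test described above can be sketched as follows. This is a minimal illustration, not the OSA software [4]: it assumes that each family contributes a LOD score at every grid position on the chromosome and that family LOD scores are additive, and all function and argument names are hypothetical.

```python
import random


def max_subset_lod(family_lods, order):
    """Maximum summed LOD over all positions and all leading subsets of `order`.

    family_lods[i][j] is the LOD contribution of family i at position j.
    Families are added one at a time in the given order, so every subset
    considered is contiguous in the covariate ranking.
    """
    n_pos = len(family_lods[0])
    running = [0.0] * n_pos
    best = float("-inf")
    for idx in order:
        for j in range(n_pos):
            running[j] += family_lods[idx][j]
        best = max(best, max(running))
    return best


def osa_permutation(family_lods, covariate, descending=False, n_perm=1000, seed=0):
    """Return (subset max LOD, change in LOD, empirical chromosome-wide p-value)."""
    n = len(family_lods)
    n_pos = len(family_lods[0])
    # Rank families by the covariate (e.g. mean age of onset).
    order = sorted(range(n), key=lambda i: covariate[i], reverse=descending)
    # Maximum LOD using all families.
    full = max(sum(f[j] for f in family_lods) for j in range(n_pos))
    observed = max_subset_lod(family_lods, order)
    delta = observed - full
    # Null hypothesis: the covariate ranking is independent of the family LODs.
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = order[:]
        rng.shuffle(perm)
        if max_subset_lod(family_lods, perm) - full >= delta:
            hits += 1
    # Add-one correction keeps the estimate strictly positive (a choice made here).
    p_value = (hits + 1) / (n_perm + 1)
    return observed, delta, p_value
```

Because the full sample is always the last subset considered, the change in the LOD score is never negative, which is why its significance has to be judged against the permutation distribution and not against zero.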

Single-locus models
Two chromosomes had maximum LOD scores greater than 1.0. Ordered by the magnitude of the LOD score, these regions were chromosome 21q22 (LOD = 1.78, 58 cM near D21S1440 and D21S1446), 7p21 (LOD = 1.31, 17 cM near D7S1790), 7q11 (LOD = 1.15, 112 cM nearest D7S1870), and 7q22 (LOD = 1.13, 145 cM near D7S1799). Chromosome 11q23 near D11S1998 (120 cM) provided modest evidence for linkage in the single-locus model (LOD = 0.81). Figure 1 displays the LOD score curves for the single-locus and multilocus analyses of chromosomes 7, 11, and 21. The linkage analysis of the SNP data using the NPL (pairs) statistic from MERLIN and NPL regression continued to provide evidence for linkage in the chromosome 7p21 (LOD = 1.78) and 7q21 (LOD = 1.51) regions. However, the wide gaps between microsatellite markers on chromosome 21, compared with the high density of SNPs in this region, led to differing results. Specifically, chromosome 21 no longer provided evidence for linkage to alcoholism in the SNP data. The average information content across chromosome 21 rose from 60% with the microsatellites to 94% with the SNPs. Chromosome 7 also had a marked increase in information content when moving from the microsatellite data to the SNP data, though the peaks remained comparably significant for this chromosome.

Multilocus models
The results of the multilocus NPL regression model building using the microsatellite data are summarized in Table 1. The three peaks associated with chromosomes 7q, 11q, and 21q continued to provide evidence for linkage. For all three positions in the model, the conditional LOD score was larger and the interval of interest, as defined by the LOD-1 interval, was smaller than the corresponding single-locus LOD score and LOD-1 interval. Figure 1 displays the LOD score curves for each of these models. Interestingly, after adjusting for the evidence of linkage at chromosome 7 at 112 cM, the remaining regions of chromosome 7 no longer exhibited statistically significant evidence for linkage. There was not strong evidence of an interaction among these loci. Table 2 summarizes the results of the NPL regression interaction analysis with age of onset of alcoholism. The evidence for linkage tended to be stronger among pedigrees with an earlier mean age of onset of alcoholism at 8q23 (p = 0.0016), 14q21 (p = 0.0079), and 18q12 (p = 0.0021). Conversely, the evidence for linkage tended to be greater among those pedigrees with later age of onset of alcoholism at 4q35 (p = 0.0067) and 9p22 (p = 0.0008). The difference in mean age of onset tended to be about two to three years between those pedigrees that linked to these regions and those that did not (Table 2). None of the positions identified in the initial genome scan are among those regions that exhibited linkage evidence that varied by age of onset, and the positions identified in the interaction analysis with age of onset did not show significant evidence for linkage in the initial genome scan. The only possible exception was at 7q21, where pedigrees that had a greater mean age of onset tended to have increased evidence for linkage (p = 0.0350). A similar result was also found with OSA: subsetting on the families with the later age of onset of alcoholism increased the LOD score on 7q21 (LOD = 2.62, ∆p = 0.11).
Subsetting on later age of onset yielded a significant increase in the LOD score on 17q23 (LOD = 2.01, ∆p = 0.0409). Subsetting on earlier mean age of onset significantly increased the evidence for linkage only at 9q33 (LOD = 1.72, ∆p = 0.0409). The result on 7p21 was within 10 cM of a similar result for earlier age of onset of alcoholism found with the NPL regression interaction analysis (p = 0.02586).
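The LOD-1 interval used above to define the interval of interest can be sketched as follows. This is a simplified illustration that works on a grid of positions without interpolation; the function name and the example values are hypothetical and are not COGA results.

```python
def lod_drop_interval(positions, lods, drop=1.0):
    """Return (left, peak, right) positions of the LOD-drop support interval.

    Starting from the grid point with the maximum LOD, the interval is
    extended in both directions while the LOD stays within `drop` units
    of the peak. No interpolation between grid points is attempted.
    """
    peak = max(range(len(lods)), key=lambda i: lods[i])
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[peak], positions[right]


# Illustrative values only (positions in cM, made-up LOD curve).
example_positions = [100, 104, 108, 112, 116, 120, 124]
example_lods = [0.1, 0.4, 0.9, 1.6, 1.2, 0.5, 0.2]
interval = lod_drop_interval(example_positions, example_lods)
```

A sharper peak with the same maximum gives a narrower interval, and a taller peak over the same flanking curve raises the threshold and so also narrows it.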

Conclusion
Upon reviewing two previously published genome scans of alcohol dependence, we found that two of our regions had been identified in these studies. The strongest evidence of linkage when looking across both the microsatellites and SNPs in this set of COGA data was on chromosome 7. The region on 7p was also identified in an American Indian population [8] within 10 cM of our peak with a nominal regression p-value of 0.009. The 7q peak is a series of peaks from about 100 cM to 160 cM; this region was found in the original COGA analyses [7] as well. The 11q peak was also replicated in the American Indian population [2] within 10 cM of our peak with a nominal regression p-value of 0.02. The 21q22 result appeared in both the ASM analysis and the NPL regression analysis of the microsatellite data, but disappeared completely in the SNP analysis.
Alcoholism is a genetically complex disease and therefore requires sophisticated consideration of multigenic and phenotypic influences. In this study, methods that consider genetic heterogeneity, gene × gene interactions, gene × age-of-onset interactions, and joint modeling of multiple loci increased the evidence for linkage at three chromosomal locations, two of which had previously been identified as being associated with alcohol dependence. These methods also reduced the linkage support interval at all three loci. In addition, testing for a dependence of the evidence for linkage on age of onset identified five additional regions of interest. These results suggest the potential utility of incorporating characteristics of complex genetic traits in the analysis.