Genetic analysis of the maximum drinks phenotype

Using data provided by the Collaborative Study on the Genetics of Alcoholism, we studied the genetics of a quantitative trait: the maximum number of drinks consumed in a 24-hour period. A two-stage method was used: linkage analysis was performed first, followed by association analysis in regions where linkage was detected. We also assessed the extent of linkage disequilibrium among single-nucleotide polymorphisms (SNPs) associated with the phenotype. Linkage to chromosomes 2 and 7 was detected, and follow-up association analysis found multiple trait-associated SNPs in the chromosome 7 linkage region. Chromosome 4, which has been implicated in previous studies of the maximum drinks phenotype, did not pass our threshold for linkage evidence in stage 1, but secondary analyses of this chromosome indicated modest evidence for both linkage and association. The evidence suggests that chromosome 7 may harbor an additional locus influencing the maximum drinks consumption phenotype.


Background
The data provided by the Collaborative Study on the Genetics of Alcoholism (COGA) for Genetic Analysis Workshop 14 (GAW14) include the phenotype "maximum number of drinks consumed in a 24-hour period." This phenotype is closely related to alcoholism diagnosis, and a previous genome screen of this phenotype in COGA yielded evidence for linkage to chromosome 4 in the vicinity of the alcohol dehydrogenase (ADH) gene cluster [1]. The GAW14 dataset provides nearly 16,000 genotyped single-nucleotide polymorphisms (SNPs) that were not available in the original COGA data. For this report we analyzed these additional genotypic data to see whether the original linkage findings could be confirmed or extended. Linkage signals were followed up with association analysis, using the density of the SNP data to refine the linkage signals and potentially to localize genetic variants that may influence alcohol consumption.

Methods
In defining the quantitative trait, we set the phenotype of individuals who reported a maximum drinks value of zero to unknown, since these unexposed individuals have an undetermined response to alcohol. We reduced skewness by taking the logarithm of the maximum number of drinks. A linear regression on sex was then used to adjust for sex effects, and the final trait value was defined as the residual from this regression.
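The trait definition above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `max_drinks_trait`, the use of the natural logarithm, and the 0/1 coding of sex are assumptions, and the regression is fitted only to individuals with a known phenotype.

```python
import numpy as np

def max_drinks_trait(max_drinks, is_male):
    """Hypothetical helper: zero -> unknown, log transform, sex-adjusted residual."""
    y = np.asarray(max_drinks, dtype=float)
    sex = np.asarray(is_male, dtype=float)  # assumed coding: 1 = male, 0 = female

    # Unexposed individuals (maximum drinks == 0) get an unknown phenotype (NaN).
    y = np.where(y > 0, y, np.nan)

    # Log transform to reduce right skew (natural log assumed; NaN stays NaN).
    log_y = np.log(y)

    # Ordinary least squares of log(max drinks) on an intercept and sex,
    # fitted only to individuals with a known phenotype.
    known = ~np.isnan(log_y)
    X = np.column_stack([np.ones(known.sum()), sex[known]])
    beta, *_ = np.linalg.lstsq(X, log_y[known], rcond=None)

    # The trait value is the residual; unknown phenotypes stay NaN.
    trait = np.full_like(log_y, np.nan)
    trait[known] = log_y[known] - X @ beta
    return trait
```

Because the residuals are taken from a regression with an intercept, the resulting trait has mean zero within the individuals who have a known phenotype.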
Our primary analysis consisted of two stages, applied to the cleaned SNP data. In the first stage we tested for genetic linkage using the SNP marker data provided by Illumina. Nonparametric two-point quantitative trait linkage analysis was performed using the "--qtl" option of the program MERLIN [1]. The method compares allele sharing among individuals with similar phenotypes, giving greater weight to individuals at the extreme ends of the distribution. We chose two-point rather than multipoint analysis because strong linkage disequilibrium (LD) among the densely spaced SNP markers can inflate the evidence for linkage if multipoint methods are used without accounting for marker-marker LD. Markers with a LOD score greater than 1.8 were used to define regions for further study. In the second stage, a 20-cM interval centered at each such marker was tested for trait association: SNPs in this interval from the combined Illumina and Affymetrix sets were analyzed using the quantitative pedigree disequilibrium test (QPDT) [2] as implemented in the program UNPHASED [3]. Using the Illumina SNP set (4,710 markers) for the first stage, and the Affymetrix SNP set (11,120 markers) together with the Illumina set for the follow-up association analysis, allowed us to carry out an initial genome screen and then fine-map the linked regions with the more densely spaced Affymetrix SNPs.
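The bookkeeping that links the two stages can be sketched as follows. This is a hypothetical illustration, assuming stage 1 results are available as (marker, LOD) pairs with map positions in cM; the `Marker` class and both functions are invented for this example and do not reproduce MERLIN or UNPHASED. The sketch selects markers with LOD > 1.8, builds a 20-cM window centred on each, and keeps the SNPs from the combined panel that fall inside any window.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    name: str
    chrom: int
    cm: float  # genetic map position (cM)

def stage2_windows(lod_results, threshold=1.8, width_cm=20.0):
    """Return (chrom, start_cM, end_cM) windows centred on markers with LOD > threshold."""
    half = width_cm / 2.0
    return [(m.chrom, m.cm - half, m.cm + half)
            for m, lod in lod_results if lod > threshold]

def snps_in_windows(snps, windows):
    """Keep SNPs (e.g. from the combined Illumina + Affymetrix set) inside any window."""
    return [s for s in snps
            if any(s.chrom == c and lo <= s.cm <= hi for c, lo, hi in windows)]
```

Because each SNP is checked once against all windows, a SNP lying in two overlapping windows is kept only once.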
To supplement our primary two-stage analysis, we carried out several additional analyses, described below.
For comparison with the stage 1 SNP linkage screen, linkage analysis of the autosomal microsatellite data was performed using the same two-point nonparametric analysis as above. We also repeated this microsatellite analysis using a 2-cM multipoint grid because LD among these markers was not a significant concern.
Prior linkage evidence for the maximum drinks phenotype has been reported on chromosome 4 [1], making this a chromosome of special interest. Using the Illumina markers for chromosome 4, we therefore also tested for linkage in the first stage using a regression of the estimated identity-by-descent (IBD) status on the squared sums and squared differences of the trait values, as implemented in the program MERLIN-REGRESS [4]. Because this method is computationally intensive, we applied it only to chromosome 4, allowing a more comprehensive study of this chromosome of interest.
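To show the shape of this regression, the sketch below fits an unweighted least-squares regression of pair-wise IBD estimates on the squared sums and squared differences of mean-centred trait values. It is a simplified illustration under that assumption, not the weighted procedure implemented in MERLIN-REGRESS, and the function name is hypothetical. Under linkage one would typically expect IBD sharing to rise with the squared sum and fall with the squared difference.

```python
import numpy as np

def ibd_on_trait_squares(y1, y2, pi_hat):
    """Hypothetical, unweighted OLS of IBD sharing on squared sums and differences.

    y1, y2: mean-centred trait values for each relative pair
    pi_hat: estimated proportion of alleles shared IBD at the test position
    Returns (intercept, slope_on_squared_sum, slope_on_squared_difference).
    """
    y1, y2, pi_hat = (np.asarray(a, dtype=float) for a in (y1, y2, pi_hat))
    sq_sum = (y1 + y2) ** 2   # large when both relatives are extreme in the same direction
    sq_diff = (y1 - y2) ** 2  # large when the relatives are phenotypically dissimilar
    X = np.column_stack([np.ones_like(sq_sum), sq_sum, sq_diff])
    coef, *_ = np.linalg.lstsq(X, pi_hat, rcond=None)
    return tuple(coef)
```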
LD between markers should be considered when using SNP data to search for disease or trait association. Thus, SNPs exhibiting trait association in our primary analysis that were also in close proximity to one another were tested to see whether they were in LD with each other. Pair-wise LD coefficients were computed using the program LDMAX, which is part of the GOLD computer package [5]; results are reported using the normalized coefficient $D' = D/|D|_{\max}$, where $D = p_{11} - p_1 p_2$, $p_{11}$ is the frequency of the haplotype containing allele 1 at both markers, $p_i$ is the frequency of allele 1 at the $i$th marker of the pair, and $|D|_{\max}$ is the maximum possible absolute value of $D$ given the marker allele frequencies.
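The D' computation can be written out directly from the definitions above. This sketch assumes the haplotype frequency $p_{11}$ is already known (LDMAX estimates it from genotype data, which is not shown here) and uses the standard expression for $|D|_{\max}$: $\min(p_1(1-p_2), (1-p_1)p_2)$ when $D \ge 0$ and $\min(p_1 p_2, (1-p_1)(1-p_2))$ when $D < 0$. The function name is hypothetical.

```python
def d_prime(p11, p1, p2):
    """Signed normalized LD coefficient D' = D / |D|max for two biallelic markers.

    p11: frequency of the haplotype carrying allele 1 at both markers
    p1, p2: frequency of allele 1 at marker 1 and marker 2
    """
    d = p11 - p1 * p2
    if d >= 0:
        d_max = min(p1 * (1 - p2), (1 - p1) * p2)
    else:
        d_max = min(p1 * p2, (1 - p1) * (1 - p2))
    if d_max == 0:
        # Occurs only when one of the markers is monomorphic.
        raise ValueError("D' is undefined when a marker is monomorphic")
    return d / d_max
```

Because the sign of D is kept, |D'| is what the block criterion in the Results compares against 0.9.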

Results
Of the 1,614 individuals in the sample, the maximum number of drinks was recorded for 1,388 (86%). The mean, median, and mode were 17.8, 12, and 24, respectively, with a standard deviation of 17.3. Because of computer memory constraints, MERLIN automatically dropped 9 of the 143 families (6%), comprising 243 of the 1,614 individuals (15%), from the stage 1 linkage analysis; this reduction in sample size is a limitation of our analyses. The average maximum drinks value in the dropped families was slightly lower than the overall average (by 0.16 standard deviations); this difference does not appear to have a systematic cause and is not expected to affect the interpretability of the results.
We used 4,710 markers from the Illumina SNP set. The mean spacing between markers was 0.77 cM on the genetic map and 621 kb on the physical map, and the mean minor allele frequency (MAF) was 0.39. We used 11,120 markers from the Affymetrix SNP set. The mean spacing was 0.32 cM (258 kb on the physical map), and the mean MAF was 0.27.
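Summaries like the mean spacing and mean MAF above can be computed as in the sketch below. It is an illustration only: it assumes spacing is measured between adjacent markers within each chromosome (no gaps across chromosomes) and that one allele frequency per SNP is supplied; the function name and input layout are hypothetical.

```python
import numpy as np

def map_summary(positions_by_chrom, allele_freqs):
    """Hypothetical helper returning (mean inter-marker spacing, mean minor allele frequency).

    positions_by_chrom: {chromosome: list of marker positions (cM or kb)}
    allele_freqs: frequency of one (either) allele for each SNP
    """
    gaps = []
    for pos in positions_by_chrom.values():
        p = np.sort(np.asarray(pos, dtype=float))
        gaps.extend(np.diff(p))  # spacing only between neighbours on the same chromosome
    f = np.asarray(allele_freqs, dtype=float)
    maf = np.minimum(f, 1.0 - f)  # the minor allele is whichever is rarer
    return float(np.mean(gaps)), float(np.mean(maf))
```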
In our primary analysis, only chromosomes 2 and 7 had LOD scores over 1.8 at stage 1. Tables 1 and 2 show the results of the combined two-point linkage and trait association (QPDT) analyses for these two chromosomes; for the stage 2 association analyses, the tables report only those SNPs significant at the 0.01 level. While multiple SNPs throughout the linked regions were significant at the 0.05 level, only one SNP on chromosome 2, but multiple SNPs on chromosome 7, gave association results significant at the 0.01 level. We computed values of D' for the significantly associated SNPs on chromosomes 2 and 7 (Tables 1 and 2), focusing on chromosome 7 because multiple significant SNPs were detected there. An LD block extending from 124.757 Mb to 124.791 Mb covered the three most significantly associated SNPs, and all marker pairs in this block satisfied |D'| ≥ 0.9. Thus, the evidence for association in this region does not come from three independent SNPs but rather indicates a single LD block associated with the phenotype.
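The block criterion used above (|D'| ≥ 0.9 for every marker pair) amounts to a simple all-pairs check. The sketch below assumes a precomputed, symmetric matrix of pair-wise D' values, such as one assembled from LDMAX output; the function name and matrix layout are hypothetical, and this is not the block-finding algorithm of [10].

```python
from itertools import combinations

def is_strong_ld_block(dprime, indices, min_abs=0.9):
    """True if every pair of markers in `indices` has |D'| >= min_abs.

    dprime: square, symmetric matrix (list of lists) of pair-wise D' values
    indices: positions of the candidate block's markers in that matrix
    """
    return all(abs(dprime[i][j]) >= min_abs for i, j in combinations(indices, 2))
```

Taking the absolute value matters because D' is signed: a pair with D' = -0.95 is in strong LD just as much as one with D' = 0.95.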
Our analyses of the microsatellite data found no evidence for linkage satisfying our LOD threshold of 1.8. In a direct comparison of the two-point screens, it is therefore curious that higher LOD scores were detected with the SNPs than with the microsatellites, even though individual SNPs are less informative. The density of the SNP map may have been beneficial, allowing the detection of linkage evidence that the microsatellite screen missed. To examine this hypothesis, we compared the locations of the trait-linked SNP markers on chromosomes 2 and 7 to those of the microsatellite markers, using integrated map information from the National Center for Biotechnology Information (NCBI) [6], specifically the NCBI build 35.1 reference assembly. This assembly places the peak SNP on chromosome 7, rs322812, at 127.338 Mb, near the midpoint between the flanking microsatellites at 122.205 Mb and 131.736 Mb. This indicates a modest gap in microsatellite coverage, which is also borne out by the genetic positions of these microsatellites (146.7 cM and 156.2 cM on the GAW14 map). For some of the other trait-linked SNPs, however, microsatellites do appear to be positioned relatively close by. Other possible explanations for the higher SNP LOD scores include higher genotyping success rates for the SNPs or greater genotyping error in the microsatellites. The discrepancy could also be due to chance; for example, less informative SNPs may have left some families uninformative for linkage that would otherwise have contributed evidence against linkage.
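The positional comparison for rs322812 reduces to simple arithmetic on the build 35.1 and GAW14 map positions quoted above. The short calculation below is only a worked check of those numbers: the peak SNP lies about 0.37 Mb from the midpoint of a 9.531-Mb (9.5-cM) interval between the flanking microsatellites.

```python
# Build 35.1 physical positions (Mb) quoted in the text.
peak_snp = 127.338                    # rs322812
left_ms, right_ms = 122.205, 131.736  # flanking microsatellites

midpoint = (left_ms + right_ms) / 2   # 126.9705 Mb
gap_mb = right_ms - left_ms           # 9.531 Mb between the microsatellites
offset_mb = peak_snp - midpoint       # about 0.37 Mb from the midpoint

# Genetic positions (cM) of the same microsatellites on the GAW14 map.
gap_cm = 156.2 - 146.7                # 9.5 cM

print(f"midpoint={midpoint:.4f} Mb, gap={gap_mb:.3f} Mb ({gap_cm:.1f} cM), offset={offset_mb:.4f} Mb")
```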
Note also that in the microsatellite screens, the two-point analysis in many instances found higher LOD scores than the multipoint analysis. For example, the two-point analysis found weak evidence on chromosome 7 with a LOD score over 1 at 116.6 cM (in the vicinity of the SNP linkage signals), while multipoint LOD scores did not reach 1 on chromosome 7 (the maximum LOD was 0.89 at 140 cM).

Conclusion
Our two-stage genome-wide analysis implicates chromosomes 2 and 7 as potentially harboring loci that influence the maximum drinks consumption phenotype. Follow-up association analysis supports chromosome 7 more strongly than chromosome 2 and suggests that further fine-mapping on chromosome 7 may be worthwhile. Interestingly, this region of chromosome 7q contains several TAS2R bitter taste receptor genes [7], and haplotype status at one of these, TAS2R38, has been shown to be a significant predictor of alcohol intake [8]. The muscarinic receptor gene CHRM2, which has been associated with alcohol dependence and major depressive syndrome in COGA [9], also lies in this region.
Chromosome 4, which has been implicated in a previous linkage study of maximum drinks, did not pass our threshold for linkage evidence in the first stage of our primary analysis, but additional analyses of this chromosome indicated modest evidence for both linkage and association. In Saccone et al. [1]  LD between markers is a concern when using multipoint linkage analysis, and our results show strong LD among the SNPs on chromosome 7 that were found to be associated with maximum drinks. As an additional analysis, we studied global LD structure using an algorithm we have previously described [10], producing a non-overlapping set of blocks across all chromosomes in the Illumina and Affymetrix SNP sets. As expected, we found a significant amount of LD and variable block coverage across the different chromosomes. We are developing techniques for selecting "tag" SNPs from LD blocks, which should be useful in this context, either by allowing maps to be thinned for multipoint linkage analysis or by prioritizing tag SNPs for genotyping.

Authors' contributions
SFS performed all statistical analyses and is the author of the program used to find LD blocks. All authors assisted in the design of the study and interpretation of results. In particular, NLS and JPR assisted in the design of the LD block program. SFS and NLS drafted the manuscript with input from all authors, and all authors read and approved the final manuscript.