A genome-wide linkage analysis of alcoholism on microsatellite and single-nucleotide polymorphism data, using alcohol dependence phenotypes and electroencephalogram measures

The Collaborative Study on the Genetics of Alcoholism (COGA) is a large-scale family study designed to identify genes that affect the risk for alcoholism and alcohol-related phenotypes. We performed genome-wide linkage analyses on the COGA data made available to participants in the Genetic Analysis Workshop 14 (GAW 14). The dataset comprised 1,350 participants from 143 families. The samples were genotyped with three technologies: microsatellites spaced at 10 cM, the Affymetrix GeneChip® Human Mapping 10 K Array (HMA10K), and the Illumina SNP-based Linkage III Panel. We used ALDX1 and ALDX2, the COGA definitions of alcohol dependence, as well as the electrophysiological measures TTTH1 and ECB21, to detect alcoholism susceptibility loci. Many chromosomal regions were found to be significant for each of the phenotypes at a significance level of 0.05. The most significant region for ALDX1 is on chromosome 7, with a maximum LOD score of 2.25 for Affymetrix SNPs, 1.97 for Illumina SNPs, and 1.72 for microsatellites. The same regions on chromosome 7 (96–106 cM) and chromosome 10 (149–176 cM) were found to be significant for both ALDX1 and ALDX2. A region on chromosome 7 (112–153 cM) and a region on chromosome 6 (169–185 cM) were identified as the most significant regions for TTTH1 and ECB21, respectively. We also performed linkage analysis on denser maps of markers by combining the SNP datasets from Affymetrix and Illumina. Adding the microsatellite data to the combined SNP dataset improved the results only marginally. The results indicated that SNPs outperform microsatellites, with the densest marker sets performing the best.

ALDX1 have provided evidence of linkage on chromosomes 1, 2, and 7 [3,4]. Highly heritable electrophysiological variables, such as electroencephalography (EEG) and event-related potentials (ERPs), have been used to identify the genes that affect brain activities related to alcoholism. Data from the Eyes Closed Resting EEG experiment (ECB21) have revealed a strong linkage on chromosome 4 [5,6]. In further studies, the far frontal left side channel measure extracted from the target case of the Visual Oddball experiment for 4 electrode placements (TTTH1) revealed a strong linkage on chromosome 7 [7]. In these studies a 10-cM map of microsatellites was used for the initial scan. The recent addition of single-nucleotide polymorphisms (SNPs) within these linkage regions has improved the resolution of the mapping results [1,7].
The microsatellite-based screening approach has been used successfully for mapping Mendelian diseases. However, this technique has proven unreliable for complex genetic diseases [8,9]. It has been suggested that a 1-2 cM map of moderately polymorphic biallelic markers would be more powerful than a 5-10 cM map of microsatellite screening sets [10]. If that is the case, the recently developed, high-density oligonucleotide array-based, whole-genome sampling analysis approach [11] should provide an ideal set of genotype data for a whole-genome scan.
The COGA dataset provided to participants at GAW14 included data from 1,350 participants from 143 families. The genotype dataset included data produced by a 10-cM map of microsatellites, the Affymetrix GeneChip® Human Mapping 10 K Array (HMA10K), and the Illumina SNP-based Linkage III Panel. To identify susceptibility regions for alcoholism, we performed a genome-wide multipoint linkage analysis using the alcohol dependence phenotypes ALDX1 and ALDX2 (diagnosed by DSM-IV criteria) and the quantitative traits TTTH1 and ECB21. The performance of the microsatellites, the Affymetrix HMA10K Array, and the Illumina Linkage III Panel was compared in terms of information content, identified linkage regions, and the 1-LOD support intervals of the regions.

Map construction
The 10-cM microsatellite maps contained 328 microsatellites, of which 309 have unique locations on the deCode high-resolution genetic maps. To map the SNPs, we first obtained their physical locations from build 34 of the human genome dbSNP database at the National Center for Biotechnology Information (NCBI). We then interpolated the genetic map locations using the microsatellites with unique physical locations in the deCode genetic maps. In total, 11,050 Affymetrix SNPs and 4,700 Illumina SNPs with unique sex-averaged genetic map locations were used in our study. We also created an even denser map of markers by combining the Affymetrix and Illumina SNPs (Comb2). In addition, we combined the Affymetrix and Illumina SNPs with the microsatellite data (Comb3) to determine the contribution of the microsatellite markers.
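The interpolation step can be illustrated with a minimal sketch. It assumes piecewise-linear interpolation between flanking anchor markers on one chromosome and clamps positions that fall outside the anchored range; the text does not state either choice, and the function name `interpolate_cm` and the anchor values in the usage note are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(phys_bp, anchors):
    """Estimate a genetic position (cM) from a physical position (bp).

    anchors: list of (bp, cM) pairs for anchor markers on one chromosome,
    sorted by bp with strictly increasing bp values.
    Assumption: linear interpolation between the two flanking anchors;
    positions outside the anchored range are clamped to the end anchors.
    """
    bps = [bp for bp, _ in anchors]
    if phys_bp <= bps[0]:
        return anchors[0][1]
    if phys_bp >= bps[-1]:
        return anchors[-1][1]
    # Index of the first anchor strictly to the right of phys_bp.
    i = bisect_right(bps, phys_bp)
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    frac = (phys_bp - bp0) / (bp1 - bp0)
    return cm0 + frac * (cm1 - cm0)
```

For example, with the made-up anchors `[(1_000_000, 0.0), (3_000_000, 4.0), (5_000_000, 5.0)]`, a SNP at 2,000,000 bp lies halfway between the first two anchors and is assigned the midpoint of their genetic positions.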

Genotype error detection
The datasets were prepared with PEDCHECK [12] to remove Mendelian inconsistencies. A small number of erroneous genotypes can reduce the power of linkage analysis [13][14][15]. We therefore used MERLIN [16] to eliminate the genotypes with unlikely recombination patterns (0.39% for microsatellites, 0.16% for Affymetrix SNPs, 0.13% for Illumina SNPs).
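To make the notion of a Mendelian inconsistency concrete, the following is a simplified sketch of a single-marker check on one parent-offspring trio. It is not PEDCHECK's algorithm, which works on whole pedigrees, and it does not cover the unlikely-recombination screening done with MERLIN. It assumes autosomal markers, and the function name and genotype encoding are hypothetical.

```python
def is_mendelian_consistent(child, father, mother):
    """Check one autosomal marker in one trio.

    Each genotype is a 2-tuple of allele labels, or None if missing.
    A child genotype is consistent if one allele can come from the father
    and the other from the mother. A missing parent is treated as able to
    transmit any allele.
    """
    if child is None:
        return True

    def can_transmit(parent, allele):
        return parent is None or allele in parent

    a, b = child
    return (can_transmit(father, a) and can_transmit(mother, b)) or (
        can_transmit(father, b) and can_transmit(mother, a)
    )
```

A child typed `(1, 1)` with a father typed `(2, 2)` fails the check, because the father cannot have transmitted allele 1.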

Linkage analysis
Information content (IC) measures how much of the inheritance information can be extracted from available genotype data. It closely predicts the power of a map to detect linkage [10]. We used MERLIN to calculate IC at every marker locus.
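One widely used definition of IC is entropy-based: the entropy of the posterior distribution over inheritance vectors at a locus is compared with its entropy when no genotype data are available. The sketch below assumes that definition and further assumes a uniform prior over the listed inheritance vectors. The text does not spell out the formula, so this is an illustration rather than a description of MERLIN's internals, and the function name is hypothetical.

```python
import math


def information_content(probs):
    """Entropy-based information content at one locus.

    probs: non-negative weights over the possible inheritance vectors given
    the genotype data (normalized here). Returns 1 - E / E0, where E is the
    entropy of the distribution and E0 = log2(number of vectors) is the
    entropy of a uniform prior (an assumption of this sketch).
    """
    n = len(probs)
    if n < 2:
        raise ValueError("need at least two inheritance vectors")
    total = sum(probs)
    entropy = -sum((p / total) * math.log2(p / total) for p in probs if p > 0)
    return 1.0 - entropy / math.log2(n)
```

Under this definition IC is 0 when the genotypes say nothing about inheritance (uniform distribution) and 1 when the inheritance pattern is fully resolved.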
The alcohol dependence phenotypes ALDX1 and ALDX2 include five categories: no information; pure unaffected; never drank; unaffected with some symptoms; affected. We treated "never drank" as "no information", and combined "pure unaffected" and "unaffected with some symptoms" as "unaffected." We performed nonparametric linkage (NPL) analysis based on the identity-by-descent (IBD) sharing among affected individuals in a pedigree. We used MERLIN to calculate the NPL_all statistic [17] and the corresponding nonparametric LOD scores based on the linear model [18]. NPL scores are generally regarded as conservative [17,18]. The nonparametric LOD scores give more accurate p-values and can be used to construct 1-LOD support intervals [9]. For this reason we used the nonparametric LOD scores in our analysis. We chose a significance level of 0.05 to report linkage regions.
We conducted variance components analyses on the log-transformed quantitative traits TTTH1 and ECB21, adjusting for age and sex. The heritability of the traits and the LOD scores at every marker locus were calculated using MERLIN.
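The recoding of the five categories, and the link between a LOD score and a pointwise p-value, can be sketched as follows. The category labels are taken from the text, but the dataset's actual numeric codes are not given, so the string keys are placeholders. The p-value conversion assumes the usual large-sample approximation in which 2 ln(10) × LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the text does not state how its p-values were obtained.

```python
import math

# Recoding described in the text; None stands for "no information".
RECODE = {
    "no information": None,
    "never drank": None,
    "pure unaffected": "unaffected",
    "unaffected with some symptoms": "unaffected",
    "affected": "affected",
}


def recode_status(category):
    """Map one of the five ALDX categories to affected/unaffected/None."""
    return RECODE[category]


def lod_to_pointwise_p(lod):
    """One-sided pointwise p-value for a LOD score (assumed approximation).

    Uses P(chi2_1 > x) = erfc(sqrt(x / 2)) with x = 2 * ln(10) * lod,
    halved for the one-sided test. Non-positive LOD scores return 1.0.
    """
    if lod <= 0:
        return 1.0
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10)))
```

Under this approximation a pointwise level of 0.05 corresponds to a LOD score of roughly 0.59, which helps explain why many regions pass a 0.05 threshold.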
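For orientation, the standard variance components linkage model can be written as below. This is the textbook form, offered as an assumption about the kind of model being fitted rather than a statement of MERLIN's exact parameterization. All symbols are introduced here for illustration: $\sigma_a^2$ is the variance due to a locus linked to the tested position, $\sigma_g^2$ the residual polygenic variance, $\sigma_e^2$ the environmental variance, $\hat{\pi}_{jk}$ the estimated proportion of alleles shared IBD at the tested position, and $\varphi_{jk}$ the kinship coefficient.

```latex
% y_j: log-transformed trait value of individual j, with age and sex as covariates
\begin{align}
y_j &= \mu + \beta_{\text{age}}\,\text{age}_j + \beta_{\text{sex}}\,\text{sex}_j + a_j + g_j + e_j \\
\operatorname{Cov}(y_j, y_k) &=
\begin{cases}
\sigma_a^2 + \sigma_g^2 + \sigma_e^2, & j = k \\
\hat{\pi}_{jk}\,\sigma_a^2 + 2\varphi_{jk}\,\sigma_g^2, & j \neq k
\end{cases} \\
% heritability estimated under the model without a linked locus (sigma_a^2 = 0)
h^2 &= \frac{\tilde{\sigma}_g^2}{\tilde{\sigma}_g^2 + \tilde{\sigma}_e^2} \\
\mathrm{LOD} &= \log_{10}
\frac{L(\hat{\sigma}_a^2, \hat{\sigma}_g^2, \hat{\sigma}_e^2)}
     {L(\sigma_a^2 = 0, \tilde{\sigma}_g^2, \tilde{\sigma}_e^2)}
\end{align}
```

Here hats denote maximum likelihood estimates under the full model and tildes denote estimates under the null model with $\sigma_a^2 = 0$.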

Computational implementation
MERLIN is a software package designed for dense genetic maps in pedigree data. It efficiently implements the Lander-Green algorithm [19] by using sparse binary trees to represent gene flow. We modified the tree structure in the MERLIN source code and adjusted several compilation options, which improved efficiency on highly dense maps and extensive pedigrees by 25% on a 32-bit UNIX machine and by 50% on a 64-bit UNIX machine.

Summary of the maps
The summary statistics for the data are shown in Table 1. Among the three datasets, the Affymetrix SNPs have the highest and most uniform density across the genome. Because the microsatellites are sparse, there is only a slight difference in density between the combined SNPs (Comb2) and the combined SNPs and microsatellites (Comb3).

Information content
Microsatellites had the lowest mean and highest standard deviation (SD) of genome-wide IC, owing to their limited coverage of the genome (Table 2). With the densest single-platform map, the Affymetrix SNPs produced the highest mean and lowest SD, as well as the lowest interquartile range and the narrowest range of IC. This indicates a uniform and robust distribution of IC across the genome. The same trend remained for the combined map with microsatellites and SNPs (Comb3). The combined SNP data (Comb2) show IC similar to that of Comb3 (Table 3). These results show that density plays a key role in extracting inheritance information from the available genotype data.

Linkage analysis with alcohol dependence phenotypes ALDX1 and ALDX2
Both the ALDX1 and ALDX2 phenotypes showed significant linkage on chromosome 7 (96-106 cM) and chromosome 10 (149-176 cM) in all the datasets. In the SNP datasets, both phenotypes detected the same region on chromosome X (30-46 cM), although the LOD scores for ALDX2 were much less significant (Table 4, Table 5).
Our analyses also detected linkage regions unique to each phenotype. For ALDX1, linkage was detected on chromosomes 2, 7, 10, and 11 (Table 4) in all of the datasets. Additional linkage regions on chromosomes 1, 6, 9, 12, 13, 18, and X were detected in the SNP datasets. Consistent with previous literature [3,4], the most significant linkage region in the combined data (Comb3) was located on chromosome 7 (Figure 1). In the Affymetrix panel, the highest maximum LOD score was 2.25 at 100.871 cM (Table 4). The 1-LOD intervals for this linkage region (Affymetrix: 6.551 cM; Illumina: 9.557 cM; microsatellites: 26.80 cM) revealed a substantial difference between microsatellites and SNPs. An even higher maximum LOD score of 2.52 at 101.049 cM, with a narrower 1-LOD interval of 3.647 cM (Figure 1), was achieved with the denser map in the combined dataset (Comb3). On chromosomes 1 and 2 we found linkage regions adjacent to the significant regions reported in [3,4] in both SNP datasets. In contrast, we did not find any evidence of linkage to these two regions in the microsatellite data. Analyses using ALDX1 on selected chromosomes of the Comb2 and Comb3 data showed almost identical results.
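The 1-LOD support intervals compared above can be computed from a LOD curve as in the following sketch. It assumes the curve is sampled at ordered map positions and uses linear interpolation between grid points to locate where the curve drops 1 LOD unit below its maximum; the function name and the interpolation choice are assumptions, not details given in the text.

```python
def one_lod_interval(positions, lods, drop=1.0):
    """Return (left, right) bounds of the support interval around the peak.

    positions: map positions in cM, in increasing order.
    lods: LOD scores at those positions.
    The interval is the contiguous stretch around the global maximum where
    LOD >= max - drop. Bounds are linearly interpolated between grid points
    and clamped to the ends of the curve if the threshold is never crossed.
    """
    if not positions or len(positions) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal length")
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop

    def crossing(i_in, i_out):
        # i_in is at or above the threshold, i_out is strictly below it.
        x0, y0 = positions[i_in], lods[i_in]
        x1, y1 = positions[i_out], lods[i_out]
        return x0 + (threshold - y0) * (x1 - x0) / (y1 - y0)

    i = peak
    while i > 0 and lods[i - 1] >= threshold:
        i -= 1
    left = positions[0] if i == 0 else crossing(i, i - 1)

    j = peak
    while j < len(lods) - 1 and lods[j + 1] >= threshold:
        j += 1
    right = positions[-1] if j == len(lods) - 1 else crossing(j, j + 1)
    return left, right
```

The interval width is then `right - left`; a sparser map gives fewer grid points near the peak, which is one reason the interpolated bounds tend to be wider for microsatellites.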

Linkage analysis with EEG measures TTTH1 and ECB21
The kurtosis values of the log-transformed TTTH1 and ECB21 are -0.13 and -0.44, respectively, indicating no sig-   [7], but the heritability is lower. The difference in heritability values could be due to differences in sample structure or in the algorithms used in the two studies (Jones et al. [7] performed the variance components analyses using the t-distribution option of SOLAR).
The estimate of heritability for ECB21 is 55.54% after adjusting for age and sex. One of the linkage regions on chromosome 4 (58-79 cM: LOD score 1.10 for Affymetrix, 1.25 for Illumina, and 1.40 for microsatellites) overlaps with the highly significant linkage region found in previous studies [5,6]. The most significant region is on chromosome 6 (169-185 cM: LOD score 2.18 for Affymetrix, 2.11 for Illumina, and 0.71 for microsatellites).
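The kurtosis check on a log-transformed trait can be sketched as follows. The reported values are negative, which suggests they are excess kurtosis (zero for a normal distribution); the sketch assumes the simple moment estimator, which may differ slightly from the bias-corrected estimator a given statistics package reports. The function name is hypothetical.

```python
import math


def excess_kurtosis_of_log(values):
    """Excess kurtosis of log-transformed positive trait values.

    Uses the moment estimator m4 / m2**2 - 3 (an assumption of this sketch),
    where m2 and m4 are the second and fourth central sample moments.
    """
    logs = [math.log(v) for v in values]
    n = len(logs)
    mean = sum(logs) / n
    m2 = sum((x - mean) ** 2 for x in logs) / n
    m4 = sum((x - mean) ** 4 for x in logs) / n
    return m4 / m2**2 - 3.0
```

Variance components methods assume approximate multivariate normality, so values near zero support using the log-transformed traits directly.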

Discussion
Based on the COGA data provided to participants at GAW14, we have presented an NPL analysis for the alcohol dependence phenotypes ALDX1 and ALDX2, and a variance components analysis for the EEG measures TTTH1 and ECB21. Our results confirmed some of the linkage findings in previous studies [3][4][5][6][7]. The increased density of the SNP data extends the number of regions detected and increases the resolution of the linkage results.
In our study, we used a p-value of 0.05 as the significance level to report linkage regions. However, in order to define true linkage and explain the inconsistencies among the results of different datasets, it is important to choose a level of genome-wide significance. Commonly used resampling-based and gene-drop simulation approaches are computationally intensive and do not lend themselves to the analysis of the large amount of data in this study. We are investigating a more efficient Monte Carlo procedure to assess genome-wide significance in linkage analysis [20].
Our results show that a denser map can be more powerful for linkage analysis. However, linkage analysis algorithms based on IBD sharing usually assume linkage equilibrium between the markers, and strong linkage disequilibrium between closely adjacent markers could potentially introduce false linkage results [21,22]. A study using Affymetrix HMA10K arrays indicated that there was no substantial difference in the results whether SNPs in linkage disequilibrium were retained or removed [15].
This study represents an extensive performance comparison of three different platforms (microsatellite markers, Affymetrix HMA10K Array, and Illumina Linkage III Panel) in a series of linkage analyses for alcoholism. The high density and the robust performance of SNPs make the whole-genome SNP scan a desirable approach for linkage analysis. This new approach may bring renewed power to linkage analysis based on IBD sharing.
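For reference, the generic resampling idea mentioned above reduces to a simple calculation once null replicates are available: the genome-wide p-value is the fraction of simulated genome scans whose maximum LOD score reaches the observed maximum. The sketch below assumes the null maxima have already been produced (for example by gene-drop simulation, which is the expensive part) and uses the common add-one correction. It is not the Monte Carlo procedure of [20], and the function name is hypothetical.

```python
def empirical_genomewide_p(observed_max_lod, null_max_lods):
    """Empirical genome-wide p-value from simulated null genome scans.

    null_max_lods: the maximum LOD score over the whole genome in each
    null replicate. Returns (exceedances + 1) / (replicates + 1), so the
    estimate is never exactly zero.
    """
    n = len(null_max_lods)
    if n == 0:
        raise ValueError("need at least one null replicate")
    exceed = sum(1 for s in null_max_lods if s >= observed_max_lod)
    return (exceed + 1) / (n + 1)
```

The smallest p-value this estimator can return is 1 / (n + 1), so resolving a genome-wide level of 0.05 needs at least 19 replicates and in practice many more, each requiring a full genome scan.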

Authors' contributions
CZ carried out the statistical analysis and drafted the manuscript. SC assembled the Affymetrix linkage data, conducted quality control, and provided useful comments for the study. GL constructed unique genetic maps for all the
Linkage region on chromosome 7