Genome scan linkage analysis comparing microsatellites and single-nucleotide polymorphisms markers for two measures of alcoholism in chromosomes 1, 4, and 7

Background We analyzed 143 pedigrees (364 nuclear families) in the Collaborative Study on the Genetics of Alcoholism (COGA) data provided to the participants in the Genetic Analysis Workshop 14 (GAW14), with the goal of comparing results obtained from genome linkage analysis using microsatellite markers with results obtained using single-nucleotide polymorphism (SNP) markers for two measures of alcoholism (maximum number of drinks, MAXDRINK, and an electrophysiological measure from EEG, TTTH1). First, we constructed haplotype blocks by using the entire set of SNPs in chromosomes 1, 4, and 7. These chromosomes have shown linkage signals for MAXDRINK or EEG-TTTH1 in previous reports. Second, we randomly selected one, two, three, four, and five SNPs from each block (referred to as Rep1 to Rep5, respectively) to conduct linkage analysis using a variance component approach. Finally, results of all SNP analyses were compared with those obtained using microsatellite markers. Results The LOD scores obtained from SNPs were slightly higher, but the curves were not radically different from those obtained from microsatellite analyses. The peaks of linkage regions from SNP sets were slightly shifted to the left when compared to those from microsatellite markers. The reduced sets of SNPs provided signals in the same linkage regions but with smaller LOD scores, suggesting a significant impact of the decrease in information content on linkage results. The widths of the 1-LOD support intervals of linkage regions from SNP sets were smaller than those from microsatellite markers. However, two linkage regions obtained from the microsatellite linkage analysis on chromosome 7 for LOG of TTTH1 were not detected in the SNP-based analyses. Conclusion The linkage results from SNPs showed narrower linkage regions and slightly higher LOD scores when compared to those of microsatellite markers.
The different builds of the genetic maps used for the microsatellite and SNP markers, and/or errors in genotyping, may account for the microsatellite linkage signals on chromosome 7 that were not identified using SNPs. Also, unresolved map issues between SNPs and microsatellite markers may be partly responsible for the shifted linkage peaks when comparing the two types of markers.


Background
The identification of chromosomal segments showing association or linkage is only the first step toward discovery of genetic factors underlying susceptibility to disease. The typical genome-wide linkage analysis based on microsatellites with an average density of 10 cM results in large genomic regions for fine-mapping. In this regard, there is considerable interest in developing maps based on genomic markers that will lead to higher resolution linkage results, with the hope of reducing the future cost and time needed to conduct fine-mapping. With the availability of several million new SNPs in the public database and new technologies for large-scale, high-throughput SNP genotyping at affordable costs, there is growing interest in using SNPs to create high-resolution linkage maps. In this paper we evaluate strategies to systematically compare genome-wide linkage results from microsatellite and SNP markers using maps of different densities.

Materials
The dataset for the Collaborative Study on the Genetics of Alcoholism (COGA) was provided as problem 1 for GAW14. The dataset included 1,350 individuals in 143 pedigrees, 318 microsatellite genotypes for a 10 cM genome map, 4,763 SNP loci from Illumina, 11,555 SNP loci from Affymetrix, and phenotypic information. We used MAXDRINK and TTTH1 as phenotypes and the panel of 4,763 Illumina SNPs. MAXDRINK is defined as the maximum number of drinks in a 24-hour period [1]. TTTH1 is an electrophysiological measure taken from the Visual Oddball Experiment and the Eyes Closed Resting EEG dataset for the frontal left side channel. The extracted measures correspond to the 'late' time window, which is set at 300 to 700 ms following stimulus presentation (bounding the visual P3 event), and the theta band power (3 to 7 Hz) [2]. These phenotypes were log transformed for all analyses. Three chromosomes (1, 4, and 7) that showed linkage signals for the MAXDRINK or TTTH1 phenotypes in previous reports [1,2] were selected for our analyses.

Statistical analysis
For each chromosome, we constructed haplotypes using GENEHUNTER2 (GH2) [3]. Linkage equilibrium among markers is assumed in GH2. As discussed by Schaid et al. [4], if closely spaced markers are useful for haplotype fine mapping, it is reasonable to assume that the markers themselves are in linkage disequilibrium (LD), because the implicit basis of fine mapping by haplotypes is LD. Haplotype blocks were generated using the statistical framework method [5], in which the inference on the optimal haplotype block partitioning is formulated as a problem of statistical model selection based on the likelihood of the observed data, so as to define regions with a very small proportion of comparisons among informative SNP pairs showing strong evidence of historical recombination. We selected SNPs at random from each block to test for the minimum number of SNPs required to achieve the same results as using all the SNPs in a block. Rep1 represents the process of randomly selecting one SNP from a block and Rep2 that of randomly selecting two SNPs from a block; this process was repeated until we had selected the maximum of five SNPs (Rep5) from each block. We stopped at five because the smallest observed block contained five SNPs. We also conducted linkage analysis using all available SNPs. A variance components approach as implemented in SOLAR was used for all analyses [6]. The linkage results from microsatellite markers were then compared with those from the constructed haplotype blocks, using the reduced numbers of SNPs from each block (Rep1 through Rep5), and with those from the entire set of SNPs. The range of positional candidate regions was defined by a logarithm of odds (LOD) score of ≥ 1.0.
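The Rep1 to Rep5 selection described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `select_snps_per_block`, the block and SNP identifiers, and the use of a fixed random seed are all assumptions made for illustration. Each block is assumed to be a list of SNP names in map order.

```python
import random
from typing import Dict, List


def select_snps_per_block(
    blocks: Dict[str, List[str]], k: int, seed: int = 0
) -> Dict[str, List[str]]:
    """Randomly pick k SNPs from every haplotype block (hypothetical helper).

    blocks maps a block identifier to its SNP names in map order.
    If a block holds fewer than k SNPs, all of its SNPs are kept.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected: Dict[str, List[str]] = {}
    for block_id, snps in blocks.items():
        n = min(k, len(snps))
        chosen = set(rng.sample(snps, n))
        # Keep the chosen SNPs in their original map order.
        selected[block_id] = [s for s in snps if s in chosen]
    return selected


# Made-up blocks; the smallest holds five SNPs, as in the text.
blocks = {
    "block1": ["snp1", "snp2", "snp3", "snp4", "snp5"],
    "block2": ["snp6", "snp7", "snp8", "snp9", "snp10", "snp11", "snp12"],
}

# Rep1 ... Rep5: one to five SNPs drawn from each block.
reps = {f"Rep{k}": select_snps_per_block(blocks, k, seed=k) for k in range(1, 6)}
```

Each entry of `reps` would then define one reduced marker set to pass to the linkage program, alongside the full SNP set.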

Results
The residual kurtosis values of LOG-transformed MAXDRINK and TTTH1 were -0.18 and 0.57, respectively, allowing the assumption of normality in our analyses. The distribution of haplotype blocks for chromosomes 1, 4, and 7 is displayed in Table 1. Although the LOD scores from the linkage analyses based on SNPs, as compared with microsatellites, were consistently larger (p < 0.01), the locations of the signals were for the most part similar (Figures 1 and 2). Interestingly, two linkage regions on chromosome 7 (154 cM and 163 cM) were not detected in the SNP analyses for the TTTH1 phenotype (Table 2 and Figure 1). [...] MAXDRINK as the phenotype (Table 2). Table 4 shows the widths and boundaries of linkage regions in chromosome 7 for LOG TTTH1 and chromosome 1 for LOG MAXDRINK. The width of the linkage region for LOG TTTH1 was 58 cM from microsatellite markers, compared with 24 cM, 40 cM, 34 cM, 38 cM, 30 cM, and 33 cM, respectively, from Rep1 to Rep5 and the entire set of SNPs.

Figure 1. Linkage plot for the log transformed TTTH1 phenotype in chromosome 7. LOD scores for microsatellite (solid line) and SNPs (dashed lines). The different colored dashed lines represent the results of the linkage analyses based on randomly selected one, two, three, four, and five SNPs from each haplotype block and using the entire set of SNPs.
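As an illustration of how region boundaries and widths such as those in Table 4 could be read off a LOD curve, the following sketch applies the LOD ≥ 1.0 criterion stated in the Statistical analysis section. The function name `linkage_regions` and the example values are hypothetical and are not data from this study; boundaries are taken at the grid positions where LOD scores were evaluated, without interpolation.

```python
from typing import List, Tuple


def linkage_regions(
    positions_cm: List[float], lod_scores: List[float], threshold: float = 1.0
) -> List[Tuple[float, float, float]]:
    """Return (start_cM, end_cM, width_cM) for each run of LOD >= threshold.

    positions_cm must be sorted in map order and aligned with lod_scores.
    """
    regions: List[Tuple[float, float, float]] = []
    start = None
    end = None
    for pos, lod in zip(positions_cm, lod_scores):
        if lod >= threshold:
            if start is None:
                start = pos  # a new region opens at this position
            end = pos
        elif start is not None:
            # The run has ended; record it and reset.
            regions.append((start, end, end - start))
            start = None
    if start is not None:
        # The curve ended while still above the threshold.
        regions.append((start, end, end - start))
    return regions


# Made-up LOD curve on a 10 cM grid.
positions = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
lods = [0.2, 1.1, 2.3, 1.4, 0.6, 1.2, 0.9]
regions = linkage_regions(positions, lods)
```

With these made-up values the function reports one region from 10 to 30 cM (width 20 cM) and a single-point region at 50 cM. A denser evaluation grid would give more precise boundaries.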

Discussion
In all, the patterns of linkage results from microsatellites were similar to those obtained from SNP analyses for chromosomes 1, 4, and 7. It was, however, notable that the SNP analyses did not detect two linkage regions on chromosome 7 (LOD = 1.87 and 2.01; Table 2). As displayed in Figures 1 and 2, the LOD score peaks generated from SNPs were slightly shifted to the left when compared to those from microsatellite markers. A potential reason for this observation may be the different builds of the genetic maps used for the microsatellite markers and SNPs, and/or errors in genotyping [7]. Kruglyak [8] observed an increase in LOD scores for a proportionate increase in the information content of the linkage map as derived from a denser SNP map. In our results, reducing the number of SNPs in each block to 1, 2, 3, 4, and 5 SNPs did not significantly change the shape of the linkage signals, apart from a small drop in peak height. Expected LOD scores correlate with information content, and Table 3 shows only a small reduction in information content for Rep1, while the others are the same. It has been estimated that 1.7 to 2.5 SNP markers provide information equivalent to that of one microsatellite marker [8,9] and that a 10 K SNP array provides at least equal power to detect linkage compared with a search based upon a 5 Mb microsatellite screen [10]. In our results, 2.5 SNP markers provided information content equal to that of one microsatellite marker. These observations support the idea that the use of high-density SNP maps for performing linkage analysis should result in more precisely defined loci at substantially reduced cost.

Conclusion
The linkage results from SNP maps can show narrower linkage regions with higher LOD scores when compared with microsatellite marker maps. The linkage results from reduced sets of SNPs provided signals in the same linkage regions but with smaller LOD scores, suggesting that loss of information content influenced expected LOD scores. The different builds of the genetic maps used for the microsatellite markers and SNPs, and/or errors in genotyping, may have led to the significant linkage region observed on chromosome 7 in the microsatellite scan that was not detected in the genome scan based on SNPs, and to the peaks from SNPs being slightly shifted to the left of the microsatellite peaks.