A comparison in association and linkage genome-wide scans for alcoholism susceptibility genes using single-nucleotide polymorphisms

We conducted genome-wide linkage scans using both microsatellite and single-nucleotide polymorphism (SNP) markers and identified the regions showing the strongest evidence of linkage to alcoholism susceptibility genes. Haplotype analyses using a sliding-window approach were performed for the SNPs in these regions. In addition, we performed a genome-wide association scan using the SNP data and identified SNPs with evidence of association (P ≤ 0.0001). We found that the general patterns of nonparametric linkage (NPL) scores from the SNP and microsatellite genome scans were fairly consistent; however, the NPL peaks were mostly higher in the SNP-based scan than in the microsatellite scan, and some peaks might be located in different regions. Furthermore, SNPs identified from the linkage screens were not as strongly associated with alcoholism (the most significant SNP had a p-value of 0.030) as those identified from the genome-wide association screen (the most significant SNP had a p-value of 2.0 × 10⁻⁸).


Background
Genome-wide linkage scans are typically conducted to narrow down candidate regions before association fine mapping. However, Risch and Merikangas [1] argued that linkage analysis has limited power to detect genes of modest effect, and that an association approach using candidate genes has far greater power, even if every gene in the genome must be examined. The availability of large-scale, high-throughput genotyping has made direct genome-wide SNP-based association studies feasible. Recently, John et al. [2] compared the utility of SNPs and microsatellites for linkage analysis. They demonstrated that dense SNP data revealed linkage signals that were not detected in a low-resolution microsatellite scan, that variation in information content was the main factor contributing to the differences between the SNP and microsatellite scans, and that linkage disequilibrium (LD) among a proportion of the markers did not significantly affect the analysis. However, Schaid et al. [3] showed that LD among SNPs can inflate LOD scores when current genetic linkage software assumes linkage equilibrium. They, too, identified more linkage peaks, with narrower widths, using SNPs than using microsatellite markers after excluding SNPs in high LD. Despite a few recent attempts to use SNPs in genome-wide scans, comparisons of association and linkage analyses remain limited. Therefore, the objectives of the present study were to examine the utility of SNPs relative to microsatellite markers in linkage analysis, and to investigate the value of SNP markers in linkage and association analyses.

Materials
A total of 143 pedigrees (364 nuclear families) comprising 1,614 subjects (643 with alcoholism) were analyzed. There were 328 microsatellite markers and 11,120 Affymetrix SNP markers available for analysis. To test for Hardy-Weinberg equilibrium (HWE), one subject was randomly sampled from each pedigree and a chi-square goodness-of-fit test was performed using PROC ALLELE in the SAS/GENETICS package. Four hundred and thirty-one SNPs and 69 microsatellites were excluded because of departure from HWE. To avoid potential bias caused by rare alleles, a further 192 SNPs with minor allele frequencies below 0.02 were excluded. In addition, to reduce the impact of LD on the linkage results, we computed the pairwise LD measure |D'| between consecutive SNPs using the FBAT computing package [4]. For any two consecutive SNPs with |D'| > 0.7, only the one with the higher information content (heterozygosity) was retained, which excluded another 3,169 SNPs. As a result, 7,328 of the 11,120 SNPs were included in the linkage analysis. For the association scan, 10,187 SNPs were analyzed, after excluding the 431 SNPs that departed from HWE, the 192 SNPs with minor allele frequencies below 0.02, and the 310 SNPs on chromosome X. The phenotype was alcoholism, defined by DSM-III-R alcohol dependence and Feighner's "Alc Definite" phenotype [5].
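The marker-filtering steps above can be sketched in Python. This is a minimal illustration, not the SAS PROC ALLELE or FBAT code used in the study: it assumes biallelic SNPs, genotype counts from one randomly sampled subject per pedigree, and known two-locus haplotype frequencies for |D'|. All function names are hypothetical, and the pruning function implements one literal reading of the rule "for any two consecutive SNPs with |D'| > 0.7, keep the one with the higher heterozygosity".

```python
from math import erfc, sqrt
from typing import List, Sequence


def hwe_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """1-df chi-square goodness-of-fit test for Hardy-Weinberg equilibrium
    at a biallelic marker, from genotype counts in unrelated subjects."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    return erfc(sqrt(chi2 / 2.0))


def minor_allele_freq(p: float) -> float:
    return min(p, 1.0 - p)


def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2pq, used here as the information content."""
    return 2.0 * p * (1.0 - p)


def abs_d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Lewontin's |D'| from a haplotype frequency and the two allele frequencies."""
    d = p_ab - p_a * p_b
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return 0.0 if d_max == 0 else abs(d) / d_max


def prune_consecutive(freqs: Sequence[float], consecutive_dprime: Sequence[float],
                      threshold: float = 0.7) -> List[int]:
    """For each adjacent pair with |D'| > threshold, drop the SNP with the lower
    heterozygosity (ties drop the second SNP). A SNP dropped by either of its
    two neighbouring pairs is excluded. Returns indices of the retained SNPs."""
    dropped = set()
    for i, dp in enumerate(consecutive_dprime):  # |D'| between SNP i and SNP i + 1
        if dp > threshold:
            h_left = heterozygosity(freqs[i])
            h_right = heterozygosity(freqs[i + 1])
            dropped.add(i + 1 if h_left >= h_right else i)
    return [i for i in range(len(freqs)) if i not in dropped]


print(prune_consecutive([0.5, 0.1, 0.4, 0.2], [0.9, 0.2, 0.8]))  # [0, 2]
```

The counts reported in the text follow from these filters: 11,120 − 431 − 192 − 3,169 = 7,328 SNPs for linkage, and 11,120 − 431 − 192 − 310 = 10,187 SNPs for association.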

Linkage and association analyses
Genome-wide microsatellite and SNP linkage screens were conducted using GENEHUNTER 2.1 [6], and linkage evidence was assessed on the basis of NPL scores. Because GENEHUNTER limits the maximum number of markers, the SNP linkage analyses were performed in sets of 50 SNPs. The whole-genome association scan and the multi-SNP haplotype analyses were performed using the family-based association tests implemented in the FBAT computing package [4], which uses nuclear families (missing parents are allowed) to test the composite null hypothesis of no association and no linkage. Regions with NPL scores greater than 3.0 in the genome-wide linkage scan were selected for haplotype analysis, which tested the null hypothesis of no association in the presence of linkage. A sliding-window approach [7] was used for the haplotype analysis of the SNPs in these regions.
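The two marker-grouping schemes described above (sets of 50 SNPs for GENEHUNTER, and sliding haplotype windows) can be enumerated as follows. This is a sketch under stated assumptions: the text does not say whether the 50-SNP sets overlap, so non-overlapping sets are assumed here; the window widths 1 to 6 are taken from the association results; the helper names and marker IDs are hypothetical. The haplotype tests themselves were run in FBAT and are not reproduced.

```python
from typing import Iterator, List, Sequence, Tuple


def batches(markers: Sequence[str], size: int = 50) -> List[List[str]]:
    """Split an ordered marker list into consecutive, non-overlapping sets of
    `size` markers (assumption); the last set may be shorter."""
    return [list(markers[i:i + size]) for i in range(0, len(markers), size)]


def sliding_windows(markers: Sequence[str], max_width: int = 6) -> Iterator[Tuple[str, ...]]:
    """Yield every window of 1..max_width adjacent markers, sliding one marker
    at a time, ordered by width and then by starting position."""
    for width in range(1, max_width + 1):
        for start in range(len(markers) - width + 1):
            yield tuple(markers[start:start + width])


# Hypothetical IDs standing in for a 20-SNP region.
region = [f"snp{i}" for i in range(20)]
print(len(list(sliding_windows(region))))  # 20 + 19 + 18 + 17 + 16 + 15 = 105
```

For a region of n SNPs, widths 1 to 6 give n + (n − 1) + … + (n − 5) windows, so the number of haplotype tests grows linearly with region length.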

Linkage analysis using SNPs
The average information content of the 7,328 SNPs retained after excluding 3,169 SNPs was almost identical to that of the original 11,120 SNP markers. The peak NPL scores on individual chromosomes dropped slightly in most regions compared with those obtained using all the markers: for example, from 3.81 to 3.72 on chromosome 2, from 2.86 to 2.59 on chromosome 4, from 3.76 to 3.08 on chromosome 10, and from 2.94 to 2.44 on chromosome 11 (Table 1). There were exceptions: the peak NPL scores rose from 1.88 to 2.13 on chromosome 3 and from 1.46 to 1.74 on chromosome 20. Most of the peaks remained in the same regions, except for those on chromosomes 1 and 6. Because excluding these markers reduced the information content very little, the drop in NPL scores may reflect inflation caused by the excluded SNPs' violation of the HWE and linkage-equilibrium assumptions [3].

Figure 1. Genome-wide scans using microsatellite and SNP markers. NPL scores for microsatellite markers (solid line) and SNP markers (dotted line).

Comparisons of SNP and microsatellite markers
The overall patterns of the NPL score curves derived from microsatellite and SNP markers were fairly consistent (Figure 1). The regions identified by both types of markers (i.e., where the NPL score peak for the microsatellites fell within the 1-LOD support interval constructed from the SNPs) were on chromosomes 2, 6, 7, 9, 11, 13, and 15 (Table 2). Among these regions, only the signals (defined as peak NPL scores of at least 1.5) on chromosomes 2, 6, 7, and 11 were detected by both types of markers. The peaks in the SNP scan were mostly higher than those in the microsatellite scan: the peak NPL scores on chromosomes 2, 6, 7, and 11 were 2.24, 1.56, 2.22, and 2.20 for microsatellites and 3.72, 2.03, 2.81, and 2.44 for SNPs (Table 2). Most other linkage regions identified by SNP markers were not found by microsatellites. This may be because the overall average information content was higher for SNPs than for microsatellites by 0.17 (0.91 versus 0.74; Table 2). Nevertheless, on chromosome 21, where the information content was also higher for SNPs, the peak NPL score was lower than that for microsatellites. It is worth noting that the 1-LOD support intervals constructed from SNPs were narrower than those constructed from microsatellites.
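The overlap criterion used above (a microsatellite peak falling inside the SNP-based 1-LOD support interval) can be sketched as below. The paper does not state how the support intervals were derived from the GENEHUNTER output, so this sketch simply assumes a LOD-scale score evaluated on a grid of map positions and takes the contiguous run of grid points within 1 unit of the maximum, without interpolating between points. The function names, positions, and scores are hypothetical illustrative values.

```python
from typing import Sequence, Tuple


def one_lod_support_interval(positions: Sequence[float],
                             lods: Sequence[float]) -> Tuple[float, float]:
    """Return (left, right) map positions of the contiguous region around the
    maximum where the score stays within 1 unit of the peak (grid points only)."""
    peak = max(range(len(lods)), key=lambda i: lods[i])
    cutoff = lods[peak] - 1.0
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right]


def peak_in_interval(peak_pos: float, interval: Tuple[float, float]) -> bool:
    """True if another scan's peak position lies inside the support interval."""
    return interval[0] <= peak_pos <= interval[1]


# Hypothetical SNP-scan curve (positions in cM).
interval = one_lod_support_interval([0, 10, 20, 30, 40, 50],
                                    [0.5, 1.2, 2.9, 3.4, 2.6, 1.0])
print(interval)                          # (20, 40)
print(peak_in_interval(25.0, interval))  # True
```

A denser marker map samples the score curve more finely, which is one reason support intervals from dense SNP scans can come out narrower.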

Association analysis
The 325th to 344th SNPs (tsc1155229...tsc0540301) on chromosome 2 and the 591st to 600th SNPs (tsc0549932...tsc0517919) on chromosome 10, which had NPL scores greater than 3.0 (p < 0.0017), were selected for haplotype analysis with sliding-window sizes from 1 to 6 (results not shown). On chromosome 2, haplotypes with an overall significance level below the nominal level of 0.05 were obtained for the single SNP tsc1278942 (p = 0.024) and for the five-SNP window tsc1155229, tsc0781059, tsc0050143, tsc0159931, and tsc1278942 (p = 0.04). No haplotype on chromosome 10 reached an overall significance level below 0.05. The most significant individual haplotype was "1 1 1 1 2 1", constructed from six SNPs (tsc0273475, tsc0336150, tsc0888957, tsc1346599, tsc1346603, tsc0574295) on chromosome 2, with a p-value of 0.0044. By contrast, 15 markers across the genome had significance levels below the nominal level of 0.0001 when testing the null hypothesis of no association and no linkage (Table 3). Among them, the significance levels for tsc0515272 on chromosome 3, tsc0029429 on chromosome 9, and tsc1750530 on chromosome 16 were 3.8 × 10⁻⁷, 2.0 × 10⁻⁸, and 4.5 × 10⁻⁷, respectively, all smaller than 4.91 × 10⁻⁶ (= 0.05/10,187), the 0.05 significance level after a conservative Bonferroni correction for the 10,187 SNPs used in the association analysis. Nevertheless, none of …

Discussion
Our analyses illustrate that the typical gene-mapping procedure, which identifies target regions through a genome-wide linkage scan with markers at a density of 1 marker per 10 cM before fine-scale mapping of those regions, could fail to identify disease loci because of limited major-gene effects, misplacement of markers, or insufficient information content of the microsatellite markers. In this procedure, the initial genome-wide scan is therefore critical for selecting the regions that harbor disease genes for further fine mapping.
SNP markers provide substantially greater information content than microsatellites, so linkage signals missed by microsatellites could be picked up by SNPs. However, LD among SNPs, the inability to detect Mendelian errors, and the inability to validate genetic maps accurately have complicated linkage studies using SNPs. Association studies, on the other hand, have greater power than linkage analysis to detect genes of modest effect [1]. Their results, however, must be interpreted with caution: a genome-wide association scan involves numerous tests, which increases the false-positive rate, so the significance levels must be corrected for multiple testing. Additionally, in our haplotype analysis, the evidence of association between a given SNP and alcoholism could vary substantially with the window size of the multi-SNP haplotypes, and the linkage signals identified in this study might not be strong enough to identify the causative haplotypes.
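The Bonferroni correction applied in the association scan can be reproduced with a few lines of Python. The p-values below are the ones reported in the text (the three genome-wide significant SNPs from FBAT, plus the single-SNP result for tsc1278942 from the chromosome 2 haplotype analysis); the helper names are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def genome_wide_significant(pvalues: Dict[str, float], alpha: float = 0.05,
                            n_tests: int = 10_187) -> List[str]:
    """Markers whose p-value falls below the Bonferroni-corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(name for name, p in pvalues.items() if p < threshold)


reported = {
    "tsc0515272": 3.8e-7,
    "tsc0029429": 2.0e-8,
    "tsc1750530": 4.5e-7,
    "tsc1278942": 0.024,  # single-SNP haplotype test, chromosome 2
}
print(f"{bonferroni_threshold(0.05, 10_187):.3g}")  # 4.91e-06
print(genome_wide_significant(reported))            # ['tsc0029429', 'tsc0515272', 'tsc1750530']
```

Because neighbouring SNPs are in LD, the 10,187 tests are not independent, which is why dividing alpha by the full number of tests is conservative.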

Authors' contributions
YFC made contributions to the study design, statistical analysis, interpretation, and draft of the manuscript. SYL participated in the design of the study and performed the data analysis. YYT conceived of the study and helped to draft the manuscript. All authors read and approved the final manuscript.