A genome-wide linkage and association study using COGA data

Background: Genome-wide association will soon be available to use as an adjunct to traditional linkage analysis. We studied alcoholism in 119 families collected by the Collaborative Study on the Genetics of Alcoholism and made available in Genetic Analysis Workshop 14, using genome-wide linkage and association analyses. Methods: Genome-wide linkage analysis was first performed using microsatellite markers, and the region with the strongest linkage evidence was further analyzed using single-nucleotide polymorphisms (SNPs). A family-based genome-wide association test was also conducted using the SNPs. Results: Nonparametric linkage analysis revealed weak linkage evidence on chromosome 7, and association analysis identified SNP tsc0515272 on chromosome 3 as significantly associated with alcoholism. Conclusion: Linkage analysis may require large sample sizes, high-quality genotyping, and accurate marker maps to achieve adequate power, while association analysis could hold more promise in efforts to identify variants responsible for complex traits.


Background
Alcoholism is a complex trait affected jointly by genetic components and environmental factors. Linkage and association analyses are often used to search for the genetic variants responsible for a complex trait. Association analysis is believed to have more power than linkage analysis in the genetic dissection of complex traits such as alcoholism, provided that strong linkage disequilibrium is present between the tested marker and the disease locus [1]. Because of rapid technical improvements and decreasing experimental costs, genome-wide association analysis will soon become as routine for researchers as traditional genome-wide linkage analysis. To compare the two methods, we performed both genome-wide linkage and association analyses of the Collaborative Study on the Genetics of Alcoholism (COGA) data made available to Genetic Analysis Workshop 14 (GAW14) participants.

Methods
The COGA dataset included 1,294 White individuals in 119 families. These individuals were enrolled for a linkage and association study. We selected ALDX1 as the phenotype. ALDX1 has five categories: 0: no information; 1: pure unaffected; 2: never drank; 3: unaffected with some symptoms; 5: affected. Fourteen individuals are classified in group 2 (never drank). In our analysis, we defined category 5 as affected, categories 1 and 2 as unaffected, and the remaining categories as unknown. The results obtained when coding category 2 as unknown were essentially the same as those obtained when coding it as unaffected. Our data then consisted of 528 affected individuals, 487 of whom were offspring. The data also included 315 microsatellite markers evenly spaced across the genome, with an average marker distance of about 10 cM. In addition, 10,081 single-nucleotide polymorphisms (SNPs) across the genome were genotyped using the GeneChip Mapping 10K Array marker set of Affymetrix Inc.
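The recoding of ALDX1 described above can be sketched as a simple lookup. This is an illustration only: the function name `recode_aldx1`, the string labels, and the `never_drank_as_unaffected` switch are hypothetical and are not part of the COGA data distribution.

```python
# Hypothetical sketch of the ALDX1 recoding described in the text.
# ALDX1 codes: 0 no information, 1 pure unaffected, 2 never drank,
# 3 unaffected with some symptoms, 5 affected.
AFFECTED, UNAFFECTED, UNKNOWN = "affected", "unaffected", "unknown"


def recode_aldx1(code, never_drank_as_unaffected=True):
    """Map an ALDX1 code to the status used in the analysis."""
    if code == 5:
        return AFFECTED
    if code == 1:
        return UNAFFECTED
    if code == 2:
        # The text reports essentially the same results under either coding.
        return UNAFFECTED if never_drank_as_unaffected else UNKNOWN
    # Codes 0 and 3 (and anything unexpected) are treated as unknown.
    return UNKNOWN
```

Setting `never_drank_as_unaffected=False` corresponds to the sensitivity analysis in which category 2 is coded as unknown.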

Statistical analysis
Both single- and multipoint genome-wide nonparametric linkage (NPL) analyses were performed, and the S_all statistic [2] was used to assess the linkage evidence, as recommended by Sengul et al. [3]. We used the microsatellite markers for this genome-wide linkage analysis and ran the computer program ALLEGRO, which calculates Kong and Cox's LOD scores [4]. We then performed linkage analysis using SNPs in the region with the strongest linkage evidence to explore whether dense SNP markers could further improve the linkage evidence. Three families were split to reduce the computational burden of the linkage analysis.
We next performed family-based association testing by applying the program FBAT to the SNPs [5]. The method implemented in FBAT can test for association as well as linkage while avoiding spurious associations caused by population stratification. Because FBAT divides a large pedigree into small nuclear families and uses multiple sibs in a family, we computed the test statistic using the empirical variance, as described in Lake et al. [6], to protect against type I error.
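The idea behind an empirical variance can be sketched as follows. This is a simplified illustration in the spirit of Lake et al. [6], not the FBAT program's actual implementation: it assumes each nuclear family has already been reduced to a single score contribution (observed minus expected), and it treats whole pedigrees, rather than nuclear families, as the independent units. The function name and input layout are hypothetical.

```python
import math
from collections import defaultdict


def empirical_variance_z(contributions):
    """Z statistic using a pedigree-level empirical variance.

    contributions: iterable of (pedigree_id, u), where u is one nuclear
    family's score contribution (observed minus expected under the null).
    """
    per_pedigree = defaultdict(float)
    for pedigree_id, u in contributions:
        # Nuclear families from the same pedigree are correlated under
        # linkage, so their contributions are summed before squaring.
        per_pedigree[pedigree_id] += u

    total = sum(per_pedigree.values())
    variance = sum(u * u for u in per_pedigree.values())
    if variance == 0.0:
        raise ValueError("no informative pedigrees")
    return total / math.sqrt(variance)
```

Summing within a pedigree before squaring is what keeps correlated sibships from being counted as independent observations.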

Results
We first performed single-point NPL analysis [2] using the S_all statistic, as suggested by Sengul et al. [3]. The LOD scores were converted from NPL Z scores by the method of Kong and Cox [4]. Table 1 summarizes the markers with observed LOD scores ≥ 1.0. The strongest single-point LOD score occurred at marker D7S820 (LOD score 2.6, asymptotic p = 0.00027). We also observed five additional markers on chromosome 7 with LOD scores ≥ 1.0. The linkage information for a single marker was lower than that for multiple markers. We then conducted multipoint linkage analysis, and the results were generally consistent with the single-point analyses (Table 1). The largest multipoint LOD score was at marker D7S1870 (LOD score 1.77, asymptotic p = 0.002), 13 cM away from marker D7S820. Although linkage information was improved in the multipoint analysis, the observed LOD scores were sometimes lower than in the single-point analyses. This is perhaps because multipoint linkage analysis is sensitive to genotyping errors and map misspecification [7]. In contrast, single-point analysis is robust to genotyping errors and requires no marker map information, but it is less efficient and more subject to random noise [7]. This can be seen in the further linkage analysis using SNPs in the region between markers D7S1870 and D7S1817 on chromosome 7, where 188 SNPs were genotyped in an interval of 40 cM. For example, in single-point analysis we observed 7 SNPs with LOD scores ≥ 1.5, and the largest LOD score, 4.07, occurred at SNP tsc0039708 (at 113.922 cM). Further analysis revealed that 64% of families did not have information for linkage analysis at the location of SNP tsc0039708, which could explain the large LOD score observed at this SNP [7]. The heterozygosity of this SNP is 0.185. Multipoint analysis with the SNPs gave a largest LOD score of 2.12 at 101 cM, consistent with the result obtained using microsatellite markers. The average linkage information using the 188 SNPs increased to 95%.
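The asymptotic p-values quoted with the LOD scores can be reproduced with a short calculation. The sketch below assumes the usual large-sample result that, under the null hypothesis, the signed square root of 2 ln(10) × LOD is approximately standard normal, so that the p-value is a one-sided normal tail; the text does not state which formula was actually used, and the function name is hypothetical.

```python
import math


def lod_to_asymptotic_p(lod):
    """One-sided asymptotic p-value for a non-negative LOD score.

    Assumes sqrt(2 * ln(10) * LOD) is approximately N(0, 1) under the null.
    """
    if lod < 0:
        raise ValueError("this sketch handles non-negative LOD scores only")
    z = math.sqrt(2.0 * math.log(10.0) * lod)
    # Upper tail of the standard normal: 1 - Phi(z) = erfc(z / sqrt(2)) / 2
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Under this assumption, a LOD score of 2.6 corresponds to a p-value of about 0.00027 and a LOD score of 1.77 to about 0.002, in line with the values reported above.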
The number of SNPs could apparently be reduced. For example, by selecting the most informative SNP in every 0.5 cM, we observed a largest LOD score of 1.76 at 110 cM with essentially no loss of linkage information (92%).
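Thinning a dense SNP map in this way can be sketched as a simple binning step. The sketch assumes that heterozygosity is the measure of informativeness and that bins are fixed 0.5 cM windows starting at 0 cM; the text does not specify either choice, and the function name and tuple layout are hypothetical.

```python
import math


def thin_snps(snps, bin_width_cm=0.5):
    """Keep the most informative SNP in each map-position bin.

    snps: iterable of (name, position_cm, heterozygosity) tuples.
    Returns the selected tuples ordered by map position.
    """
    best_per_bin = {}
    for name, position_cm, heterozygosity in snps:
        bin_index = math.floor(position_cm / bin_width_cm)
        current = best_per_bin.get(bin_index)
        # Heterozygosity stands in for informativeness (an assumption).
        if current is None or heterozygosity > current[2]:
            best_per_bin[bin_index] = (name, position_cm, heterozygosity)
    return [best_per_bin[b] for b in sorted(best_per_bin)]
```

For an interval of 40 cM, 0.5 cM bins retain at most about 80 of the 188 SNPs.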
We then performed genome-wide association analysis using FBAT on the SNP data. The p-value of the test statistic for each SNP was calculated based on the empirical variance, as described in Lake et al. [6]. This procedure protects against the type I error that can arise because FBAT divides large pedigrees into small nuclear families and uses multiple sibs within a family. There were a total of 10,081 SNPs across the genome; 417 SNPs were not polymorphic and 423 SNPs showed evidence of departure from Hardy-Weinberg equilibrium (p < 0.01). These SNPs were excluded from further analyses. We observed 670, 167, and 19 SNPs with p-values less than 0.05, 0.01, and 0.001, respectively, significantly exceeding the 457, 91, and 9 SNPs expected under the null hypothesis of no association or linkage and suggesting true association and linkage between SNPs and alcoholism. Table 2 presents the 19 SNPs with nominal p-values less than 0.001. Interestingly, only two associated SNPs (tsc0668988 and tsc1177811 on chromosome 1) were close to a region where weak linkage evidence was observed. SNP tsc0515272 on chromosome 3 showed the most significant evidence of association and linkage with alcoholism (nominal p-value = 0.000006) and was close to genome-wide significance (Bonferroni-corrected p-value = 0.055).
For further comparison with the linkage result on chromosome 7, we also listed the five chromosome 7 SNPs with nominal p-values less than 0.01 in the association analysis. All five SNPs were located at least 28 cM away from the linkage peak.
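The multiple-testing arithmetic behind these figures can be sketched in a few lines. The number of SNPs actually tested is an assumption here: it is taken as the total minus the two exclusion counts, which ignores any SNP that failed both filters or was dropped for other reasons, so the outputs need not match the reported expected counts exactly. The function names are hypothetical.

```python
def expected_null_hits(n_tests, alpha):
    """Expected number of tests with p < alpha if every null is true."""
    return n_tests * alpha


def bonferroni(p_value, n_tests):
    """Bonferroni-corrected p-value, capped at 1."""
    return min(1.0, p_value * n_tests)


# Assumed number of tested SNPs: total minus non-polymorphic minus HWE failures.
n_tests = 10081 - 417 - 423

for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(expected_null_hits(n_tests, alpha)))

print(bonferroni(0.000006, n_tests))
```

With this assumed count the corrected p-value for tsc0515272 is about 0.055, as reported, while the expected counts come out slightly above the reported 457 and 91, which suggests that somewhat fewer SNPs entered the published calculation.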

Discussion
We conducted genome-wide linkage and association analyses using microsatellite markers and SNPs on the data provided by GAW14. Both single- and multipoint NPL analyses showed suggestive linkage evidence on chromosome 7. We could not replicate the linkage evidence on chromosome 3 (LOD = 0.37 at 71 cM) that was reported by Foroud et al. [8]. However, our linkage analysis failed to identify a genome-wide significant region linked to alcoholism when using microsatellite markers. Using dense SNP markers will improve linkage information and, theoretically, the power to detect linkage. However, it may bring additional challenges compared with using microsatellite markers, because high-quality genotyping and SNP maps are required and much more computational power is needed. A recent study also suggested that the presence of linkage disequilibrium between tightly linked markers can inflate type I error because the current analysis methods assume linkage equilibrium [9]. Thus, further analysis tools that allow for linkage disequilibrium between tightly linked markers need to be developed.
In contrast, association analysis may hold great promise in the genetic dissection of complex traits. In this study, the evidence for association of SNP tsc0515272 with alcoholism was close to genome-wide significance after Bonferroni correction for multiple comparisons (corrected p-value = 0.055). Such a correction is usually conservative because of linkage disequilibrium between SNPs located close to one another. Interestingly, we did not observe consistent results between the linkage and association analyses. For example, we did not observe significant association evidence for SNPs under the linkage peak on chromosome 7. A possible reason is that the SNPs genotyped in this study are still not able to capture all of the genetic variation in this region. Theoretical studies suggest that 250,000-800,000 SNPs are required for a genome-wide association study [10,11]. Haplotype analysis may improve the current results. Linkage analysis did not reveal significant linkage evidence around SNP tsc0515272, where the strongest association was found, suggesting a lack of power in the linkage analysis. It should also be cautioned that type I error in both the linkage and association analyses can contribute to the inconsistency between the two methods. We believe that the evidence identified in either the linkage or the association analysis could represent an important genetic finding and should be studied further.