Whole-genome variance components linkage analysis using single-nucleotide polymorphisms versus microsatellites on quantitative traits of derived phenotypes from factor analysis of electroencephalogram waves

Alcohol dependence is a serious public health problem. We studied data from families participating in the Collaborative Study on the Genetics of Alcoholism (COGA) and made available to participants in the Genetic Analysis Workshop 14 (GAW14) in order to search for genes predisposing to alcohol dependence. Using factor analysis, we identified four factors (F1, F2, F3, F4) related to the electroencephalogram traits. We conducted variance components linkage analysis with each of the factors. Our results using the Affymetrix single-nucleotide polymorphism dataset showed significant evidence for a novel linkage of F3 (the factor composed of the three midline channel EEG measures from the target case of the Visual Oddball experiment: ttdt2, ttdt3, and ttdt4) to chromosome 18 (LOD = 3.45). This finding was confirmed by analyses of the microsatellite data (LOD = 2.73) and Illumina SNP data (LOD = 3.30). We also demonstrated that, in a sample like the COGA data, a dense single-nucleotide polymorphism map provides better linkage signals than a low-resolution microsatellite map with quantitative traits.


Background
Alcoholism is a complex disorder involving multiple genes that likely interact with one another and with environmental factors. Quantitative endophenotypes, such as electroencephalogram (EEG) measurements, have been suggested as better indices of alcoholism susceptibility than the customary dichotomous affection status [1,2]. EEG data defined by different experimental designs were available to participants in Genetic Analysis Workshop 14 (GAW14). Since EEG phenotypes are correlated, it is likely that a smaller number of underlying dimensions contribute to the variance of these EEG phenotypes. Our aim was to identify the underlying factors for the EEG phenotypes and to search for genes that influence the derived factors and increase the risk of alcohol dependence.

Phenotypes and factor analysis
We conducted a principal components analysis using the 12 EEG measures (ttth1-ttth4, ttdt1-ttdt4, and ntth1-ntth4). EEG measures from the Visual Oddball experiment are named with four letters followed by a number. The four letters denote the experimental condition. Fields labeled ttth_ contain measures extracted from the target case in the 'late' time window, which is set at 300 to 700 ms (bounding the visual P3 event), in the theta band (3 to 7 Hz). Fields labeled ttdt_ contain measures extracted under the same conditions as ttth_, but in the delta band (1 to 2.5 Hz). Fields labeled ntth_ contain measures extracted from the nontarget case in the 'early' time window, which is set at 100 to 300 ms following stimulus presentation, in the theta band (3 to 7 Hz). The number following the four letters denotes one of the four electrode placements: 1, FP1 (far frontal left side channel); 2, FZ (frontal midline channel); 3, CZ (central midline channel); 4, PZ (parietal midline channel).
This was followed by a common factor analysis in order to identify the underlying dimensions measured by the EEG data. We examined each of the phenotypes for normality before including it in the analysis. In the common factor model, each new phenotype is expressed as a linear combination of the original variables. The relationship of factors to the EEG phenotypes is reflected by factor loadings. The contribution of each factor to the set of variables is evaluated by eigenvalues. Based upon the distribution of the eigenvalues and the composition of the factors, we retained four factors. This solution accounted for 88% of the total variance. We used an oblique rotation of the factor solution. Factor scores were obtained using PROC FACTOR implemented in SAS (SAS version 8; SAS, Cary, NC). We treated each of the four factor scores as a new derived quantitative trait.
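To illustrate how eigenvalues translate into the share of total variance retained by a factor solution, a minimal Python sketch follows. The eigenvalues are made-up illustrative numbers, not those from our analysis, and the helper name cumulative_variance_explained is hypothetical. The sketch assumes the analysis is run on a correlation matrix, so that the eigenvalues sum to the number of variables (12 here).

```python
def cumulative_variance_explained(eigenvalues):
    """Return the cumulative proportion of total variance for successive components."""
    total = sum(eigenvalues)
    running = 0.0
    cumulative = []
    # Components are ordered from largest to smallest eigenvalue.
    for value in sorted(eigenvalues, reverse=True):
        running += value
        cumulative.append(running / total)
    return cumulative


# Made-up eigenvalues for 12 standardized variables (they sum to 12).
toy_eigenvalues = [5.0, 2.5, 1.8, 1.2, 0.4, 0.3, 0.25, 0.2, 0.15, 0.1, 0.06, 0.04]
cumulative = cumulative_variance_explained(toy_eigenvalues)
print(f"Share of variance in the first four components: {cumulative[3]:.3f}")
```

A sharp drop in eigenvalue size after a given component, as in these toy numbers after the fourth, is one common reason to stop retaining factors at that point.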

Map construction
Quantitative data usually provide more statistical power than a binary affection status. However, using the quantitative traits alone may still not be powerful enough to identify disease susceptibility genes for complex traits. Kruglyak predicted that using single-nucleotide polymorphisms (SNPs) with a heterozygosity of 0.50 and approximately two to three times the density of the current microsatellite marker sets would achieve a similar result in linkage analysis as a genome scan with microsatellite markers [3]. Recently, John et al. conducted a whole-genome scan using SNPs [4]. Their results showed that SNPs provided significantly higher information content than microsatellites and allowed loci to be defined more precisely. We hypothesized that SNPs would also provide higher information content and better linkage signals than microsatellites for quantitative traits. We carried out a whole-genome screen using 143 families from the Collaborative Study on the Genetics of Alcoholism (COGA) with four empirically derived quantitative traits (factor scores based upon the EEG data). Reformatted clean genotype data were provided by the COGA study, including 11,120 SNPs generated by the Affymetrix GeneChip Mapping 10K Array, 4,720 SNPs generated by Illumina, and 328 microsatellite markers spaced at 10-cM intervals across the genome. Genetic map positions for both microsatellites and SNPs were interpolated from the deCODE genetic framework map on the basis of their physical positions. Physical positions of SNPs were obtained from the NCBI database (release 34.3). SNPs with multiple physical map positions were dropped from the genetic map. All initial linkage analyses were performed using this adjusted map.
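The interpolation of genetic map positions from physical positions can be sketched as follows. This is a minimal Python illustration that assumes simple linear interpolation between flanking framework markers; the exact interpolation scheme is not specified here, and the function name interpolate_cm and the framework coordinates are hypothetical.

```python
from bisect import bisect_right


def interpolate_cm(pos_bp, framework):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    framework: list of (physical_bp, genetic_cM) pairs for one chromosome,
    sorted by physical position.
    """
    bps = [bp for bp, _ in framework]
    if not bps[0] <= pos_bp <= bps[-1]:
        raise ValueError("position lies outside the framework map")
    i = bisect_right(bps, pos_bp)
    if i == len(bps):
        # The position coincides with the last framework marker.
        return framework[-1][1]
    (bp0, cm0), (bp1, cm1) = framework[i - 1], framework[i]
    fraction = (pos_bp - bp0) / (bp1 - bp0)
    return cm0 + fraction * (cm1 - cm0)


# Made-up framework markers: (physical position in bp, genetic position in cM).
toy_framework = [(1_000_000, 0.0), (3_000_000, 4.0), (5_000_000, 5.0)]
print(interpolate_cm(2_000_000, toy_framework))
```

Positions outside the framework map raise an error in this sketch rather than being extrapolated, since extrapolated genetic distances would rest on no flanking information.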

Linkage disequilibrium (LD)
Because linkage analysis algorithms assume linkage equilibrium between all markers, strong LD between SNPs may exaggerate the significance level of linkage and thus generate false positive results [5]. We therefore kept only one tag SNP in each haplotype block (a set of SNPs in strong LD). The pairwise LD statistics D' and $r^2$ were calculated for all SNPs using HAPLOVIEW (v3.0) [6]. Haplotype blocks were defined as regions over which a very small proportion (<5%) of comparisons among informative SNP pairs showed strong evidence of historical recombination [7].
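For reference, the two pairwise LD statistics can be computed from haplotype and allele frequencies as in the minimal Python sketch below. It assumes two biallelic loci with known (phased) haplotype frequencies; software such as HAPLOVIEW must first estimate these frequencies from genotype data, a step not shown here. The function name ld_statistics and the example frequencies are hypothetical.

```python
def ld_statistics(p_ab, p_a, p_b):
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
    p_a:  frequency of allele A at locus 1
    p_b:  frequency of allele B at locus 2
    """
    d = p_ab - p_a * p_b
    # D is scaled by its maximum attainable magnitude given the allele frequencies.
    if d > 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denominator if denominator > 0 else 0.0
    return d, d_prime, r2


# Made-up frequencies for illustration only.
d, d_prime, r2 = ld_statistics(p_ab=0.4, p_a=0.5, p_b=0.5)
print(f"D = {d:.3f}, D' = {d_prime:.3f}, r^2 = {r2:.3f}")
```

Here D' is reported as an absolute value, which is the usual convention when it is used to delimit blocks of strong LD.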

Linkage analysis
We performed variance components analysis for each factor using SOLAR (v2.13) [8]. In variance components analysis, the total variance of each trait is decomposed into several sources according to the following equation:
$$\Omega = \Pi \sigma_q^2 + 2\Phi \sigma_g^2 + I \sigma_e^2$$
where $\Omega$ is the covariance matrix for a pedigree, $\Pi$ is a matrix with elements $\pi_{qij}$, the expected proportion of genes that individuals i and j share identical by descent (IBD) at a specific chromosomal location, $\Phi$ is the kinship matrix, $I$ is the identity matrix, $\sigma_q^2$ is the variance component corresponding to the additive genetic effects of the major locus, $\sigma_g^2$ is the variance component corresponding to the polygenic effects, and $\sigma_e^2$ is the variance component corresponding to the environmental effects. The variance components analysis tested the null hypothesis that the additive genetic variance caused by the major quantitative trait locus (QTL) for a given trait equals zero ($H_0: \sigma_q^2 = 0$, or no linkage). The hypothesis was tested with a likelihood ratio test that compared the maximum likelihood of a restricted model, in which $\sigma_q^2$ was constrained to zero, with that of a more general model, in which $\sigma_q^2$ was estimated. Twice the difference between the natural log-likelihoods of the two models yields a test statistic that is asymptotically distributed as a 50/50 mixture of a $\chi^2$ distribution with one degree of freedom and a point mass at zero. The $\log_{10}$ of the likelihood ratio between the two models yields a LOD score that is equivalent to the classical LOD score of linkage analysis [8]. The IBD matrix, multipoint IBD matrix, and heritability ($h^2$) for each factor were estimated using SOLAR.
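The relationship between the two maximized log-likelihoods, the LOD score, and the mixture p-value can be sketched in Python as follows. This is an illustrative calculation only, not SOLAR's implementation; the function names and the log-likelihood values are made up, and the one-degree-of-freedom chi-square tail probability is obtained from the complementary error function.

```python
import math


def lod_score(loglik_null, loglik_general):
    """Convert a difference in natural log-likelihoods into a LOD score (log base 10)."""
    return (loglik_general - loglik_null) / math.log(10)


def mixture_p_value(loglik_null, loglik_general):
    """P-value under a 50/50 mixture of a point mass at zero and chi-square with 1 df."""
    lrt = 2.0 * (loglik_general - loglik_null)
    if lrt <= 0:
        return 1.0
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    chi2_tail = math.erfc(math.sqrt(lrt / 2.0))
    return 0.5 * chi2_tail


# Made-up log-likelihoods chosen so that the LOD score is about 3.45.
loglik_null = -1000.0
loglik_general = loglik_null + 3.45 * math.log(10)
print(f"LOD = {lod_score(loglik_null, loglik_general):.2f}")
print(f"p = {mixture_p_value(loglik_null, loglik_general):.2e}")
```

Halving the chi-square tail probability reflects the fact that a variance component cannot be negative, so half of the null distribution of the test statistic sits at zero.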

Results
EEG measures and loadings on each of the four factors (F1, F2, F3, F4) obtained from factor analysis are shown in Table 1. Two alcoholism classifications were provided in the COGA data. ALDX1 was based on the DSM-III-R and the Feighner criteria. ALDX2 was defined by the DSM-IV criteria. Table 2 shows the results of an analysis of variance (ANOVA) comparing the factor scores among affection status groups defined by ALDX1 and ALDX2. F3 (the three midline channel EEG measures from the target case of the Visual Oddball experiment: ttdt2, ttdt3, and ttdt4) differed significantly among affection status groups under both ALDX1 and ALDX2, indicating that F3 scores vary with affection status for alcohol dependence. Post-hoc comparisons using the Bonferroni method showed that F3 was significantly higher in the 'unaffected with some symptoms' group than in the affected group (p < 0.05). Similar patterns were seen for ttdt3 and ttdt4.
We examined the heritability of each of the quantitative traits. Heritability estimates for F1 (34.5 ± 6.6), F2 (32.1 ± 5.9), F3 (30.7 ± 6.2), and F4 (30.8 ± 6.7) were all significant (p < 0.001). We found significant evidence of linkage for F3 to chromosome 18 (LOD = 3.45 at 58 cM) in the Affymetrix SNP dataset. We had similar findings in the microsatellite (LOD = 2.73 at 61 cM) and Illumina SNP (LOD = 3.30 at 56 cM) datasets (Figure 1). Linkage peaks (LOD > 1.0) for each of the four factors are presented in Table 3. All genome scan results for each factor in each genotype dataset are shown in Figure 2.

Discussion
Our results suggest that there are four factors underlying the EEG measures. Among the four factors, factor 3 (F3), representing the midline measures (EEG ttdt2, 3, 4), was significantly different between affection status groups as defined by both ALDX1 and ALDX2.
We found a novel genetic locus with significant evidence of linkage to F3 (EEG ttdt2, 3, 4) on chromosome 18, indicating that this region (18q12.1-12.3) may harbor a gene that confers liability for alcohol dependence. A search of genome databases revealed a potential candidate gene, SYT4, located at 18q12.3 within the region where we found significant linkage. Synaptotagmin-4, encoded by SYT4, may play an important role in the Ca²⁺-dependent release of neurotransmitters and neuropeptides from the presynaptic nerve terminal. SYT4 expression was only detected in the brain, and was highest in the hippocampus [9]. An animal model showed that Syt4 mutant mice displayed impaired social transmission of food preference and disrupted contextual fear conditioning [9]. Based on the evidence from our linkage study and the gene function revealed by other studies, SYT4 may be a determinant of alcohol dependence and is a candidate for further study.
Using SNPs in the genome-wide linkage analysis, we observed a higher LOD score than with the microsatellite markers. The linkage peak was also sharper for the SNPs, with a smaller confidence interval, than for the microsatellite markers.
Figure 1. Multipoint LOD scores on chromosome 18 for trait F3, using the Affymetrix SNP (red), Illumina SNP (green), and microsatellite (blue) datasets.

Conclusion
In this study, our results from both SNPs and microsatellites suggest that there is strong linkage of F3, which mostly consists of ttdt2, ttdt3, and ttdt4, to chromosome 18. We demonstrated that, in a sample like the COGA data, a dense SNP map with a quantitative trait could provide better linkage signals than a low-resolution microsatellite scan for linkage analysis, and would also help define the peak of linkage more precisely.

Authors' contributions
YY participated in the design of the study, performed the statistical analysis and drafted the manuscript. YM, QM, JF, and LAF participated in its design and coordination. MAW participated in the design of the study and helped to draft the manuscript. All authors read and approved the final manuscript.

Figure 2. Genome-wide scans for each factor in each dataset. Genome-wide scans for F1 (a), F2 (b), F3 (c), and F4 (d) as quantitative traits, using the Affymetrix SNP (red), Illumina SNP (green), and microsatellite (blue) datasets.