An updated meta-analysis approach for genetic linkage

We present a meta-analysis procedure for genome-wide linkage studies (MAGS). The MAGS procedure combines genome-wide linkage results across studies with possibly distinct marker maps. We applied the MAGS procedure to the simulated data from the Genetic Analysis Workshop 14 in order to investigate power to detect linkage to disease genes and power to detect linkage to disease modifier genes while controlling for type I error. We analyzed all 100 replicates of the four simulated studies for chromosomes 1 (disease gene), 2 (modifier gene), 3 (disease gene), 4 (no disease gene), 5 (disease gene), and 10 (modifier gene) with knowledge of the simulated disease gene locations. We found that the procedure correctly identified the disease loci on chromosomes 1, 3, and 5 and did not erroneously identify a linkage signal on chromosome 4. The MAGS procedure provided little to no evidence of linkage to the disease modifier genes on chromosomes 2 and 10.


Background
Kofendred Personality Disorder (KPD), as simulated for Genetic Analysis Workshop 14 (GAW14), is a psychiatric syndrome characterized by an overwhelming concern with the meaning of personal inner emotions while regarding the emotions of others. Like other complex personality disorders, KPD has numerous behavioral and biological characteristics. Additionally, KPD, like other complex diseases, is believed to be linked to many genes. The possibility of finding the majority of these genes from one independent study is small. Instead, pooling data across independent studies (i.e., a mega-analysis) or pooling linkage results across independent studies (i.e., a meta-analysis) may be the best means to identify these numerous genes with small effects.
In a mega-analysis, combining raw data from several studies allows the investigator to increase sample size. A mega-analysis can increase the power to detect linkage and reduce the level of type I error. Combining raw data would be an ideal approach, but data are not always readily available or freely shared. In a meta-analysis, the investigator can still combine information from several studies to obtain a consensus for linkage. The information typically found in the literature includes published p-values, LOD scores, and effect sizes.
Caveats to mega- and meta-analyses involve among-study heterogeneity, which can include differing marker maps, informativeness, sample sizes, sampling plans, and linkage tests. Methods have been proposed to handle such problems. The genome-scan meta-analysis (GSMA) method proposed by Wise et al. [1] accommodates differing marker maps in a meta-analysis, but this test is based on the level of significance (magnitude of LOD score or p-value) at each marker. Combining results from significance tests can be limited [2][3][4] because the concordance or discordance of significant linkage between two studies may not reflect the existence of true linkage, but rather may reflect the amount of heterogeneity between the two studies. Combining effect sizes may be a better approach than combining results from significance tests, but there are still limitations if the studies have differing marker maps and use different tests to evaluate linkage. Etzel and Guerra [5] developed a method to evaluate evidence for linkage to a QTL from several linkage studies. However, this method has not been tested for a genome-wide scan, and it requires that all studies use the same type of linkage test (i.e., some version of the Haseman-Elston test). Loesgen et al. [6] developed a meta-analytic method that computes a weighted average estimate of score statistics, where one proposed weighting scheme is a function of information content at a marker and sample size. Although this method was first proposed for studies using a common marker map, it can be extended to combine studies with differing marker maps.
In this paper, we present an updated meta-analysis method for assessing linkage to a quantitative trait locus (QTL). The method generalizes the meta-analytic procedure first proposed by Etzel and Guerra [5] so that it does not assume that all studies use the same test for linkage, and it extends the weighting procedure proposed by Loesgen et al. [6] to incorporate differing marker maps. The result of the meta-analysis of genome-wide linkage studies (MAGS) method is a genome-wide weighted average of evidence of linkage to a complex disease. Although this approach was developed to evaluate linkage to a QTL, we applied it to evaluate evidence of linkage to KPD (affection status) using the four simulated data sets provided for GAW14, with knowledge of the disease gene locations.

The MAGS procedure
The MAGS method that we developed is based on procedures proposed by Loesgen et al. [6] and Etzel and Guerra [5]. For MAGS, it is not assumed that the studies use the same marker map or that they use the same test for linkage. However, it is assumed that the marker maps are available, as well as the sample size, information content at each marker (preferred), and linkage summary statistics (LOD scores, nonparametric linkage (NPL) scores, or p-values).
The MAGS method is based on a weighted average of transformed normal variates that are obtained from the reported linkage summary statistics. Suppose that we wish to complete a meta-analysis on $k$ studies. Each study $s$ ($s = 1, \ldots, k$) has $m_s$ markers. It is not assumed that the studies have the same number of markers (that is, $m_i \neq m_j$ is allowed for $i \neq j$), nor is it assumed that the studies have the same marker maps.
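The idea of turning reported summary statistics into normal variates and averaging them with weights can be sketched as follows. This is a minimal illustration and not the MAGS procedure itself: the conversions assume that an NPL score is approximately standard normal under no linkage and that $2 \ln(10) \times \mathrm{LOD}$ is approximately chi-square with 1 degree of freedom, and all function names are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_from_npl(npl):
    """One-sided p-value, ASSUMING the NPL score is ~N(0, 1) under no linkage."""
    return STD_NORMAL.cdf(-npl)


def p_from_lod(lod):
    """One-sided p-value, ASSUMING 2*ln(10)*LOD is chi-square (1 df) under no linkage."""
    if lod <= 0:
        return 0.5  # no evidence for linkage
    chi_square = 2.0 * log(10.0) * lod
    return STD_NORMAL.cdf(-sqrt(chi_square))


def z_from_p(p):
    """Normal variate whose upper-tail probability is p (requires 0 < p < 1)."""
    return -STD_NORMAL.inv_cdf(p)


def weighted_average(z_values, weights):
    """Weighted average of normal variates across studies."""
    return sum(w * z for w, z in zip(weights, z_values)) / sum(weights)


def standardized(z_values, weights):
    """Weighted sum rescaled to unit variance under the null (weighted Stouffer form)."""
    norm = sqrt(sum(w * w for w in weights))
    return sum(w * z for w, z in zip(weights, z_values)) / norm
```

For example, `z_from_p(p_from_npl(2.0))` recovers a variate of about 2.0, and `p_from_lod(3.0)` is close to 0.0001. The standardized form is included because a plain weighted average of unit-variance variates no longer has unit variance; which scaling is appropriate depends on how the critical value is set.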

MAGS applied to GAW 14 simulated microsatellite datasets
The provided genome-wide microsatellite marker maps were identical for all four simulated data sets. We used GENEHUNTER2 [8,9] to assess evidence for linkage to KPD for all 100 replicates of chromosomes 1, 2, 3, 4, 5, and 10 within all four studies. We then applied the MAGS method as described above to the linkage results, one chromosome at a time. Within each chromosome replicate, the NPL scores obtained from GENEHUNTER2 were transformed to p-values using step 1b.

MAGS applied to GAW 14 simulated single-nucleotide polymorphism (SNP) datasets
The provided genome-wide SNP marker maps were also identical for all four simulated data sets. In order to fully evaluate the MAGS method using studies with differing marker maps and different analysis packages, we created unique genome-wide SNP marker maps for each study for chromosomes 1, 2, 3, and 10. The unique marker maps were created by randomly assigning the available markers on each chromosome to one of the four studies. We then used a different linkage test within each study to assess linkage to KPD one chromosome at a time: the Sibpal procedure from SAGE [10] for study AI, GENEHUNTER2 [8,9] for studies DA and KA, and the mlink procedure of LINKAGE [11] for study NY. We then applied the MAGS method as described above to the linkage results, one chromosome at a time, using a set of analysis points that spanned each chromosome with one analysis point positioned every 2 cM and D = 10 cM. We chose D = 10 cM because a polymorphism at a marker linked to an analysis point can provide information about the polymorphism at the putative QTL. If an analysis point had only one marker within a 10-cM radius, then no meta-analysis was conducted. For studies DA and KA, we transformed the NPL scores obtained from GENEHUNTER2 to p-values using step 1b; for study NY, we transformed the MaxLOD scores obtained from LINKAGE using step 1a; for study AI, we transformed the t-value obtained from SAGE in a similar fashion to the NPL score in step 1b. The subsequent normal variates were weighted by $w_{stq} = (1 - 2\theta_{stq})^2 n_s$, where $n_s$ is defined as the total number of individuals used in the linkage analysis of study $s$. Note that the weights in this application do not include information content because the measure was not available from all analysis packages (e.g., Sibpal). For study AI, in which we used Sibpal, $n_1 = 483$ possible siblings were included in the analyses. For the studies in which we used GENEHUNTER2 (DA and KA), $n_2 = 700$ and $n_3 = 694$ individuals, respectively, were included in the analyses. For study NY, $n_4 = 943$ individuals were included in the analyses.
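The weighting described above, $(1 - 2\theta)^2 n_s$ evaluated for markers within D = 10 cM of analysis points spaced every 2 cM, can be sketched as follows. The text does not state how the recombination fraction $\theta$ is obtained from map distance, so the Haldane map function used here is an assumption, as are the function names and the marker positions in the usage note.

```python
from math import exp


def haldane_theta(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane map function, an ASSUMPTION)."""
    return 0.5 * (1.0 - exp(-2.0 * distance_cm / 100.0))


def marker_weight(distance_cm, n_individuals):
    """Weight (1 - 2*theta)^2 * n for one marker at a given distance from the analysis point."""
    theta = haldane_theta(distance_cm)
    return (1.0 - 2.0 * theta) ** 2 * n_individuals


def analysis_points(chromosome_length_cm, spacing_cm=2.0):
    """Evenly spaced analysis points from 0 cM up to the chromosome length."""
    count = int(chromosome_length_cm // spacing_cm) + 1
    return [i * spacing_cm for i in range(count)]


def weights_at_point(point_cm, study_maps, study_sizes, radius_cm=10.0):
    """Return (study, marker position, weight) for every marker within radius_cm.

    An empty list is returned when fewer than two markers fall inside the
    radius, mirroring the rule that no meta-analysis is run at such points.
    """
    nearby = []
    for study, positions in study_maps.items():
        for position in positions:
            distance = abs(position - point_cm)
            if distance <= radius_cm:
                weight = marker_weight(distance, study_sizes[study])
                nearby.append((study, position, weight))
    return nearby if len(nearby) >= 2 else []
```

With the sample sizes reported above, `study_sizes = {"AI": 483, "DA": 700, "KA": 694, "NY": 943}`, and hypothetical maps such as `{"AI": [50.0], "DA": [55.0]}`, calling `weights_at_point(50.0, ...)` gives the AI marker its full weight of 483 and down-weights the DA marker 5 cM away. Under the Haldane assumption the weight simplifies to $n_s e^{-4d/100}$ for a distance of $d$ cM, so a marker's contribution decays smoothly with distance from the analysis point.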
We calculated the frequency (across the 100 replicates) with which the resulting MAGS values (for the microsatellite analysis and for the SNP analysis) exceeded a set critical value at varying alpha levels (p = 0.01, 0.001, $7.4 \times 10^{-4}$, and $2.2 \times 10^{-5}$) [12] to evaluate power and type I error.

Results
Figure 1 presents the MAGS results for the microsatellite marker maps. The MAGS method identified the disease gene D1 (located at approximately 167 cM on chromosome 1) in more than 90% of the replicates, even for very small alpha levels. For disease gene D2 (located at approximately 299 cM on chromosome 3), the MAGS method detected its location in all replicates. Likewise, for disease gene D3 (located at approximately 5 cM on chromosome 5), the MAGS method detected its location in more than 90% of the replicates, even at alpha levels as low as $2.2 \times 10^{-5}$.
For the disease modifier gene D6 (located at approximately 15 cM on chromosome 2), which affects the penetrance of phenotype 2, the MAGS method was not able to distinguish its signal from background genetic noise. Furthermore, D6 was not identified in any of our analyses of the individual studies. Results were similar for the modifier gene D5, located at approximately 67 cM on chromosome 10: it was identified in only 10 of the 100 MAGS replicates when the alpha level was set at 1% and in none of the replicates when the alpha level was set at $2.2 \times 10^{-5}$. Additionally, this gene was not identified in any of our single-study replicates. Meta-analysis of chromosome 4, which did not contain any disease genes, did not detect any erroneous linkage signals for alpha levels less than 1%.
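Replicate frequencies such as those reported above (for example, 10 of 100 replicates at the 1% level) are the proportion of replicates in which a statistic exceeds a critical value. A minimal sketch of that calculation follows; it assumes the statistic is standard normal under no linkage, so that the critical value is an upper-tail normal quantile, which the text does not confirm. The function names and the example values are hypothetical.

```python
from statistics import NormalDist


def critical_value(alpha):
    """Upper-tail standard normal critical value for a given alpha (ASSUMED null distribution)."""
    return NormalDist().inv_cdf(1.0 - alpha)


def detection_frequency(replicate_statistics, alpha):
    """Fraction of replicates whose statistic exceeds the critical value for alpha."""
    threshold = critical_value(alpha)
    hits = sum(1 for value in replicate_statistics if value > threshold)
    return hits / len(replicate_statistics)
```

Applied at a simulated disease locus, this frequency estimates power; applied to a chromosome with no simulated linkage, it estimates the type I error rate. For instance, `detection_frequency([1.0, 3.0, 5.0, 2.0], 0.01)` returns 0.5, because only 3.0 and 5.0 exceed the critical value of roughly 2.33.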
In the analyses using the modified SNP marker maps (data not shown), D1 was identified in over 80% of the replicates, while D2 was identified in only 20% of the replicates at an alpha level of $2.2 \times 10^{-5}$. Difficulty in identifying D2 was attributable to the sparseness of SNPs in the modified marker maps in the region at the very end of chromosome 3 surrounding D2. As with the microsatellite results, the modifier genes, D5 and D6, were not clearly identified.

Conclusion
Meta-analysis provides a means of combining information about linkage from smaller independent studies to identify genetic linkage to a complex trait while adjusting for among-study heterogeneity (different sample sizes, different marker maps, etc.). Because multiple genes are believed to be involved in a complex disease, many with modest effects, the probability of identifying them from single studies and replicating the results is low. This meta-analysis procedure correctly identified the three major genes we analyzed with high power even under quite restrictive conditions. Furthermore, this method did not erroneously identify linkage where no linkage was simulated when alpha levels of 0.1% or lower were used. In fact, based on the results for chromosome 4 (where no linkage was simulated) depicted in Figure 1d, the overall chromosome-wide alpha level of 0.1% resulted in pointwise alpha levels of 5% or less. Neither the SNP nor the microsatellite meta-analysis identified the modifier genes directly, but it might have been possible to identify them had the meta-analysis used results from analyses performed conditional on the known genes.
The MAGS method performed better for the microsatellite marker maps than for the modified SNP marker maps. The microsatellite maps had higher marker density than the modified SNP maps, with possibly different information content per marker. Also, when we analyzed the modified SNP maps, we did not use the same test for linkage in each study and we did not include information content in the MAGS calculation. The linkage tests that we used (GENEHUNTER2, Sibpal, mlink) vary in the type of pedigree structure and data used to test for linkage and hence vary in power to detect linkage, which in turn affected the SNP meta-analysis. However, any meta-analytic procedure that is conducted on studies using varying linkage tests (with varying levels of power to detect linkage) will be affected by these among-study differences. Meta-analysis provides a way to obtain consensus for linkage to a disease and is clearly an important step in the localization of genes involved in complex diseases.

Authors' contributions
CJE developed the meta-analytic method and conceived of its application to the simulated data provided by GAW14. ML conducted the analyses of the simulated data with the assistance of CJE and TJC. All authors contributed to the writing and approval of the final manuscript.