An empirical evaluation of imputation accuracy for association statistics reveals increased type-I error rates in genome-wide associations
© Almeida et al; licensee BioMed Central Ltd. 2011
Received: 22 October 2009
Accepted: 20 January 2011
Published: 20 January 2011
Genome-wide association studies (GWAS) are becoming the approach of choice for identifying genetic determinants of complex phenotypes and common diseases. The enormous amount of data generated and the use of distinct genotyping platforms with variable genomic coverage still pose analytical challenges. Imputation algorithms combine information from directly genotyped markers with the haplotypic structure of the population of interest to infer poorly genotyped or missing markers, and are considered a near-zero-cost approach for comparing and combining data generated in different studies. Several reports have stated that imputed markers have an overall acceptable accuracy, but no published report has performed a pairwise comparison of imputed and empirical association statistics for a complete set of GWAS markers.
In this report we identified a total of 73 imputed markers that yielded a nominally statistically significant association at P < 10⁻⁵ for type 2 diabetes mellitus and compared them with results obtained from empirical allelic frequencies. Interestingly, despite their overall high correlation, association statistics based on imputed frequencies were discordant in 35 of the 73 (48%) associated markers, considerably inflating the type I error rate of imputed markers. We comprehensively tested several quality thresholds, the haplotypic structure underlying imputed markers, and the use of flanking markers as predictors of inaccurate association statistics derived from imputed markers.
Our results suggest that association statistics from imputed markers within a specific minor allele frequency (MAF) range, located in weak linkage disequilibrium blocks, or strongly deviating from local patterns of association are prone to inflated false-positive association signals. The present study highlights the potential of imputation procedures and proposes simple procedures for selecting the best imputed markers for follow-up genotyping studies.
Genome-wide association studies (GWAS) are a promising tool for identifying genetic markers underlying phenotypes of interest and have recently led to the identification of markers associated with several complex human phenotypes. These studies have improved our knowledge of the genetic patterns underlying diseases such as type 1 and type 2 diabetes mellitus and Crohn's disease. Although methodologically appealing, these high-throughput experiments are not free from biases and limitations. Indeed, it is widely acknowledged that GWAS are prone not only to major drawbacks such as genotyping errors and sample failures, but also to varying levels of genome coverage across samples. In practice, a further complication arises when comparing results among different GWAS. The commercially available GWA platforms use distinct sets of markers with highly heterogeneous genomic coverage, ranging from hundreds of thousands to millions of typed markers. This diversity of marker panels further limits the potential of genome-wide association studies to uncover variants putatively implicated in susceptibility to diseases or other complex phenotypes of interest, and makes both the comparison and the combination of results generated from distinct genome-wide panels a challenging endeavor.
To overcome these issues, genotype imputation algorithms were developed. These methods combine information provided by high-quality markers with genome structure information for the population of interest, as organized in the HapMap database. These procedures can potentially be a nearly zero-cost alternative for increasing both power and coverage in individual GWA studies. Imputation also allows meta- and pooled analyses of GWAS data generated by distinct genotyping platforms, maximizing their overlap and, consequently, the number of typed individuals. Despite this promise, the success of imputation algorithms is not guaranteed: they can amplify undetected technical errors in genotyped markers, the available HapMap information may not be well consolidated for the population of interest, or the chosen imputation algorithm may not be well suited to a specific dataset.
Here, we present a comprehensive comparative analysis of the data generated by the multipoint imputation algorithm and the data obtained by direct genotyping in a type 2 diabetes GWAS dataset. This imputation algorithm uses a hidden Markov model to infer the allelic frequencies of a marker from the information provided by a large set of flanking markers. The analyzed dataset was generated and organized by the Wellcome Trust Case Control Consortium (WTCCC) and is part of a large epidemiological study aimed at identifying genetic markers that could predispose an individual to seven different diseases of interest. In this effort, a group of approximately 3000 healthy individuals was compared with groups of 2000 individuals affected by diseases of interest such as type 2 diabetes, hypertension, coronary heart disease and bipolar disorder. These healthy individuals belong to two distinct cohorts selected to avoid population stratification, a very common source of bias in GWAS. Currently available imputation algorithms can use very different statistical approaches and, overall, their accuracy is satisfactory. Details on the most recent methods, as well as their advantages and limitations, are reviewed and critically discussed elsewhere. Our focus in this report is to describe how inferences based on imputed genotypes might affect the discovery of genetic markers possibly associated with complex phenotypes. The results presented here highlight the potential benefits and limitations of using imputed data in association studies of common phenotypes.
Characteristics of the examined datasets
The results discussed herein are based on data available for approximately 2000 individuals affected by type 2 diabetes and 3000 healthy individuals (controls). We limited our evaluation to 387,662 biallelic markers with full information on both observed (genotyped) and imputed genotype frequencies. This set of markers covers all chromosomes except the sex chromosomes, with no major over-representation of any specific chromosome. A detailed list of the examined markers and their respective imputed and empirical frequencies is available upon request. In this report, the term empiric denotes markers whose allelic frequencies were determined by direct genotyping.
SNP selection quality criteria
Imputed versus empirically genotyped markers: inflation of type-I error rates
Number of SNPs accepted as significant based on empirical and imputed allelic frequencies under different genome-wide significance thresholds.
Comparison of the number of markers that would be considered associated based on empirical and imputed allelic frequencies (table columns: number of markers; P < 10⁻⁵; P < 10⁻⁵; P > 10⁻⁵).
Top associated markers based on imputed allelic frequencies
The same analytic procedure was carried out on polymorphic markers present in the WTCCC hypertension dataset. Initially, we determined the set of markers whose allelic frequencies were both directly genotyped and imputed using the haplotypic structure of flanking markers. The association statistics were generated and compared as for the diabetes dataset, and similar results were observed (Additional file 3, Table S2). Another important point is that, despite a considerably high correlation coefficient between association statistics, several strongly biased imputed markers could mislead follow-up analyses. This finding may appear contradictory, but one should keep in mind that the correlation of -log10-transformed association statistics is dominated by the very large number of markers showing good agreement between measures.
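The point about correlation can be illustrated with a small simulation. The sketch below uses purely synthetic -log10 P-values (hypothetical values, not WTCCC data): 10,000 markers whose imputed and empirical values agree closely, plus ten hypothetical imputed markers whose signal is inflated well above their empirical one. The Pearson correlation remains high even though every imputed marker crossing P < 10⁻⁵ is unconfirmed empirically.

```python
import random
import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(1)  # fixed seed so the illustration is reproducible

# Synthetic -log10(P) for 10,000 concordant markers: imputed = empirical + small noise.
empiric = [rng.uniform(0.0, 4.0) for _ in range(10_000)]
imputed = [e + rng.gauss(0.0, 0.1) for e in empiric]

# Ten hypothetical discordant markers: weak empirical signal, strong imputed signal.
empiric += [1.0] * 10
imputed += [6.0] * 10

r = pearson(empiric, imputed)
top_imputed = [i for i, v in enumerate(imputed) if v > 5.0]  # imputed P < 1e-5
false_hits = [i for i in top_imputed if empiric[i] <= 5.0]   # not confirmed empirically
print(f"r = {r:.3f}; imputed hits = {len(top_imputed)}; unconfirmed = {len(false_hits)}")
```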
Characteristics of the false-positive signals
Next, we sought to examine characteristics of false-positive associations that could be used as predictors of the quality of association signals derived from imputed markers. Analyzing these characteristics is of paramount importance to guide investigators in appropriately evaluating signals discovered with imputed markers. Here, discovery means results crossing a specific α threshold under a frequentist rather than a Bayesian perspective. We selected α = 10⁻⁷ for illustrative purposes, an approximation that should work relatively well in typical studies currently conducted in Caucasian populations (the CEU HapMap population, for example). Our empirical analysis demonstrated that the magnitude of the odds ratios of false-positive associations lies in the range of effects typically found in the GWA setting: median 1.26 (min = 1.20, max = 1.61); odds ratios were oriented to be ≥1 for consistency. However, false signals from imputed genotypes more commonly suggest protective effects (n = 47) than susceptibility effects (n = 26) for the minor allele.
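Orienting odds ratios so that they are ≥1 while recording the direction of effect can be sketched as follows. This is a minimal illustration assuming each input is the odds ratio of the minor allele; the function names and example values are hypothetical.

```python
import statistics

def orient_odds_ratio(or_minor):
    """Return (magnitude >= 1, direction) for the odds ratio of the minor allele."""
    if or_minor <= 0:
        raise ValueError("an odds ratio must be positive")
    if or_minor < 1.0:
        # Protective minor allele: report the reciprocal so the magnitude is >= 1.
        return 1.0 / or_minor, "protective"
    return or_minor, "susceptibility"

def summarize_effects(odds_ratios):
    """Median oriented odds ratio and counts of protective vs susceptibility effects."""
    oriented = [orient_odds_ratio(x) for x in odds_ratios]
    median_or = statistics.median(m for m, _ in oriented)
    counts = {"protective": 0, "susceptibility": 0}
    for _, direction in oriented:
        counts[direction] += 1
    return median_or, counts

median_or, counts = summarize_effects([0.8, 1.25, 0.75, 1.30])  # hypothetical values
print(median_or, counts)
```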
Key indicators of a poor imputation performance on association statistics
Next, we carried out exploratory procedures to investigate key indicators of poor imputation performance on association statistics. Specifically, we tested whether different calling quality criteria and minor allele frequency (MAF) thresholds could predict the observed bias between empirical and imputed frequencies. We explored this by applying more stringent cutoffs for calling rates and Hardy-Weinberg disequilibrium (HWD), and by restricting the analysis to SNPs with MAF ≥ 1%. The number of markers excluded by these quality filters was determined, and the -log10-transformed association statistics of the remaining imputed and genotyped markers were compared by analyzing their degree of correlation (Additional file 4, Table S3 and Additional file 5, Table S4). Consistent with recent investigations of the accuracy of imputation algorithms for genotype determination, using polymorphisms with MAF below 1%, with or without lower calling rates, decreases the overall agreement between results based on imputed genotypes and those obtained from directly genotyped markers. In our dataset, HWD had no major impact on imputation performance, since markers with strong Hardy-Weinberg deviation had already been removed from the dataset before publication.
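The three filters can be sketched for a single marker from its genotype counts. This is a minimal illustration, not the WTCCC pipeline: the Hardy-Weinberg check here is a Pearson chi-square test with one degree of freedom, and the HWE P-value cutoff of 10⁻⁴ is an assumed value that the text does not specify; the 0.95 call-rate and 1% MAF cutoffs follow the criteria discussed in this report.

```python
import math

def hwe_pvalue(n_aa, n_ab, n_bb):
    """Pearson chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)                 # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0                                  # monomorphic marker: test undefined
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(chi2 / 2.0))         # upper tail of chi-square with 1 df

def passes_qc(n_aa, n_ab, n_bb, n_missing,
              min_maf=0.01, min_call_rate=0.95, min_hwe_p=1e-4):
    """Apply call-rate, MAF and HWE filters to one marker (min_hwe_p is an assumed cutoff)."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    p = (2 * n_aa + n_ab) / (2 * called)
    maf = min(p, 1.0 - p)
    return (call_rate >= min_call_rate
            and maf >= min_maf
            and hwe_pvalue(n_aa, n_ab, n_bb) >= min_hwe_p)

print(passes_qc(360, 480, 160, 10))   # in HWE, MAF 0.4, call rate about 0.99
print(passes_qc(360, 480, 160, 100))  # call rate about 0.91 fails
print(passes_qc(500, 0, 500, 0))      # no heterozygotes: strong HWD
```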
Sliding window of association statistics
Finally, we explored the hypothesis that if a particular marker is truly associated with the investigated phenotype, nearby markers (in LD with the tested marker) should have a higher chance of also being associated. In this scenario, a considerable proportion of markers flanking an associated SNP should also show significant levels of association with the phenotype under investigation. Imputed SNPs located in the same chromosomal region are inferred with similar accuracy, since the imputation algorithm uses the same haplotypic structure information for them. By the same token, a completely isolated associated marker within a well-characterized LD block is likely to represent a false-positive association. To evaluate this hypothesis, we developed an algorithm implementing a sliding window procedure that collects the -log10-transformed association statistics of consecutive imputed markers using three window sizes (1, 2 and 3 flanking markers) (see Methods for further discussion). Sliding windows of each size were centered on the 73 imputed markers considered associated with type 2 diabetes and separated into two groups according to whether their central marker was concordant or discordant with the empirically measured association statistics; three summary statistics (mean, variance and total sum of the association statistics) were then collected for each window.
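The text does not name the test used to compare window statistics between the concordant and discordant groups. One simple, assumption-light option is a two-sided permutation test on the difference in group means, sketched below with hypothetical window sums; the function name and values are illustrative only.

```python
import random
import statistics

def permutation_pvalue(group_a, group_b, n_perm=10_000, seed=0):
    """Two-sided permutation P-value for the difference in means of two groups."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of windows into the two groups
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # The +1 terms count the observed labelling and keep the P-value above zero.
    return (extreme + 1) / (n_perm + 1)

# Hypothetical total sums of flanking -log10 P-values for each window group.
concordant_sums = [12.1, 10.4, 15.0, 11.8, 13.3]
discordant_sums = [2.1, 3.0, 1.7, 2.6, 3.4]
p = permutation_pvalue(concordant_sums, discordant_sums)
print(f"permutation P = {p:.4f}")
```

A permutation test avoids distributional assumptions, which matters for small groups of windows whose sums of -log10 P-values are unlikely to be normally distributed.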
Genome-wide association studies are a promising tool for determining genetic signatures that could, in combination with environmental factors, predispose an individual to a phenotype of interest. Inadequate data quality control in GWAS has been implicated as an important source of bias and loss of power in both linkage analyses and population-based association studies. Imputation algorithms use the allelic frequencies of typed markers and haplotypic structure information to infer the expected allelic frequencies of a low-quality or missing marker. These algorithms are considered a near-zero-cost alternative for combining results generated by different platforms with distinct genome coverage. The combination of directly genotyped and imputed allelic frequencies allowed the identification of SNPs strongly associated with diseases of interest such as hypertension and diabetes [1, 8]. Genome-wide association studies, like any other large-scale experiments, are prone to false-positive associations because of the very large number of hypothesis tests performed, and even a small percentage of low-quality SNPs can cause important statistical problems. These statistical limitations demand that any marker considered associated with a particular disease, especially an imputed one, be validated by direct genotyping on a different genotyping platform. This conservative procedure is now considered mandatory for publishing such results. Nevertheless, even following up a small fraction of positive results from a GWAS involves significant costs.
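For orientation, a Bonferroni correction (an assumption here; the text does not state how its thresholds were derived) of a family-wise error rate of 0.05 across the 387,662 markers analyzed gives a per-test threshold of the same order as the α = 10⁻⁷ used for illustration in this report:

$$\alpha_{\text{per test}} = \frac{0.05}{387\,662} \approx 1.29 \times 10^{-7}$$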
Combining imputation procedures with low-density chips can offer a convenient way to improve the cost-efficiency and statistical power of a GWAS, since more individuals or markers can be typed at the same cost. Several reports have compared the overall accuracy and statistical power of different imputation methods and highlighted the high genotype prediction accuracy of existing methods, especially in genomic regions showing high LD (linkage disequilibrium) between markers. Since the accuracy of imputation methods is closely related to the quality of the empirical frequencies used as input, we initially determined the complete set of markers that were both directly genotyped and imputed by the multipoint imputation algorithm in the WTCCC type 2 diabetes GWAS. This resulted in a set of 387,668 markers that were further evaluated. Using this set of markers, we tested a series of quality thresholds for MAF (minor allele frequency), calling probability and Hardy-Weinberg equilibrium deviation, and analyzed the overall correlation between -log10-transformed P-values of empirical and imputed allelic frequencies under a log-additive model of inheritance. We used a combined quality criterion of MAF > 0.01 and calling probability > 0.95; markers passing this filter showed considerably higher agreement between association statistics based on imputed and empirical frequencies (Figure 1). Using a minimum association threshold of 10⁻⁵, we identified a total of 73 imputed markers claiming association; of these, only 38 (52%) would be considered associated based on their empirical (directly genotyped) association statistics (i.e., nearly half of the imputed markers would be erroneously considered associated with the phenotype under study).
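The filtering and concordance classification described above can be sketched as follows. The record layout (`Marker`) and the example markers are hypothetical; the cutoffs (MAF > 0.01, calling probability > 0.95, P < 10⁻⁵) follow the text, and a marker is called discordant when its imputed P-value crosses the threshold but its empirical P-value does not.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    rsid: str         # marker identifier (hypothetical examples below)
    maf: float        # minor allele frequency
    call_prob: float  # imputation calling probability
    p_imputed: float  # association P-value from imputed frequencies
    p_empiric: float  # association P-value from direct genotyping

def classify_imputed_hits(markers, alpha=1e-5, min_maf=0.01, min_call_prob=0.95):
    """Return (concordant, discordant) imputed hits after quality filtering."""
    kept = [m for m in markers if m.maf > min_maf and m.call_prob > min_call_prob]
    hits = [m for m in kept if m.p_imputed < alpha]
    concordant = [m for m in hits if m.p_empiric < alpha]
    discordant = [m for m in hits if m.p_empiric >= alpha]
    return concordant, discordant

markers = [
    Marker("snpA", 0.20, 0.99, 3e-7, 8e-7),   # imputed hit confirmed empirically
    Marker("snpB", 0.35, 0.98, 5e-6, 2e-3),   # imputed hit not confirmed
    Marker("snpC", 0.005, 0.99, 1e-8, 1e-8),  # removed by the MAF filter
    Marker("snpD", 0.15, 0.90, 1e-6, 1e-6),   # removed by the calling-probability filter
    Marker("snpE", 0.25, 0.99, 1e-3, 1e-6),   # not significant after imputation
]
for alpha in (1e-5, 1e-6, 1e-7):
    conc, disc = classify_imputed_hits(markers, alpha=alpha)
    print(alpha, [m.rsid for m in conc], [m.rsid for m in disc])
```

The discordant fraction at a given threshold is then `len(disc) / (len(conc) + len(disc))`, the quantity reported for the 73 imputed markers.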
The same pattern was observed when different, more stringent significance thresholds were used (10⁻⁶ and 10⁻⁷). This result suggests that imputation methods are prone to inflate the number of markers considered associated at every threshold evaluated in this report. These results do not contradict the high overall accuracy for predicting genotype status described previously. These few highly deviating markers would be considered associated even under highly stringent significance thresholds (10⁻⁷ or lower), which could considerably jeopardize follow-up studies based only on the association statistics of imputed markers. Since imputed markers are indispensable for merging information generated by different platforms or studies (meta-analysis), it is important to identify these poorly imputed and strongly biased markers.
In this report, we comprehensively tested several genotyping and imputation quality criteria, haplotypic information and chromosomal location as predictors of the quality of association statistics derived from imputed markers. As demonstrated in other reports dealing with the accuracy of genotype determination, imputation accuracy greatly diminishes when the MAF of imputed markers is close to 50%. We further analyzed a subset of markers selected on the basis of their extreme minor allele frequencies (MAF ≥ 0.49) to determine whether this allelic condition can identify biased imputed markers. This allelic condition does predispose imputed markers to biased association statistics, but it cannot be considered a good predictor, since most markers in this condition show good agreement with directly genotyped ones in terms of association statistics. Imputed markers with these allelic frequencies should be annotated and used with care in follow-up studies. The other quality criteria analyzed, such as calling probability and Hardy-Weinberg equilibrium deviation, were of even more limited use as predictors of false-positive associations derived from imputed allelic frequencies, since the bias between empirical and imputed association statistics was randomly distributed or clustered in markers showing high calling probabilities or very close to HW equilibrium.
A commonly accepted source of bias is the use of poorly consolidated haplotypic information as input for imputation algorithms. This could lead to imputed allelic frequencies that are inconsistent with the population under study and, consequently, to strongly biased association tests. To explore this hypothesis, we determined haplotypic blocks centered on each WTCCC marker that was also present in the HapMap database. Comparing the observed biases with four summary statistics representing haplotypic block consistency showed modest success when the variance and maximum values were tested as predictors. Interestingly, using the mean and median linkage disequilibrium values as predictors showed that imputed markers located in regions with weaker linkage disequilibrium structure are prone to higher bias. Such markers should be imputed and analyzed under different genetic models of inheritance with care, especially if the imputed marker is considered strongly associated with the phenotype under study. A similar conclusion was suggested by de Bakker et al. in their guide to the use of imputed information in meta-analyses of genome-wide association studies.
The overall accuracy of the imputation algorithm for association statistics has been compared and comprehensively evaluated under a diverse panel of genetic conditions [13, 3]. Here, we showed that when allelic frequencies are imputed for markers located in low-LD (linkage disequilibrium) regions, the accuracy of association statistics strongly diminishes. This restriction is probably imposed by the limited haplotypic information in these regions and by a poorly consolidated haplotypic map. Given the well-known strong dependence of accurate imputation on the quality of the haplotypic information available for a specific block, we developed an algorithm implementing a sliding window procedure that uses the association statistics of flanking markers as predictors of the quality of association statistics derived from imputed markers. Since the same haplotypic information is used to impute nearby markers, an imputed marker considered associated is expected to be flanked by markers showing at least moderate association with the phenotype under study. Interestingly, imputed markers showing high concordance with empirical ones (for the derived association statistic) presented a significantly higher total sum of association statistics than false-positive markers. The same procedure applied to the complete set of imputed markers considered associated (P < 10⁻⁵) in the WTCCC hypertension dataset gave similar results (Additional file 8, Figure S4). The complex nature of the WTCCC databases imposes a barrier to the interpretation of the results in this manuscript and could be considered a major source of bias, especially for imputed markers.
This barrier arises because controls and cases were not ideally matched in terms of ancestry, so some association statistics derived from directly genotyped markers, and especially from imputed markers, are expected to be susceptible to increased rates of both type I and type II errors. Nevertheless, our results are consistent with the idea that additional information can be gathered from nearby markers in order to prioritize potentially associated markers for follow-up studies.
Imputation algorithms are a convenient and low-cost solution for increasing the coverage and power of a GWAS, allowing comparison of previously generated results and bridging the gap between the distinct sets of markers of different GWAS platforms. Despite their previously demonstrated high overall accuracy for genotype prediction, we show that even after traditional filtering, a considerable number of markers may still present important problems when the association statistics derived from them are evaluated. We serially tested a group of features known to predict low genotype imputation accuracy. For the most part, these features were not able to robustly identify markers whose association statistics are significantly biased. One approach that appears robust is to use the information provided by flanking markers through our sliding window procedure. Concordant imputed markers, whose association statistics agree with those derived from directly genotyped allelic frequencies, are expected to lie in haplotypic blocks composed of other markers showing at least a moderate association with the phenotype under study. Our results highlight the immense potential of imputation procedures, but are a reminder that indiscriminate use of imputed markers could alter the cost-effectiveness balance of follow-up genotyping efforts.
Determination of association statistics of a marker
The WTCCC consortium provided a complete panel of imputed and directly genotyped allelic frequencies for individuals affected by the diseases of interest and for control individuals. The hypertension and diabetes datasets were downloaded from http://www.wtccc.org.uk/ and organized locally. Initially, we determined the complete set of markers that were both genotyped and imputed in each dataset. A Perl script was developed to generate a meta-population for each marker respecting the observed and imputed allelic frequencies of the selected cases and controls; this script can be obtained from the authors on request. Data were exported and analyzed with the R package SNPassoc, which determined for each SNP its association statistic under dominant, recessive, codominant and log-additive models of genetic inheritance. This procedure was conducted independently by two co-authors (MAAA and TVP), and the results were concordant. The complete set of association statistics was collected and organized locally and is available upon request.
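SNPassoc fits logistic regression models in R; as a rough stand-in for its log-additive test, the sketch below implements the Cochran-Armitage trend test, a closely related score test for an additive allele-dose effect. It is not SNPassoc's implementation, and the genotype counts are hypothetical. Counts may be non-integer, which would also accommodate expected counts derived from imputed genotype probabilities.

```python
import math

def trend_test(cases, controls, scores=(0.0, 1.0, 2.0)):
    """Cochran-Armitage trend test for three genotype classes.

    cases, controls: counts for genotypes (AA, AB, BB); scores code the number of B alleles.
    Returns (chi-square statistic with 1 df, P-value).
    """
    n = [r + s for r, s in zip(cases, controls)]
    big_r, big_s = sum(cases), sum(controls)
    big_n = big_r + big_s
    sum_tr = sum(t * r for t, r in zip(scores, cases))
    sum_tn = sum(t * m for t, m in zip(scores, n))
    sum_t2n = sum(t * t * m for t, m in zip(scores, n))
    denom = big_r * big_s * (big_n * sum_t2n - sum_tn ** 2)
    if denom <= 0:
        return 0.0, 1.0  # no variation in genotype scores or in case/control status
    chi2 = big_n * (big_n * sum_tr - big_r * sum_tn) ** 2 / denom
    # Upper tail of a chi-square with 1 df: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))

chi2, p = trend_test(cases=(10, 40, 50), controls=(50, 40, 10))
print(f"chi2 = {chi2:.2f}, P = {p:.2e}")
```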
Analysis of specific chromosomal regions within studied markers
Markers typed on a specific chromosome were selected and sorted by chromosomal position. The association statistics of each marker under a log-additive model of inheritance were collected, -log10-transformed and plotted (Y-axis) against the marker's position in the sorted vector. Markers on different chromosomes were distinguished by plotting them in grey and black. The bias was determined as the algebraic difference between the -log10-transformed association statistics derived from direct genotyping and from imputation, $\text{bias} = \left(-\log_{10} P_{\text{empiric}}\right) - \left(-\log_{10} P_{\text{imputed}}\right)$. All analyses were conducted in the R statistical environment; the complete set of programs developed can be obtained upon request.
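The per-chromosome ordering and the empiric-minus-imputed bias can be sketched as follows. The record layout `(chromosome, position, p_empiric, p_imputed)` and the function names are hypothetical, and plotting is omitted. With this sign convention, a negative bias means the imputed marker shows a stronger signal than its directly genotyped counterpart.

```python
import math

def neglog10(p):
    """-log10 transform of a P-value."""
    return -math.log10(p)

def chromosome_bias(records, chrom):
    """Sort one chromosome's markers by position and compute the empiric-minus-imputed bias.

    records: iterable of (chromosome, position, p_empiric, p_imputed) tuples.
    Returns a list of (position, bias) pairs in positional order.
    """
    rows = sorted((r for r in records if r[0] == chrom), key=lambda r: r[1])
    return [(pos, neglog10(p_emp) - neglog10(p_imp)) for _, pos, p_emp, p_imp in rows]

records = [
    ("1", 200, 1e-3, 1e-6),  # imputed signal much stronger than empirical
    ("1", 100, 1e-5, 1e-5),  # perfect agreement
    ("2", 50, 0.1, 0.1),     # different chromosome, excluded
]
print(chromosome_bias(records, "1"))
```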
Determination of haplotypic blocks
HapMap haplotypic information was downloaded and organized locally (HapMap Public Release #22, 2007). A haplotypic block was defined as the complete set of marker-marker r² measures associated with a specific marker, without applying a predefined minimum r² threshold. Once defined, these sets are informative for identifying specific chromosomal regions under strong linkage disequilibrium. Each haplotypic block was characterized by its summary statistics and further explored to identify local patterns of strong association and a possible correlation between weak linkage disequilibrium regions and the accuracy of association statistics derived from imputation.
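For reference, the r² measure between two biallelic loci can be computed from a two-locus haplotype frequency and the two allele frequencies, and a marker's block can then be summarized by the mean, median, variance and maximum of its r² values. The sketch assumes these frequencies are already available; reading HapMap release files is not modeled, and the function names are hypothetical.

```python
import statistics

def r_squared(p_ab, p_a, p_b):
    """LD r^2 between two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2.
    p_a, p_b: frequencies of allele A and allele B.
    """
    d = p_ab - p_a * p_b                       # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0

def block_summary(r2_values):
    """Summary statistics of all r^2 values attached to one marker (no minimum r^2 cutoff)."""
    values = list(r2_values)
    return {
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "variance": statistics.variance(values) if len(values) > 1 else 0.0,
        "max": max(values),
    }

print(r_squared(0.5, 0.5, 0.5))   # complete LD
print(r_squared(0.25, 0.5, 0.5))  # no LD
print(block_summary([0.2, 0.4, 0.9]))
```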
Sliding window algorithm
The complete set of -log10-transformed P-values of imputed and directly genotyped markers under a log-additive model of inheritance was collected and ordered by chromosomal position. Imputed markers considered associated at a predefined threshold (10⁻⁵) were identified and classified as concordant or discordant according to their agreement with the empirical association statistics. A locally developed Perl algorithm constructed sliding windows of different sizes (1, 2 and 3 in this report) centered on the imputed markers (concordant or not) and collected the -log10-transformed association statistics of the flanking markers. A set of summary statistics, such as the mean, median and variance of each sliding window, was collected and referenced to the central marker. The complete set of raw results, summary statistics and markers comprised in each window can be obtained upon request.
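The window construction can be sketched for one chromosome's ordered vector of -log10 P-values. This is not the authors' Perl code: it assumes that a window of size k means up to k markers on each side of the central marker, excludes the central marker itself, truncates windows at the ends of the vector, and treats each chromosome separately. The function names and example values are hypothetical.

```python
import statistics

def window_summary(neglogp, center, k):
    """Summaries of the -log10 P-values of up to k markers on each side of `center`.

    neglogp: -log10 P-values of one chromosome, ordered by position.
    The central marker is excluded; windows are truncated at the chromosome ends.
    """
    lo = max(0, center - k)
    hi = min(len(neglogp), center + k + 1)
    flank = [neglogp[i] for i in range(lo, hi) if i != center]
    if not flank:
        return None
    return {
        "mean": statistics.fmean(flank),
        "median": statistics.median(flank),
        "variance": statistics.pvariance(flank),
        "sum": sum(flank),
    }

def summarize_hits(neglogp, hit_indices, sizes=(1, 2, 3)):
    """Window summaries keyed by (central marker index, window size)."""
    return {(i, k): window_summary(neglogp, i, k) for i in hit_indices for k in sizes}

# Hypothetical chromosome with an imputed hit at index 2 (-log10 P = 6.0).
values = [0.5, 1.0, 6.0, 2.0, 0.3]
for key, summary in summarize_hits(values, [2]).items():
    print(key, summary)
```

Running `summarize_hits` separately on concordant and discordant central markers yields the two groups of windows whose `sum` entries are compared in the Discussion.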
Financial support was provided by FAPESP (Fundação de Amparo a Pesquisa do Estado de São Paulo) and CAPES (Comissão de Aperfeiçoamento de Pessoal de Nível Superior).
This study makes use of data generated by the Wellcome Trust Case-Control Consortium. A full list of the investigators who contributed to the generation of the data is available from http://www.wtccc.org.uk. Funding for the project was provided by the Wellcome Trust under award 076113.
- Newton-Cheh C: Genome-wide association study identifies eight loci associated with blood pressure. Nat Genet. 2009.
- Wolfs MG: Type 2 Diabetes Mellitus: New Genetic Insights will Lead to New Therapeutics. Curr Genomics. 2009, 10 (2): 110-8. 10.2174/138920209787847023.
- Nothnagel M: A comprehensive evaluation of SNP genotype imputation. Hum Genet. 2009, 125 (2): 163-71. 10.1007/s00439-008-0606-5.
- Pei YF: Analyses and comparison of accuracy of different genotype imputation methods. PLoS One. 2008, 3 (10): e3551. 10.1371/journal.pone.0003551.
- Barrett JC, Cardon LR: Evaluating coverage of genome-wide association studies. Nat Genet. 2006, 38 (6): 659-62. 10.1038/ng1801.
- de Bakker PI: Practical aspects of imputation-driven meta-analysis of genome-wide association studies. Hum Mol Genet. 2008, 17 (R2): R122-8. 10.1093/hmg/ddn288.
- Servin B, Stephens M: Imputation-based analysis of association studies: candidate regions and quantitative traits. PLoS Genet. 2007, 3 (7): e114. 10.1371/journal.pgen.0030114.
- Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature. 2007, 447 (7145): 661-78. 10.1038/nature05911.
- Yu Z, Schaid DJ: Methods to impute missing genotypes for population data. Hum Genet. 2007, 122 (5): 495-504. 10.1007/s00439-007-0427-y.
- Zhao Z: Imputation of missing genotypes: an empirical evaluation of IMPUTE. BMC Genet. 2008, 9: 85. 10.1186/1471-2156-9-85.
- Balding DJ: A tutorial on statistical methods for population association studies. Nat Rev Genet. 2006, 7 (10): 781-91. 10.1038/nrg1916.
- Anderson CA: Evaluating the effects of imputation on the power, coverage, and cost efficiency of genome-wide SNP platforms. Am J Hum Genet. 2008, 83 (1): 112-9. 10.1016/j.ajhg.2008.06.008.
- Marchini J: A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet. 2007, 39 (7): 906-13. 10.1038/ng2088.
- Gonzalez JR: SNPassoc: an R package to perform whole genome association studies. Bioinformatics. 2007, 23 (5): 644-5. 10.1093/bioinformatics/btm025.