Increased genetic diversity of ADME genes in African Americans compared with their putative ancestral source populations and implications for Pharmacogenomics
© Li et al.; licensee BioMed Central Ltd. 2014
Received: 24 October 2013
Accepted: 24 April 2014
Published: 1 May 2014
African Americans have been treated as a representative population for African ancestry for many purposes, including pharmacogenomic studies. However, the contribution of European ancestry is expected to result in considerable differences in the genetic architecture of African American individuals compared with an African genome. In particular, the genetic admixture influences the genomic diversity of drug metabolism-related genes, and may cause high heterogeneity of drug responses in admixed populations such as African Americans.
The genomic ancestry information of African-American (ASW) samples was obtained from data of the 1000 Genomes Project, and local ancestral components were also extracted for 32 core genes and 252 extended genes associated with drug absorption, distribution, metabolism, and excretion (ADME). As expected, the global genetic diversity pattern in ASW was determined by the contributions of its putative ancestral source populations, and the overall profiles of ADME genes in ASW are much closer to those in YRI than to those in CEU. However, we observed much higher diversity in some functionally important ADME genes in ASW than in either CEU or YRI, which could be a result of either genetic drift or natural selection, and we identified some signatures of the latter. We analyzed the clinically relevant polymorphic alleles and haplotypes, and found that 28 functional mutations (including 3 missense, 3 splice, and 22 regulatory sites) exhibited significantly higher differentiation between the three populations.
Analysis of the genetic diversity of ADME genes showed differentiation between the admixed population and its ancestral source populations. In particular, the difference in genetic diversity between ASW and YRI indicates that ethnic differences relevant to pharmacogenomic studies exist broadly, even though African ancestry is dominant in African Americans. This study should advance our understanding of the genetic basis of the drug response heterogeneity between populations, especially in the case of population admixture, and have significant implications for evaluating potential inter-population heterogeneity in drug treatment effects.
Keywords: ADME genes; African Americans; Drug response heterogeneity; Genetic diversity; Genetic admixture
Many factors such as age, enzyme induction or inhibition, and diseases can affect enzyme activity. Variations in the DNA sequence of enzyme-encoding genes can abolish, reduce, or increase the activity of an enzyme. The genetic variations in the genes involved in the absorption, distribution, metabolism, and excretion (ADME) of drugs are therefore essential factors for the efficacy and safety of drugs in the human body [1, 2]. Generally, ADME proteins comprise Phase I metabolizing enzymes (such as the cytochrome P450 enzymes), Phase II metabolizing enzymes (such as arylamine N-acetyltransferase), and drug transporters (including the ATP binding cassette proteins). Previous studies highlighted the contributions of both environmental and, in particular, genetic factors to variations in the activity of ADME proteins [4, 5]. Some functional polymorphisms have been reported in ADME genes that allow the classification of individuals into intermediate, rapid, and slow metabolizer groups, and the broad distribution of drug responses might increase the risk of drug therapy when the therapeutic window is narrow. The careful assessment of the contributions of ADME genetic variations to the efficacy and safety of drugs is an important task for the development of clinical pharmacogenetic studies.
Population studies have revealed that ethnic differences occur in the frequency of genetic variants [7, 8], and that significant genetic differences in the ADME genes between different populations could lead to therapeutic failure or adverse drug responses. For example, the intronic SNP located at CYP3A5, known as “CYP3A5*3”, results in a nonfunctional protein, and occurs at a frequency of ~40% among African Americans, ~90% among Caucasians, and ~65% among Asians. Additional important ADME genes, such as CYP2C9, CYP2C19, CYP2D6, and NAT2, also have significantly different frequencies of genetic variants that may lead to different drug dose requirements in different ethnic groups [10, 11]. For example, warfarin, an anticoagulant, has the highest dose requirements in African Americans, the lowest dose requirements in Asians, and intermediate requirements in Caucasian populations. Since the populations in developing countries rely mainly on the US FDA or European Medicines Agency guidelines for dosing instructions, a comprehensive understanding of the inter-ethnic differences in the ADME genes is critical to guide more effective global drug prescriptions.
African Americans are well known to be admixed from African and European ancestral populations. As the largest minority group in the United States, African Americans have received significant attention in pharmacogenetic studies. However, little is known about the influence of admixture on genetic heterozygosity and haplotype diversity, and how it may directly affect the heterogeneity of drug responses. Furthermore, limited pharmacogenetic data are available on African populations, and so systematic comparisons of the patterns and magnitudes of diversity of ADME genes between African and African-American populations would benefit the study of drug responses in Africans, and facilitate future inter-ethnic investigations of drug metabolism.
We compared the genetic diversity of ADME genes (including 32 core genes and 252 extended genes) with that of non-ADME genes randomly selected from the list of known genes in the genome, in African-American, African, and European populations. We then investigated the genetic architectures of ADME genes and searched for the factors that could influence the genetic diversity of ADME genes in the three populations. Further, we identified functional mutations with highly differential allele frequencies and compared the distributions of clinically defined haplotypes in each ADME core gene among the three populations. Finally, we explored the mechanism underlying the higher genetic diversity of ADME genes in an admixed population such as African Americans compared with that in its ancestral source populations.
Ancestral origins of ADME genes in African Americans
We further examined the local ancestry contributions from Europeans to African Americans in 32 ADME core genes and 252 ADME extended genes (Figure 1D and Additional file 4: Table S2). The average contribution of European ancestry is 24.1% (SD = 0.036), based on autosomal data, which is consistent with previous studies [14, 17]. However, the European genetic contributions varied from 15.8% (UGT2B7) to 33.4% (UGT1A1) in the 32 ADME core genes, and from 15.3% (SULT1C2) to 34.6% (ABCG1) in the 252 ADME extended genes. In summary, the European ancestry of none of these 284 ADME genes deviated significantly from the autosomal average: all values fell within 3 SD of the mean, that is, between 13.3% and 34.9% European ancestral component. These results do not support strong natural selection on the ADME genes in African Americans since admixture.
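The 3 SD interval quoted above follows directly from the reported mean and SD. A minimal sketch of that arithmetic, using only the values given in the text (the function names are ours, for illustration):

```python
MEAN_EUR = 0.241  # average European ancestry across autosomes (from the text)
SD_EUR = 0.036    # standard deviation of that estimate (from the text)


def three_sd_bounds(mean, sd):
    """Return the (lower, upper) bounds of mean +/- 3 SD."""
    return mean - 3 * sd, mean + 3 * sd


def within_three_sd(value, mean=MEAN_EUR, sd=SD_EUR):
    """True if a gene's European ancestry fraction lies inside mean +/- 3 SD."""
    lower, upper = three_sd_bounds(mean, sd)
    return lower <= value <= upper


# Extreme per-gene values reported in the text.
reported = {"UGT2B7": 0.158, "UGT1A1": 0.334, "SULT1C2": 0.153, "ABCG1": 0.346}

lower, upper = three_sd_bounds(MEAN_EUR, SD_EUR)
print(f"3 SD interval: {lower:.3f} to {upper:.3f}")
for gene, fraction in reported.items():
    print(gene, within_three_sd(fraction))
```

Even the most extreme genes (SULT1C2 and ABCG1) stay inside the interval, which is the basis for the statement that no ADME gene deviates significantly.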
The diversity patterns of ADME genes in African Americans
In pharmacogenetic studies, allele frequency, heterozygosity, and haplotype diversity have been commonly used as indicators of heterogeneity of the drug response. To further investigate the influence of admixture on ADME genes in African Americans, we examined the fluctuations in heterozygosity and haplotype diversity of each 10 kb bin spanning the 32 ADME core genes (Additional file 2: Figure S2). Heterozygosity and haplotype diversity clearly have similar trends in the three populations. In most examples, the diversity pattern of ASW is closer to that of YRI, and both are significantly higher than that of CEU. Furthermore, heterozygosity and haplotype diversity vary much more than the local ancestral components: genetic diversity can change significantly between neighboring 10 kb bins, while similar ancestral fragments can span hundreds of thousands of base pairs. It is likely that the genetic diversity patterns of an admixed population were affected not only by local ancestral proportions, but also by the diversity patterns of its ancestral source populations.
The admixture resulted in distinct genetic diversity patterns of the African American population compared with its ancestral source populations, especially in regions that were highly different between populations. For example, we extracted the highly differential loci with a frequency difference larger than 0.37 between at least two populations (with an empirical P value of less than 0.05 over the whole genome), and presented the frequency distribution of those 806 loci in Figure 2B. The data clearly reveal that the alleles of ASW are largely at intermediate frequencies. The heterozygosity and haplotype diversity of these highly differentiated regions should therefore be consistently higher for African Americans.
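The reasoning in this paragraph can be illustrated numerically. The sketch below assumes a simple two-way mixture model (our assumption, not a method stated in the text) in which the expected ASW allele frequency is the ancestry-weighted average of the CEU and YRI frequencies, using the 24.1% average European contribution reported earlier. The locus frequencies are hypothetical.

```python
from itertools import combinations

EUR_FRACTION = 0.241  # average European ancestry in ASW (from the text)
DIFF_CUTOFF = 0.37    # frequency-difference cutoff (from the text)


def admixed_frequency(p_ceu, p_yri, eur_fraction=EUR_FRACTION):
    """Expected allele frequency under a simple two-way mixture (assumption)."""
    return eur_fraction * p_ceu + (1 - eur_fraction) * p_yri


def heterozygosity(p):
    """Expected heterozygosity 2pq at a biallelic locus."""
    return 2 * p * (1 - p)


def is_highly_differential(freqs, cutoff=DIFF_CUTOFF):
    """True if any two populations differ in frequency by more than the cutoff."""
    return any(abs(a - b) > cutoff for a, b in combinations(freqs.values(), 2))


# Hypothetical locus that is strongly differentiated between CEU and YRI.
p_ceu, p_yri = 0.90, 0.35
freqs = {"CEU": p_ceu, "YRI": p_yri, "ASW": admixed_frequency(p_ceu, p_yri)}
het = {pop: heterozygosity(p) for pop, p in freqs.items()}

print(is_highly_differential(freqs))
for pop in ("CEU", "YRI", "ASW"):
    print(pop, round(freqs[pop], 3), round(het[pop], 3))
```

Under these assumptions the admixed frequency lands between the two source frequencies, closer to 0.5, so heterozygosity in the admixed population exceeds that of both sources at this locus. This does not hold for every locus; it depends on how far apart the source frequencies are and on which side of 0.5 they fall.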
Comparison of genetic diversity patterns of ADME genes between African Americans and their ancestral source populations
To compare the overall genetic diversity of ADME genes between African American and its ancestral source populations, we calculated the derived allele frequencies of 284 ADME genes with the exons, introns, and up- and down-stream 10 kb regions. In addition, we separated the above regions into 10 kb bins to avoid bias due to the varying lengths of different genes, and then calculated the expected heterozygosity and haplotype diversity of these bins and examined their distributions.
Because the alleles with intermediate frequencies were enriched in the ADME core genes from ASW, African Americans exhibited the highest expected heterozygosity compared with the other two populations (Figure 3B). Overall, CEU showed the lowest median heterozygosity value of 0.126, YRI exhibited an intermediate median heterozygosity of 0.179, and ASW demonstrated the highest median heterozygosity (0.181). Hence although the heterozygosity distributions of YRI and ASW were much more similar to each other than to CEU, the curve of heterozygosity in ASW was shifted to higher values than that of YRI (p < 0.001), indicating increased genetic diversity due to genetic admixture.
Haplotype diversity analysis of 32 ADME core genes showed similar patterns to the comparison of heterozygosity (Figure 3C). Generally, the haplotype diversity distribution of CEU was lower than the other two populations, the distribution was flatter, and the median value was 0.790. Conversely, the haplotype diversity distributions of ASW and YRI were narrower, and shifted to higher values. When ASW and YRI were compared, ASW had higher haplotype diversity with a median value of 0.912, while the median value of YRI was 0.903 (p < 0.001).
When the genetic diversity of the 252 extended ADME genes was analyzed (Figure 3D-3F), obvious differences were identified between CEU and the other populations. However, compared with the analysis of the 32 core ADME genes, CEU exhibited a pattern with less enrichment in the very low or high frequency bins (Figure 3D), but shifted to higher values of both heterozygosity (with a median of 0.148) and haplotype diversity (with a median of 0.792; Figure 3E and 3F). When the 252 extended ADME genes of ASW and YRI were compared, the difference in allele frequency was smaller (Figure 3D). ASW and YRI showed highly overlapping heterozygosity and haplotype diversity distributions, which differed only at their peak regions (Figure 3E and 3F). Specifically, the median heterozygosities were 0.178 and 0.181, while the median haplotype diversities were 0.919 and 0.921, for ASW and YRI, respectively. In the 252 ADME extended genes assessed, ASW therefore showed slightly lower genetic complexity than YRI (p < 0.001), in contrast to the results from the 32 ADME core genes.
To better characterize the genetic architecture of ADME genes, we further compared the genetic diversity patterns of the 32 ADME core genes with those of 50 randomly selected genes, and the genetic diversity patterns of the 252 extended genes with those of 500 randomly selected genes, as well as those of the whole autosomal regions. With respect to derived allele frequency (DAF) spectra (Additional file 5: Figure S3 A-C), all three populations exhibited an exponential distribution, with CEU showing the highest, ASW moderate, and YRI the lowest enrichment in the low DAF bin (0.0-0.1). With respect to the expected heterozygosity distributions (Additional file 5: Figure S3 D-F), CEU again exhibited the lowest heterozygosity in all three datasets (two randomly selected and one whole autosomal), while ASW and YRI showed very similar distributions. Similarly, haplotype diversity of CEU was the lowest among the three populations, whereas the distributions of ASW and YRI were comparable, as shown in Additional file 5: Figure S3 G-I. In summary, CEU consistently showed the lowest genetic diversity in all the random data sets we examined, which was consistent with the patterns we observed in ADME genes. However, the genetic diversity of ASW was similar to or even lower than that of YRI in the random data sets, in contrast to the patterns observed in the 32 ADME core genes.
Characterizing genetic diversity patterns of ADME core genes
Significant LSBL regions within given populations indicate the natural selection signals. Because we were unable to conduct distinct detailed selective sweeps, we used two independent natural selection detection approaches; iHS (integrated Haplotype Score) and CLR (Composite Likelihood Ratio) tests, to validate the selection signals of those genes (Figure 4). In most of genes showing significant LSBL, natural selection signals from iHS and CLR tests were also identified in at least one population, but were not necessarily found in the exact population that exhibited significant LSBL signals. For example, 12 out of 15 genes showed natural selection signals in at least one population by either iHS or CLR. However, only 3 genes (CYP3A4, CYP3A5, and CYP1A2) showed consistent LSBL and iHS/CLR signals in CEU. It is noteworthy that LSBL is a cross-population test, whereas iHS/CLR methods are used for within-population analysis. The inconsistent results observed in Figure 4 therefore accurately explain how natural selection shaped the genetic differences between populations.
Interestingly, 7 of the 15 genes presented in Figure 4 were identified as underlying natural selection by iHS/CLR tests in ASW, which was a similar proportion to CEU (8 out of 15) and YRI (6 out of 15). However, each of the genes with signals in ASW consistently showed similar signals in the ancestral source populations, particularly YRI. For example, SLC22A6 had underlying selection based on the iHS signal in both ASW and CEU, whereas ABCG2, CYP2C19, and SLCO1B3 were identified based on iHS or CLR signals in both ASW and YRI. Finally, GSTT1, ABCB1, and DPYD had underlying selection based on iHS or CLR signals in all populations. The beneficial selective sweeps found in ASW may therefore be inherited from either of its ancestral populations.
Twenty-four of the 252 ADME extended genes exhibited strong LSBL signals in at least one population (Additional file 8: Figure S6). Of these 24 genes, only 13 showed natural selection signals in the iHS/CLR tests, indicating fewer selective sweeps in the ADME extended genes than in the core genes. The ADME extended genes also showed genetic diversity patterns much more comparable to those of the neutral datasets (Figure 3 and Additional file 5: Figure S3), suggesting that these genes are subject to less selective pressure than the more functionally important ADME core genes.
Highly differential functional SNPs in ADME genes across the three populations
Table 1. Summary information of highly differential functional SNPs at ADME core genes
[Table body not recovered. Only the drug entries survive: amitriptyline, atorvastatin, etc.; carbamazepine, phenobarbital, etc.; cytarabine, ethambutol, etc.; clopidogrel, warfarin, etc.; carbamazepine, cyclophosphamide, etc.; alfentanil, alprazolam, etc.; dexamethasone, paclitaxel, etc.; docetaxel, mycophenolate mofetil.]
Functional haplotype analysis of ADME genes between the three populations
Table 2. Haplotype diversity analysis of the 32 ADME core genes
[Table body not recovered. Surviving column header: P value of 10,000 times resampling.]
In the haplotype analysis of NAT2 (Figure 6B), although CEU had the lowest diversity, the haplotypes were distributed into three groups with similar proportions: GCCTGGG (38.8%), GCTCGAG (27.7%), and GTTCAAG (29.4%), which are also common haplotypes in ASW and YRI. Of the 13 haplotypes formed by 7 SNPs (Additional file 10: Table S3), 12 were found in YRI, while only 10 were identified in ASW. With the exception of the three common haplotypes mentioned above, all other haplotypes exist at low frequency (<10%) in ASW. The haplotype diversity of the NAT2 gene is therefore lower in ASW than in YRI. Considering that we did not find any natural selection signals for the NAT2 gene in the three populations here, it is likely that the genetic diversity of NAT2 in African Americans was mainly influenced by admixture. It is therefore noteworthy that the efficacy and safety standards of NAT2 substrates established in African Americans cannot be applied directly to Africans, since Africans show higher genetic diversity in this region.
In this study, we investigated the genetic diversity of drug metabolism-related (ADME) genes in African Americans (ASW) compared with Europeans (CEU) and Africans (YRI), which are the representative ancestral source populations of African Americans according to a previous study. As expected, the genetic diversity of the admixed population, as measured by allele frequency, expected heterozygosity, and haplotype diversity, was largely determined by its ancestral source populations, demonstrating the large influence of admixture on the genetic profiles of African Americans, including drug-related genes. In practice, because few pharmacogenomic studies have been carried out on African populations, the results from African Americans, who have been more extensively studied, are expected to benefit Africans. However, it is noteworthy that there could be considerable differences in drug responses between African and African American populations. In addition, it was reported that the contribution of African ancestry to African Americans was mainly from west and west-central Africans (~73%) but also from other African populations (~7%). Therefore, although taking YRI as the representative African source population would not significantly bias the local ancestry inference [17, 26], the differences between African American and African populations could be more complicated than what we presented here. We therefore suggest that pharmacogenomic studies in African populations are needed in the future.
To further investigate the influence of admixture on the genetic architecture and diversity patterns of African Americans, we performed general genetic diversity comparisons, and found that ASW had a higher genetic complexity than CEU or YRI in the functionally important ADME core genes. It is expected that the ADME genes in ASW would have higher genetic diversity than those in CEU because, under the “out of Africa” theory, ancestral Europeans experienced severe bottlenecks during migration compared with Africans, and thus exhibit lower diversity. In addition, African Americans received more gene flow from Africans than from Europeans [26, 29]. Nevertheless, it was surprising that ASW showed much higher genetic diversity in ADME core genes than YRI, which is significantly different from the patterns observed in the randomly selected genes or whole autosomal regions.
From a comparison of the genetic diversity of ADME core genes across the three populations, ASW showed the highest complexity, mainly through the influence of admixture and with enriched selection signatures as a complementary factor. Since these results are based on comparisons of general patterns, these conclusions may not apply directly to individual cases. We therefore further investigated the genetic diversity of each ADME gene, with particular focus on the highly differentiated SNPs. As with the gene-based analysis, CEU showed the lowest genetic complexity in most examples, while ASW showed enrichment of intermediate allele frequencies, higher heterozygosity, and more complex haplotype diversity compared with CEU or YRI in certain genes such as ABCG2, CYP1A2, and CYP3A4. However, in some genes such as NAT2, ASW showed lower genetic diversity than YRI.
Due to population admixture, ASW showed allele frequencies different from those of its ancestral source populations; in particular, ASW has higher heterozygosity and haplotype diversity than CEU and YRI in some important functional variants or haplotypes of ADME genes (Tables 1 and 2). Differential allele frequencies of the functional variants among populations suggest that the drug response phenotypes with which these variants are associated could also differ among those populations. Generally speaking, higher heterozygosity and haplotype diversity indicate that the distribution of phenotypic drug responses is broader in that population. For instance, we identified two functional SNPs of CYP1A2 reported by clinical studies that showed significant differentiation of allele frequencies and heterozygous states among the three populations, while ASW exhibited significantly higher haplotype diversity in the CYP1A2 gene than the other two populations. To our knowledge, there has so far been no systematic study investigating the phenotypic distributions of CYP1A2 substrates in these three populations, but our observations should help in exploring population differentiation of clinical consequences at the genetic level. However, it is noteworthy that genetic variants are only one of the factors affecting drug responses, and most of the explicit consequences of genetic variants are not yet fully understood. Thus, the phenotypic consequences of population differentiation of ADME genes should be carefully validated in future studies. On the other hand, although the role of ethnicity in pharmacogenomic studies is still debatable, there are important ethnicity-related consequences of the different drug dose requirements among populations.
Given that African Americans exhibited higher genetic diversity due to admixture, individual genotyping or sequencing is necessary in future pharmacogenomic studies of African Americans, because higher heterogeneity of drug responses is also expected in admixed populations and any oversimplified ethnic medicine standards might be inappropriate.
In this study, we established the connection between genetic diversity and clinical drug efficacy and safety based on the literature and public databases. In particular, the PharmGKB database provides an opportunity to study the functional consequences of highly differentiated SNPs between different populations using clinical results manually collected from the literature. In addition, the significant advancement of next generation sequencing and the establishment of public resources such as the 1000 Genomes Project have allowed us to access the full spectrum of ADME gene mutations among different populations. However, some mutations in PharmGKB are not present in the 1000 Genomes dataset, either because they are rare mutations that exist only in certain patients, or because the sequencing depth of the 1000 Genomes Project is not sufficient to detect them.
The genetic diversity patterns between ASW, CEU, and YRI identified in this study could not completely explain the heterogenic drug responses between different populations, but still have important clinical implications. In addition, high-throughput DNA sequencing technology provides additional information not available from traditional pharmacogenetic studies. For example, we discovered eight highly differential SNPs which were not identified in PharmGKB: one non-synonymous SNP, two splice sites, and five intronic SNPs (Table 1). These data may have important functional implications for pharmacogenomics studies.
Inter-ethnic genetic differences are shaped by both demographic history that affects genome-wide pattern, such as population subdivision and admixture, and evolutionary forces such as natural selection that affect local regions only. In this study, we identified considerable differences between African American and African populations in some functionally important ADME genes, indicating individuals from the two populations should be treated differently in pharmacogenomics. It is likely the genetic characteristics of ADME core genes in African Americans have been shaped by both genetic admixture and natural selection.
Genetic variation data
The investigations of genetic diversity in this study were based on 1000 Genomes Project Phase I data. Given the low coverage of sequencing data (2-4x) and even lower coverage on sex chromosomes (1.74x), we focused on the autosomal SNP data, where most ADME genes are located. We extracted the genetic variation data of African Americans (ASW), Europeans (CEU), and Africans (YRI) from the VCF files released by the 1000 Genomes Project; these data had already been phased with BEAGLE. Because sequencing error under low coverage could make some singletons unreliable and our work focused on high-frequency SNPs, we filtered out the monomorphic sites and singletons in the 234 pooled individuals. Finally, we obtained a total of 18,389,222 SNPs from 61 ASWs, 85 CEUs, and 88 YRIs. Derived allele frequency calculations and positive selection tests (such as iHS and CLR tests) were only performed on SNPs with known ancestral information, which was obtained from the 1000 Genomes Project. As a result, there were a total of 16,224,331 SNPs with known ancestral states, which is approximately 88.2% of the total SNPs obtained.
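The site filter and the quoted proportion can be sketched as follows. This assumes biallelic SNPs and defines a singleton as a site whose minor allele is observed exactly once among the pooled chromosomes (our reading of the text); the function name is hypothetical.

```python
N_INDIVIDUALS = 61 + 85 + 88       # ASW + CEU + YRI, from the text
N_CHROMOSOMES = 2 * N_INDIVIDUALS  # diploid autosomes


def passes_filter(alt_allele_count, n_chromosomes=N_CHROMOSOMES):
    """Keep a biallelic site only if its minor allele is seen at least twice.

    A minor-allele count of 0 is a monomorphic site and a count of 1 is a
    singleton; both are dropped.
    """
    minor_count = min(alt_allele_count, n_chromosomes - alt_allele_count)
    return minor_count >= 2


# Proportion of retained SNPs with a known ancestral state (counts from the text).
fraction_with_ancestral_state = 16_224_331 / 18_389_222

print(N_INDIVIDUALS)
print(f"{fraction_with_ancestral_state:.1%}")
```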
ADME genes and putative neutral datasets
As described previously , the ADME gene lists were obtained from the PharmaADME database (http://www.pharmaadme.org/), including the core and extended sets , as shown in Additional file 4: Table S2. After excluding the genes located on sex chromosomes, there are 32 core ADME genes that play the most important roles in drug metabolism, and 252 extended ADME genes that play a role in drug metabolism, but are not the major factors. Gene coordinate information was obtained from the RefSeq database , and 10 kb up- and downstream of each gene was included.
To compare the ADME genes between populations, we used two additional groups of genes/regions as control data. Firstly, to check whether the ADME genes exhibit the specific genetic diversity pattern compared with other coding regions, we created data of several sets of genes (including the 10 kb up- and downstream regions) that were randomly sampled from the RefSeq database without replacement (http://www.ncbi.nlm.nih.gov/RefSeq/). Given the different number of ADME core genes (n = 32) and extended genes (n = 252), we accordingly generated two datasets with comparable number of genes, i.e. 50 and 500 randomly selected genes, respectively. Secondly, data sets were also generated from 10 kb sliding windows in the autosomal regions to compare with ADME genes.
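Sampling control genes without replacement can be sketched with the Python standard library. The gene identifiers below are placeholders rather than real RefSeq entries, and the seeds are arbitrary; the sketch shows only the sampling step, not the authors' actual pipeline.

```python
import random


def sample_control_genes(gene_ids, n_genes, seed=None):
    """Draw n_genes distinct genes; random.sample never repeats an element."""
    rng = random.Random(seed)
    return rng.sample(list(gene_ids), n_genes)


# Placeholder gene list standing in for the RefSeq gene catalogue.
refseq_genes = [f"GENE{i:05d}" for i in range(20_000)]

control_50 = sample_control_genes(refseq_genes, 50, seed=1)    # vs. 32 core genes
control_500 = sample_control_genes(refseq_genes, 500, seed=2)  # vs. 252 extended genes
```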
Functional annotations of SNPs and haplotypes
The functional effects of each SNP from each ADME gene were determined using the variant effect prediction tools from the Ensembl database. The SNPs that affect gene expression were then studied based on the RegulomeDB dataset. In addition, we studied the SNPs and haplotypes with obvious clinical effects, which were collected and annotated from the PharmGKB database.
Inference of local ancestry
The local ancestry information of ASW was obtained from the 1000 Genomes Project (ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/phase1/analysis_results/ancestry_deconvolution), which was based on the consistent results of four commonly used methods (LAMP-LD, HAPMIX, RFMIX, and MULTIMIX), and was reported to have high accuracy for ASW (98.9-99.5%). Briefly, these methods use different principles and algorithms to infer the locus-specific ancestry of each individual in admixed populations. For instance, HAPMIX incorporates background LD to calculate the likelihood of how the haplotypes of admixed individuals relate to those in the ancestral populations, and uses a Hidden Markov Model to combine these likelihoods with information from neighboring loci. It can therefore infer an individual's local ancestry, that is, the number of copies of each ancestry at each location in the genome.
The local ancestry information of ASW (61 individuals in total) in the 1000 Genomes Project lists each tract of diploid ancestry calls for each individual, including the diploid ancestry code, the chromosome number, the start and end positions, and the tract length in base pairs. Diploid ancestry calls are a consensus of calls that agree in >=3 of the above methods. The codes of the diploid ancestry calls are: 0, “unknown”; 1, “European:European”; 2, “European:African”; and 3, “African:African”.
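The diploid ancestry codes above translate directly into counts of European haplotypes. The sketch below shows one way to turn per-individual tracts into a European ancestry fraction at a locus. The in-memory tuple layout and the choice to skip "unknown" calls are our assumptions, not the file format or procedure of the original analysis.

```python
# Diploid ancestry code -> number of European haplotype copies.
# Code 0 ("unknown") is deliberately absent and is skipped below.
EUROPEAN_COPIES = {1: 2, 2: 1, 3: 0}


def code_at(tracts, chrom, pos):
    """Return the ancestry code covering (chrom, pos), or 0 if no tract does.

    tracts: list of (code, chrom, start, end) tuples for one individual
    (a hypothetical in-memory layout).
    """
    for code, tract_chrom, start, end in tracts:
        if tract_chrom == chrom and start <= pos <= end:
            return code
    return 0


def european_fraction(codes):
    """European ancestry fraction at a locus from one code per individual."""
    european = total = 0
    for code in codes:
        if code == 0:  # unknown call: contributes no information
            continue
        european += EUROPEAN_COPIES[code]
        total += 2
    return european / total if total else float("nan")


# Hypothetical example: two tracts for one individual, five calls at one locus.
example_tracts = [(3, "1", 1, 1000), (2, "1", 1001, 5000)]
print(code_at(example_tracts, "1", 1500))
print(european_fraction([3, 3, 2, 1, 0]))
```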
Analysis of genetic diversity
Frequency spectra were constructed by calculating the frequency of derived alleles at each polymorphic site of the genes or regions of interest in a given population. The distributions of heterozygosity and haplotype diversity were calculated in non-overlapping sliding windows of 10 kb across entire genes or regions. To avoid uncertainties in estimations, we excluded windows with fewer than 5 SNPs. Finally, a total of 168,026 windows were analyzed, among which 227 and 1,797 windows were from the ADME core gene and extended gene sets, respectively, while 381 and 4,092 were from the 50 and 500 randomly selected genes, respectively.
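The windowing step can be sketched as below: SNP positions are assigned to consecutive 10 kb bins, and bins with fewer than 5 SNPs are dropped. Anchoring the first window at a caller-supplied start coordinate is our assumption; the text does not say how windows were anchored.

```python
from collections import defaultdict

WINDOW_SIZE = 10_000  # 10 kb, from the text
MIN_SNPS = 5          # windows with fewer SNPs are excluded, from the text


def bin_snps(positions, start, window_size=WINDOW_SIZE, min_snps=MIN_SNPS):
    """Group SNP positions into non-overlapping windows and drop sparse ones.

    Returns {window_index: [positions]}, where window 0 begins at `start`.
    """
    bins = defaultdict(list)
    for pos in positions:
        if pos >= start:
            bins[(pos - start) // window_size].append(pos)
    return {idx: snps for idx, snps in sorted(bins.items()) if len(snps) >= min_snps}


# Hypothetical positions: 5 SNPs in window 0, 2 in window 1, 6 in window 2.
positions = [100, 200, 300, 400, 500,
             10_100, 10_200,
             20_000, 20_500, 21_000, 22_000, 25_000, 29_999]
windows = bin_snps(positions, start=0)
print(list(windows))
```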
Where $n_{maj}$ and $n_{min}$ are the numbers of the most and least frequently observed alleles at each locus, respectively.
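For a biallelic SNP, expected heterozygosity is conventionally $2pq$, which in terms of the allele counts defined here is $2\,n_{maj}\,n_{min}/(n_{maj}+n_{min})^2$. The sketch below assumes that standard form; the estimator actually used may differ, for example by a sample-size correction, and the unweighted window average is also our assumption.

```python
def expected_heterozygosity(n_maj, n_min):
    """Expected heterozygosity 2pq of a biallelic locus from its allele counts."""
    total = n_maj + n_min
    if total == 0:
        raise ValueError("locus has no observed alleles")
    return 2 * n_maj * n_min / total ** 2


def window_heterozygosity(allele_counts):
    """Unweighted mean heterozygosity over the SNPs in one window.

    allele_counts: list of (n_maj, n_min) pairs, one per SNP.
    """
    values = [expected_heterozygosity(n_maj, n_min) for n_maj, n_min in allele_counts]
    return sum(values) / len(values)


print(expected_heterozygosity(50, 50))
print(window_heterozygosity([(50, 50), (90, 10)]))
```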
Where $N$ is the total number of haplotypes, and $x_i$ is the frequency of each haplotype. For each ADME core gene, the significance of the difference in haplotype diversity (Hd) between any two populations was assessed using 10,000 bootstrap resamplings.
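With $N$ and $x_i$ defined as above, the widely used form of haplotype diversity is Nei's estimator, $Hd = \frac{N}{N-1}\left(1 - \sum_i x_i^2\right)$. The sketch below assumes that form. The bootstrap comparison is one plausible scheme (resampling haplotypes with replacement within each population) and is not necessarily the exact procedure used in the study.

```python
import random
from collections import Counter


def haplotype_diversity(haplotypes):
    """Nei's haplotype diversity: N / (N - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        return 0.0
    counts = Counter(haplotypes)
    sum_squared = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1 - sum_squared)


def bootstrap_p(haps_a, haps_b, n_boot=10_000, seed=0):
    """Fraction of bootstrap replicates in which Hd(a) is not greater than Hd(b).

    A small value suggests population a has higher haplotype diversity than b.
    """
    rng = random.Random(seed)
    not_greater = 0
    for _ in range(n_boot):
        resampled_a = rng.choices(haps_a, k=len(haps_a))
        resampled_b = rng.choices(haps_b, k=len(haps_b))
        if haplotype_diversity(resampled_a) <= haplotype_diversity(resampled_b):
            not_greater += 1
    return not_greater / n_boot


# Hypothetical haplotype samples.
diverse = ["h1", "h2", "h3", "h4", "h5", "h6"] * 10
uniform = ["h1"] * 55 + ["h2"] * 5
print(round(haplotype_diversity(diverse), 3), round(haplotype_diversity(uniform), 3))
```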
Identification of highly differential loci between populations and the detection of natural selection signals in ADME genes
Where $F_{ST}(\mathrm{ASW},\mathrm{CEU})$, $F_{ST}(\mathrm{ASW},\mathrm{YRI})$, and $F_{ST}(\mathrm{CEU},\mathrm{YRI})$ are the pairwise $F_{ST}$ values among ASW, CEU, and YRI. Similarly, the mean LSBL values for the sliding windows of 10 kb were weighted over all loci in the window range. The top 1 percent of the empirical distribution of the average LSBL values of 10 kb windows spanning entire autosomal regions was 0.061 for ASW, 0.367 for CEU, and 0.114 for YRI. A window whose average LSBL value is larger than the corresponding threshold was defined as a population-specific significant LSBL region.
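For three populations, the locus-specific branch length (LSBL) of one population is conventionally half of the sum of its two pairwise $F_{ST}$ values minus the $F_{ST}$ between the other two populations, for example $\mathrm{LSBL}_{ASW} = \left(F_{ST}(\mathrm{ASW},\mathrm{CEU}) + F_{ST}(\mathrm{ASW},\mathrm{YRI}) - F_{ST}(\mathrm{CEU},\mathrm{YRI})\right)/2$. The sketch below assumes that standard definition and applies the thresholds quoted above; the input $F_{ST}$ values are hypothetical.

```python
# Top 1 percent thresholds for 10 kb windows, from the text.
LSBL_THRESHOLDS = {"ASW": 0.061, "CEU": 0.367, "YRI": 0.114}


def lsbl(fst_asw_ceu, fst_asw_yri, fst_ceu_yri):
    """Locus-specific branch lengths for ASW, CEU and YRI from pairwise FST."""
    return {
        "ASW": (fst_asw_ceu + fst_asw_yri - fst_ceu_yri) / 2,
        "CEU": (fst_asw_ceu + fst_ceu_yri - fst_asw_yri) / 2,
        "YRI": (fst_asw_yri + fst_ceu_yri - fst_asw_ceu) / 2,
    }


def significant_populations(branch_lengths, thresholds=LSBL_THRESHOLDS):
    """Populations whose LSBL exceeds their population-specific threshold."""
    return [pop for pop, value in branch_lengths.items() if value > thresholds[pop]]


# Hypothetical window-level FST values.
branches = lsbl(fst_asw_ceu=0.50, fst_asw_yri=0.05, fst_ceu_yri=0.60)
print({pop: round(value, 3) for pop, value in branches.items()})
print(significant_populations(branches))
```

Any two branch lengths sum back to the corresponding pairwise $F_{ST}$, which is a quick consistency check on the decomposition. The branch of an admixed population can be slightly negative when it lies between its two sources, as in this example.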
The unstandardized iHS scores were calculated using the iHS program , and the standardized scores were obtained using Voight’s formula , in which the mean and standard deviation of the iHS score in different frequency bins were calculated from all the autosomes, and the frequency bin size was set as 0.01.
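The standardization step described here subtracts the mean and divides by the standard deviation of the unstandardized scores of SNPs in the same derived-allele-frequency bin. A minimal sketch, assuming the input is a list of (derived allele frequency, unstandardized iHS) pairs, which is a hypothetical layout; bins with fewer than two SNPs or zero variance are returned as `None`, which is our choice.

```python
import math
import statistics
from collections import defaultdict

BIN_SIZE = 0.01  # frequency bin size, from the text


def freq_bin(daf, bin_size=BIN_SIZE):
    """Index of the frequency bin; rounding guards against float artefacts."""
    return math.floor(round(daf / bin_size, 9))


def standardize_ihs(records, bin_size=BIN_SIZE):
    """Standardize unstandardized iHS within derived-allele-frequency bins.

    records: list of (daf, unstandardized_ihs) pairs.
    Returns a list of standardized scores in the same order.
    """
    by_bin = defaultdict(list)
    for daf, score in records:
        by_bin[freq_bin(daf, bin_size)].append(score)

    bin_stats = {}
    for index, scores in by_bin.items():
        if len(scores) >= 2:
            bin_stats[index] = (statistics.mean(scores), statistics.stdev(scores))

    standardized = []
    for daf, score in records:
        mean, sd = bin_stats.get(freq_bin(daf, bin_size), (None, None))
        standardized.append((score - mean) / sd if sd else None)
    return standardized


# Hypothetical scores: three SNPs share the 0.10-0.11 bin, one SNP is alone.
example = [(0.101, 1.0), (0.105, 3.0), (0.109, 2.0), (0.500, 0.7)]
print(standardize_ihs(example))
```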
CLR (composite likelihood ratio) is a statistic to compute the likelihood ratio of selective sweeps by comparing the spatial distribution of allele frequencies in a given window to the frequency spectrum of null distribution, such as all the autosomal regions. In this study, the SweepFinder  program was used to carry out all calculations.
For both iHS and CLR tests, we calculated the standardized iHS or CLR scores of each population for the entire autosomal regions, and used the values with an empirical P value of 0.01 as the cutoff to detect natural selection signals at given ADME genes by these two approaches independently.
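An empirical cutoff of this kind is the score that separates the most extreme 1% of the genome-wide values from the rest. The sketch below takes the upper tail of the ranked scores; whether the study used raw or absolute iHS values, and how ties were handled, is not stated, so those details are assumptions.

```python
def empirical_cutoff(scores, p=0.01):
    """Smallest score that still belongs to the top fraction p of all scores."""
    ranked = sorted(scores, reverse=True)
    k = max(1, round(len(ranked) * p))
    return ranked[k - 1]


def has_selection_signal(gene_scores, cutoff):
    """True if any score observed in the gene reaches the genome-wide cutoff."""
    return any(score >= cutoff for score in gene_scores)


# Hypothetical genome-wide scores 1..1000; the top 1% are the values 991..1000.
genome_wide = list(range(1, 1001))
cutoff = empirical_cutoff(genome_wide, p=0.01)
print(cutoff)
print(has_selection_signal([12, 640, 995], cutoff))
```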
We thank LetPub for its linguistic assistance during the preparation of this manuscript and many of the group members for their helpful discussions. These studies were supported by the National Science Foundation of China (NSFC) grants 91331204, 31370505, 31171218 and 30971577, by the Knowledge Innovation Program of Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences (CAS) (2013KIP108), and by the Science Foundation of the CAS (KSCX2-EW-Q-1-11). S.X. is Max-Planck Independent Research Group Leader and member of CAS Youth Innovation Promotion Association. S.X. also gratefully acknowledges the support of the National Program for Top-notch Young Innovative Talents and the support of K.C.Wong Education Foundation, Hong Kong. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Shastry BS: Pharmacogenetics and the concept of individualized medicine. Pharmacogenomics J. 2006, 6 (1): 16-21. 10.1038/sj.tpj.6500338.
- Haga SB, LaPointe NM: The potential impact of pharmacogenetic testing on medication adherence. Pharmacogenomics J. 2013, 13 (6): 481-483. 10.1038/tpj.2013.33.
- Sim SC, Kacevska M, Ingelman-Sundberg M: Pharmacogenomics of drug-metabolizing enzymes: a recent update on clinical implications and endogenous effects. Pharmacogenomics J. 2013, 13 (1): 1-11. 10.1038/tpj.2012.45.
- Guessous I, Gwinn M, Khoury MJ: Genome-wide association studies in pharmacogenomics: untapped potential for translation. Genome Med. 2009, 1 (4): 46. 10.1186/gm46.
- Ma Q, Lu AY: Pharmacogenetics, pharmacogenomics, and individualized medicine. Pharmacol Rev. 2011, 63 (2): 437-459. 10.1124/pr.110.003533.
- Eichelbaum M, Ingelman-Sundberg M, Evans WE: Pharmacogenomics and individualized drug therapy. Annu Rev Med. 2006, 57: 119-137. 10.1146/annurev.med.56.082103.104724.
- Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, Feldman MW: Genetic structure of human populations. Science. 2002, 298 (5602): 2381-2385. 10.1126/science.1078311.
- Rosenberg NA: A population-genetic perspective on the similarities and differences among worldwide human populations. Hum Biol. 2011, 83 (6): 659-684. 10.3378/027.083.0601.
- Hustert E, Haberl M, Burk O, Wolbold R, He YQ, Klein K, Nuessler AC, Neuhaus P, Klattig J, Eiselt R, Koch I, Zibat A, Brockmoller J, Halpert JR, Zanger UM, Wojnowski L: The genetic determinants of the CYP3A5 polymorphism. Pharmacogenetics. 2001, 11 (9): 773-779. 10.1097/00008571-200112000-00005.
- Sistonen J, Fuselli S, Palo JU, Chauhan N, Padh H, Sajantila A: Pharmacogenetic variation at CYP2C9, CYP2C19, and CYP2D6 at global and microgeographic scales. Pharmacogenet Genom. 2009, 19 (2): 170-179. 10.1097/FPC.0b013e32831ebb30.
- Sabbagh A, Langaney A, Darlu P, Gerard N, Krishnamoorthy R, Poloni ES: Worldwide distribution of NAT2 diversity: Implications for NAT2 evolutionary history. BMC Genet. 2008, 9.
- Dang MT, Hambleton J, Kayser SR: The influence of ethnicity on warfarin dosage requirement. Ann Pharmacother. 2005, 39 (6): 1008-1012. 10.1345/aph.1E566.
- Phan VH, Moore MM, McLachlan AJ, Piquette-Miller M, Xu H, Clarke SJ: Ethnic differences in drug metabolism and toxicity from chemotherapy. Expert Opin Drug Metab Toxicol. 2009, 5 (3): 243-257. 10.1517/17425250902800153.
- Tishkoff SA, Reed FA, Friedlaender FR, Ehret C, Ranciaro A, Froment A, Hirbo JB, Awomoyi AA, Bodo JM, Doumbo O, Ibrahim M, Juma AT, Kotze MJ, Lema G, Moore JH, Mortensen H, Nyambo TB, Omar SA, Powell K, Pretorius GS, Smith MW, Thera MA, Wambebe C, Weber JL, Williams SM: The genetic structure and history of Africans and African Americans. Science. 2009, 324 (5930): 1035-1044. 10.1126/science.1172257.
- Pillai G, Davies G, Denti P, Steimer JL, McIlleron H, Zvada S, Chigutsa E, Ngaimisi E, Mirza F, Tadmor B, Holford NH: Pharmacometrics: opportunity for reducing disease burden in the developing world: the case of Africa. CPT Pharmacometrics Syst Pharmacol. 2013, 2: e69. 10.1038/psp.2013.45.
- Jin W, Xu S, Wang H, Yu Y, Shen Y, Wu B, Jin L: Genome-wide detection of natural selection in African Americans pre- and post-admixture. Genome Res. 2012, 22 (3): 519-527. 10.1101/gr.124784.111.
- Zakharia F, Basu A, Absher D, Assimes TL, Go AS, Hlatky MA, Iribarren C, Knowles JW, Li J, Narasimhan B, Sidney S, Southwick A, Myers RM, Quertermous T, Risch N, Tang H: Characterizing the admixed African ancestry of African Americans. Genome Biol. 2009, 10 (12): R141. 10.1186/gb-2009-10-12-r141.
- Cornuet JM, Luikart G: Description and power analysis of two tests for detecting recent population bottlenecks from allele frequency data. Genetics. 1996, 144 (4): 2001-2014.
- Stumpf MP: Haplotype diversity and SNP frequency dependence in the description of genetic variation. Eur J Hum Genet. 2004, 12 (6): 469-477. 10.1038/sj.ejhg.5201179.
- Vijayan NN, Mathew A, Balan S, Natarajan C, Nair CM, Allencherry PM, Banerjee M: Antipsychotic drug dosage and therapeutic response in schizophrenia is influenced by ABCB1 genotypes: a study from a south Indian perspective. Pharmacogenomics. 2012, 13 (10): 1119-1127. 10.2217/pgs.12.86.
- Kato M, Fukuda T, Serretti A, Wakeno M, Okugawa G, Ikenaga Y, Hosoi Y, Takekita Y, Mandelli L, Azuma J, Kinoshita T: ABCB1 (MDR1) gene polymorphisms are associated with the clinical response to paroxetine in patients with major depressive disorder. Prog Neuro Biol Psychoph. 2008, 32 (2): 398-404. 10.1016/j.pnpbp.2007.09.003.
- Leschziner GD, Andrew T, Pirmohamed M, Johnson MR: ABCB1 genotype and PGP expression, function and therapeutic drug response: a critical review and recommendations for future research. Pharmacogenomics J. 2007, 7 (3): 154-179. 10.1038/sj.tpj.6500413.
- Soyama A, Saito Y, Hanioka N, Maekawa K, Komamura K, Kamakura S, Kitakaze M, Tomoike H, Ueno K, Goto Y, Kimura H, Katoh M, Sugai K, Saitoh O, Kawai M, Ohnuma T, Ohtsuki T, Suzuki C, Minami N, Kamatani N, Ozawa S, Sawada J: Single nucleotide polymorphisms and haplotypes of CYP1A2 in a Japanese population. Drug Metabolism Pharmacokinet. 2005, 20 (1): 24-33. 10.2133/dmpk.20.24.
- Lin KM, Tsou HH, Tsai IJ, Hsiao MC, Hsiao CF, Liu CY, Shen WW, Tang HS, Fang CK, Wu CS, Lu SC, Kuo HW, Liu SC, Chan HW, Hsu YT, Tian JN, Liu YL: CYP1A2 genetic polymorphisms are associated with treatment response to the antidepressant paroxetine. Pharmacogenomics. 2010, 11 (11): 1535-1543. 10.2217/pgs.10.128.
- Popat RA, Van Den Eeden SK, Tanner CM, Kamel F, Umbach DM, Marder K, Mayeux R, Ritz B, Ross GW, Petrovitch H, Topol B, McGuire V, Costello S, Manthripragada AD, Southwick A, Myers RM, Nelson LM: Coffee, ADORA2A, and CYP1A2: the caffeine connection in Parkinson's disease. Eur J Neurol. 2011, 18 (5): 756-765. 10.1111/j.1468-1331.2011.03353.x.
- Price AL, Tandon A, Patterson N, Barnes KC, Rafaels N, Ruczinski I, Beaty TH, Mathias R, Reich D, Myers S: Sensitive detection of chromosomal segments of distinct ancestry in admixed populations. PLoS Genetics. 2009, 5 (6): e1000519. 10.1371/journal.pgen.1000519.
- Ingman M, Kaessmann H, Paabo S, Gyllensten U: Mitochondrial genome variation and the origin of modern humans. Nature. 2000, 408 (6813): 708-713. 10.1038/35047064.
- Serre D, Paabo SP: Evidence for gradients of human genetic diversity within and among continents. Genome Res. 2004, 14 (9): 1679-1685. 10.1101/gr.2529604.
- Smith MW, Patterson N, Lautenberger JA, Truelove AL, McDonald GJ, Waliszewska A, Kessing BD, Malasky MJ, Scafe C, Le E, De Jager PL, Mignault AA, Yi Z, De The G, Essex M, Sankale JL, Moore JH, Poku K, Phair JP, Goedert JJ, Vlahov D, Williams SM, Tishkoff SA, Winkler CA, De La Vega FM, Woodage T, Sninsky JJ, Hafler DA, Altshuler D, Gilbert DA, et al: A high-density admixture map for disease gene discovery in African Americans. Am J Hum Genet. 2004, 74 (5): 1001-1013. 10.1086/420856.
- Pena SD: The fallacy of racial pharmacogenomics. Braz J Biol Res. 2011, 44 (4): 268-275. 10.1590/S0100-879X2011007500031.
- Auton A, Brooks LD, DePristo MA, Durbin RM, Handsaker RE, Kang HM, Marth GT, McVean GA, Genomes Project C, Abecasis GR: An integrated map of genetic variation from 1,092 human genomes. Nature. 2012, 491 (7422): 56-65. 10.1038/nature11632.
- Browning BL, Browning SR: Rapid and accurate haplotype phasing and missing data inference for whole genome association studies using localized haplotype clustering. Genet Epidemiol. 2007, 31 (6): 606-606.
- Ke X, Taylor MS, Cardon LR: Singleton SNPs in the human genome and implications for genome-wide association studies. Eur J Hum Genet. 2008, 16 (4): 506-515. 10.1038/sj.ejhg.5201987.
- Li J, Zhang L, Zhou H, Stoneking M, Tang K: Global patterns of genetic diversity and signals of natural selection for human ADME genes. Hum Mol Genet. 2011, 20 (3): 528-540. 10.1093/hmg/ddq498.
- Daly TM, Dumaual CM, Miao X, Farmen MW, Njau RK, Fu DJ, Bauer NL, Close S, Watanabe N, Bruckner C, Hardenbol P, Hockett RD: Multiplex assay for comprehensive genotyping of genes involved in drug metabolism, excretion, and transport. Clin Chem. 2007, 53 (7): 1222-1230. 10.1373/clinchem.2007.086348.
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35 (Database issue): D61-D65.
- McLaren W, Pritchard B, Rios D, Chen Y, Flicek P, Cunningham F: Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor. Bioinformatics. 2010, 26 (16): 2069-2070. 10.1093/bioinformatics/btq330.
- Boyle AP, Hong EL, Hariharan M, Cheng Y, Schaub MA, Kasowski M, Karczewski KJ, Park J, Hitz BC, Weng S, Cherry JM, Snyder M: Annotation of functional variation in personal genomes using RegulomeDB. Genome Res. 2012, 22 (9): 1790-1797. 10.1101/gr.137323.112.
- Whirl-Carrillo M, McDonagh EM, Hebert JM, Gong L, Sangkuhl K, Thorn CF, Altman RB, Klein TE: Pharmacogenomics Knowledge for Personalized Medicine. Clin Pharmacol Ther. 2012, 92 (4): 414-417. 10.1038/clpt.2012.96.
- Sankararaman S, Sridhar S, Kimmel G, Halperin E: Estimating local ancestry in admixed populations. Am J Hum Genet. 2008, 82 (2): 290-303. 10.1016/j.ajhg.2007.09.022.
- Maples BK, Gravel S, Kenny EE, Bustamante CD: RFMix: a discriminative modeling approach for rapid and robust local-ancestry inference. Am J Hum Genet. 2013, 93 (2): 278-288. 10.1016/j.ajhg.2013.06.020.
- Churchhouse C, Marchini J: Multiway admixture deconvolution using phased or unphased ancestral panels. Genet Epidemiol. 2013, 37 (1): 1-12. 10.1002/gepi.21692.
- Thomas MG, Weale ME, Jones AL, Richards M, Smith A, Redhead N, Torroni A, Scozzari R, Gratrix F, Tarekegn A, Wilson JF, Capelli C, Bradman N, Goldstein DB: Founding mothers of Jewish communities: Geographically separated Jewish groups were independently founded by very few female ancestors. Am J Hum Genet. 2002, 70 (6): 1411-1420. 10.1086/340609.
- Birnbaum ZW, Tingey FH: One-Sided Confidence Contours for Probability Distribution Functions. Ann Math Stat. 1951, 22 (4): 592-596. 10.1214/aoms/1177729550.
- Weir BS, Cockerham CC: Estimating F-Statistics for the Analysis of Population-Structure. Evolution. 1984, 38 (6): 1358-1370. 10.2307/2408641.
- Shriver MD, Kennedy GC, Parra EJ, Lawson HA, Sonpar V, Huang J, Akey JM, Jones KW: The genomic distribution of population substructure in four populations using 8,525 autosomal SNPs. Hum Genomics. 2004, 1 (4): 274-286. 10.1186/1479-7364-1-4-274.
- Voight BF, Kudaravalli S, Wen XQ, Pritchard JK: A map of recent positive selection in the human genome. PLoS Biol. 2006, 4 (4): 659-659.
- Nielsen R, Williamson S, Kim Y, Hubisz MJ, Clark AG, Bustamante C: Genomic scans for selective sweeps using SNP data. Genome Res. 2005, 15 (11): 1566-1575. 10.1101/gr.4252305.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.