Copy number variation in African Americans
© McElroy et al. 2009
Received: 18 November 2008
Accepted: 24 March 2009
Published: 24 March 2009
Copy number variants (CNVs) have been identified in several studies to be associated with complex diseases. It is important, therefore, to understand the distribution of CNVs within and among populations. This study is the first report of a CNV map in African Americans.
Employing a SNP platform with greater than 500,000 SNPs, a first-generation CNV map of the African American genome was generated using DNA from 385 healthy African American individuals, and compared to a sample of 435 healthy White individuals. A total of 1362 CNVs were identified within African Americans, which included two CNV regions that were significantly different in frequency between African Americans and Whites (17q21 and 15q11). In addition, a duplication was identified in 74% of DNAs derived from cell lines that was not present in any of the whole blood derived DNAs.
The Affymetrix 500 K array provides reliable CNV mapping information. However, using cell lines as a source of DNA may introduce artifacts. The duplication on chromosome 17q21, identified at high frequency in Whites and at low frequency in African Americans, reflects haplotype-specific frequency differences between ancestral groups. The CNV map will be a valuable tool for identifying disease-associated CNVs in African Americans.
Duplications or deletions of genomic segments generate copy number variants (CNVs) that can range in size from one thousand to several million base pairs and may affect one or more genes. More nucleotides appear to be affected by CNVs than by single nucleotide polymorphisms (SNPs). Current annotated CNVs cover about 28.8% of the genome, and, to date, over 5600 non-overlapping human CNV loci have been identified (http://projects.tcag.ca/variation; Database of Genomic Variants). CNVs are a major source of human genetic diversity, and have been shown to influence rare genomic disorders as well as complex traits and diseases.
In addressing the role of CNVs in disease, it is important to understand their distribution in the population at large. Several studies have attempted to characterize CNVs in the general population using data from the International HapMap Consortium [1, 6–8] and other reference groups [2, 5, 9–11], and have confirmed that CNVs are widespread throughout the genome but show a broad range in population frequencies. However, as of the preparation of this manuscript, no reported studies have surveyed CNVs in African Americans. The objectives of the current study are to use genome-wide SNP array data to generate a CNV map of the African American genome and to describe differences between African and European Americans.
DNAs of 435 healthy African Americans and 435 healthy individuals of European descent (hereafter referred to as Whites) were available for analysis. High molecular weight DNA was extracted from freshly isolated peripheral blood lymphocytes using a standard desalting procedure. The quality and quantity of each genomic DNA sample were evaluated by fluorometry (Molecular Devices Spectra Max). One hundred forty of the African American DNA samples were derived from lymphoblastoid cell lines, all of which were from females; all other DNA was isolated from whole blood. Epstein-Barr virus (EBV)-transformed lymphoblastoid lines were generated from freshly isolated peripheral blood lymphocytes. Cells were washed and resuspended in complete Iscove's modified Dulbecco's culture medium supplemented with 10% v/v fetal bovine serum, antibiotics, and virus. The ATCC B95-8 EBV-infected marmoset cell line was used as the source for virus stocks. The UCSF institutional review board approved this study and all participants gave written informed consent.
African American individuals were recruited from 28 US states. The mean age at sample acquisition was 45 years, and the population displayed a wide range of admixture. African American ancestry was self-reported, but European ancestry was documented in the majority of individuals based on genotyping of 186 SNPs highly informative for African versus European ancestry, as previously described. Global estimation of European ancestry using these markers indicated 23 ± 15% European ancestry. White individuals originated from 8 different regions: Australia (n = 11), East Europe (n = 22), North Africa (n = 1), North America (n = 29), North Europe (n = 93), South America (n = 1), South Europe (n = 71), and West Europe (n = 207). Females constituted 64% and 51% of the African American and White populations, respectively. All individuals were assayed on the Affymetrix GeneChip® Human Mapping 500 K Array Set. Quality control filtering and SNP frequencies are reported elsewhere.
Fifty randomly chosen African American females with DNA derived from whole blood were used as references for calculating the normalized total intensity measures for each SNP (log-R ratios) for all of the remaining individuals. The reference individuals were excluded from further analysis, resulting in 385 African American and 435 White test individuals. Using only female references allows the estimation of X chromosome CNVs in female test individuals. Raw copy number files (".cnchp" files) were generated using the CNAT4.0.1 algorithm in the Affymetrix® Genotyping Console™ 2.1 with default settings. The ".cnchp" files from both the African American and White individuals were read into the Nexus 3.0 copy number analysis program (BioDiscovery, Inc.), and copy number variable regions were called using BioDiscovery's rank segmentation algorithm with default settings for the Affymetrix 500 K assay, which require at least one probe per segment. CNV frequencies and between-group frequency differences were estimated using Nexus. Fisher's Exact test was used to determine the significance of the frequency differences, and the False Discovery Rate (FDR) was used to correct for multiple comparisons.
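The testing step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Nexus implementation: the text does not say which FDR procedure was used, so the Benjamini-Hochberg procedure is assumed here, and the function names are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are carriers / non-carriers of a CNV.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x carriers in the first group
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables that are no more probable than the observed one
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """FDR-adjusted p-values (Benjamini-Hochberg; an assumed choice of procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In use, one p-value would be computed per CNV region from the carrier counts in the two groups, and the full list of p-values would then be passed to the adjustment function.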
CNVs of interest were validated using region-specific TaqMan assays. An internal positive control gene (β-globin, HBB) was included in each assay to determine copy number and to confirm that the reaction amplified successfully [see Additional file 1]. Threshold cycle (Ct) values were generated from a pre-established threshold and ΔCt values were estimated from the difference of the control gene and the CNV test region. The ΔCt values were then treated as a quantitative trait and standard analysis of variance was utilized to test the association of the SNP-determined CNV status with the ΔCt for that region.
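As a rough sketch of this validation analysis, the following Python computes ΔCt values and the one-way analysis of variance F statistic across CNV status groups. The sign convention (test region minus control gene) and the function names are assumptions, since the text does not specify them, and the p-value lookup from the F distribution is left out.

```python
from statistics import mean


def delta_ct(ct_test_region, ct_control):
    """Delta-Ct for one sample; the sign convention is an assumption."""
    return ct_test_region - ct_control


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of Delta-Ct values.

    Each group holds the Delta-Ct values of samples sharing one
    SNP-determined CNV status (for example, no call vs. duplication).
    """
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

A large F statistic would indicate that mean ΔCt differs between array-called CNV states, which is the pattern expected if the TaqMan assay confirms the array calls.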
DNAs from 385 healthy African American and 435 healthy White individuals were scanned using the Affymetrix GeneChip® Human Mapping 500 K Array Set to identify CNVs. A single African American individual's DNA was plated twice and was used to check the consistency of CNV calls on the Affymetrix 500 K platform. Based on the log-R ratios, evidence for four identical CNVs was present in both samples, although a single deletion on chromosome 21 identified in one sample was just below the call threshold in the other sample [see Additional file 2]. The consistency of the results indicates the reproducibility of the experiment, albeit only in a single sample.
Based on the distribution of the number of CNV calls per individual [see Additional file 3], 28 individuals were identified as outliers (due to high numbers of CNV calls) and removed from the analysis to reduce the probability of CNV calls that were a result of assay performance rather than the presence of true CNVs. In addition, all CNVs on the X chromosome identified in males were removed, since all males have deletions of a single copy of the X chromosome when compared to female references.
Autosomal CNVs were contrasted between African American males and females to establish a conservative threshold for the largest CNV frequency differences expected under the null hypothesis, since true autosomal differences between males and females are not expected. The largest frequency difference for any autosomal CNV between African American males and females was 6.6%. Performing the same comparison in Whites yielded a largest autosomal CNV frequency difference between males and females of 5%. None of the CNV regions in either group with a frequency difference of 5% or greater between males and females harbored genes that were obvious candidates for sexual dimorphism. Since the largest frequency difference observed between males and females was 6.6%, a conservative threshold of 10% was used in combination with the Fisher's Exact test FDR-corrected p-values to declare true differences in further comparisons.
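The resulting decision rule combines two conditions and can be written as a small predicate. This is a sketch rather than the authors' code: the function name is hypothetical, and the significance level of 0.05 for the FDR-corrected p-value is an assumption.

```python
def is_true_difference(freq_a, freq_b, fdr_p, min_diff=0.10, alpha=0.05):
    """Apply the two-part rule: frequency difference > 10% AND FDR-significant.

    freq_a, freq_b: CNV frequencies in the two groups, as fractions (0 to 1).
    fdr_p: FDR-corrected Fisher's exact p-value for the region.
    alpha: assumed significance level (not stated explicitly for this rule).
    """
    return abs(freq_a - freq_b) > min_diff and fdr_p < alpha


# The 6.6% male/female difference falls under the threshold whatever its p-value
print(is_true_difference(0.066, 0.0, 0.001))   # False
# A hypothetical region with a 30-point difference and a small corrected p-value
print(is_true_difference(0.40, 0.10, 0.001))   # True
```

Requiring both conditions means that a statistically significant but small difference, of the size seen between males and females, is not declared a true difference.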
While all of the DNA samples for the White individuals were isolated from whole blood, 140 of the African American DNAs were isolated from lymphoblastoid cell lines. DNA derived from cell lines may have CNVs that result from the establishment of the lines. Any high frequency CNVs in the African American group that arose from the process of creating cell lines need to be identified and removed from the comparison between African Americans and Whites. Considering only African American subjects, three regions showed a significant difference greater than 10%: chromosome 14 (21,811,993 – 21,836,082) (duplication in 74% of cell line DNAs; FDRp < 0.001), chromosome 14 (105,619,582 – 106,173,672) (deletion in 10.7% of cell line DNAs; FDRp < 0.001), and chromosome 17 (41,592,674 – 41,597,102) (duplication in 11.03% of cell line DNAs; FDRp < 0.002). CNVs in cell lines in these regions were not considered in further comparisons between African American and White CNVs.
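One way to picture this exclusion step is as an interval-overlap filter over CNV calls. The sketch below assumes a hypothetical call representation (a dict with "source" and "region" keys) and is not the procedure used in Nexus; the three region coordinates are taken from the text.

```python
# Cell line associated regions reported in the text: (chromosome, start, end)
CELL_LINE_REGIONS = [
    ("14", 21_811_993, 21_836_082),
    ("14", 105_619_582, 106_173_672),
    ("17", 41_592_674, 41_597_102),
]


def overlaps(region_a, region_b):
    """True if two (chromosome, start, end) regions share at least one base."""
    chrom_a, start_a, end_a = region_a
    chrom_b, start_b, end_b = region_b
    return chrom_a == chrom_b and start_a <= end_b and start_b <= end_a


def drop_cell_line_artifacts(calls):
    """Remove calls from cell line DNAs that fall in the excluded regions.

    calls: list of dicts with hypothetical keys
           'source' ('cell_line' or 'whole_blood') and 'region'.
    """
    return [
        call for call in calls
        if not (call["source"] == "cell_line"
                and any(overlaps(call["region"], r) for r in CELL_LINE_REGIONS))
    ]
```

Note that the cell line associated region on chromosome 17 ends at 41,597,102, so it does not overlap the chromosome 17 duplication at 41,600,030 – 41,932,225 that differs between African Americans and Whites.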
Two regions were markedly different between African Americans and Whites, excluding cell line regions (Figure 1B and 1C). A duplicated region was identified on chromosome 17 (41,600,030 – 41,932,225) that had a frequency of 45.1% in Whites and 8.03% in African Americans (FDRp < 0.001). Two genes are annotated in this region: leucine rich repeat containing 37A (LRRC37A) and ADP-ribosylation factor-like 17 (ARL17). Another duplicated region was identified on chromosome 15 (19,212,556 – 19,400,776) with a frequency of 21.24% in African Americans and 40.69% in Whites (FDRp < 0.001). The gene ANKRD26-like family B, member 1 (A26B1) is in this region. None of the aforementioned genes appear to have a readily identifiable biological association with ethnic differences. All other CNV features had a difference of <10% between African Americans and Whites.
Extreme copy events (homozygous deletions and >1 copy gains) were also analyzed, independently of the previous analysis, for differences between the two populations. In total, 75 extreme copy events were identified in African Americans (70 gains and 5 losses) and 176 extreme copy events were identified in Whites (171 gains and 5 losses). None of the extreme copy event regions differed in frequency by more than 10% between African Americans and Whites, but a single region on chromosome 15 (18,427,103 – 19,643,166) was significantly different (p < 0.05) after FDR correction. The multiple copy gain in this region had a maximum frequency of 0.013 in African Americans and 0.086 in Whites. The two genes located in this region (coxsackie virus and adenovirus receptor pseudogene 2 [CXADRP2] and POTE ankyrin domain family member B [POTEB]) do not have an immediately apparent functional association with ethnicity.
In addition to the cell line associated CNV regions identified in the current study, copy number variations of chromosome 2 (88,876,198–89,912,849; 0.093 frequency in cell line derived DNAs and 0.024 frequency in whole blood derived DNAs) and deletions of chromosome 22 (20,905,109–21,439,970; 0.029 frequency in cell line derived DNAs and 0 frequency in whole blood derived DNAs) have previously been shown to be artifacts of transformation or somatic recombination of immunoglobulin genes, respectively. Although these regions did not meet the criteria (FDR significant and >10% frequency difference) to be identified as associated with the generation of cell lines in the current study, they were excluded from the data submitted to the Database of Genomic Variants, as were three regions labeled as copy number variant based on the data from a single SNP (because of sparse SNP spacing in these regions). All other CNVs identified in the current study have been submitted to the Database of Genomic Variants.
In the current study, a CNV map was generated using DNA from a population of 385 African Americans using 50 randomly chosen female African Americans as a reference. A total of 1362 CNV events were identified in the population. In addition, CNVs were identified in a population of 435 White individuals using the same 50 African American females as a reference. The same reference population was used so that the CNV distributions of the two populations would be directly comparable. Two regions of the genome exhibited large CNV frequency differences between the two populations, one on chromosome 15 and another on chromosome 17. No genes in these regions had obvious roles in ethnic differences.
A total of 140 of the African American DNAs were derived from cell lines. The process of creating the cell lines generated a duplication on chromosome 14 in 74% of the cell line-derived DNAs. Although this region is listed as copy number variant in the Database of Genomic Variants, none of the DNAs derived from whole blood was identified as having this duplication. Apparently, either transformation with EBV or the growing out of the cells caused this duplication event. EBV may have integrated into this site, disrupting the organization of the region and resulting in the duplication. However, Jeon and colleagues did not identify a CNV in this region resulting from EBV transformation of B-cells from Korean subjects, and the 1p36.33 copy number increase identified in cell lines by Jeon et al. was only found in a cell line from a single individual in the current study. Simon-Sanchez and colleagues also did not identify this CNV when comparing DNA from EBV-transformed cell lines to blood-derived DNAs in a cohort of North American Whites. Another possibility is that the integration of the EBV DNA into another site of the genome may facilitate duplication at this site. Finally, a gene in this region may facilitate the process of expansion or survival of the cell line, and therefore cells with this duplication may have been selected for in the culturing and growing process. However, there are no annotated genes in the region of the duplication. Currently, it is unknown whether the duplication is an ethnic, experimental, or EBV strain-specific phenomenon, and the determination of these specifics is under investigation.
A duplication on chromosome 17 (41,600,030 – 41,932,225) was identified in both African Americans and Whites in the current study. This duplication is in the same location as a segmental duplication flanking a mental retardation-associated deletion identified in another study. Segmental duplications have been shown to be catalysts for chromosomal rearrangement. Two major haplotypes (H1 and H2) are present in this region of the human genome (17q21.31), and the ancestral haplotype (H2), which is more prone to duplications, is found mostly in people of European descent. Most Africans have the H1 haplotype, which may explain the large frequency difference of the duplication in this genomic region between African Americans and Whites. Since the present study found that the duplication was present in 45% of Whites and only 8% of African Americans, it will be of interest to assess whether the severe neurological phenotype resulting from the deletion in this chromosome 17 region is more prevalent in Whites than in Africans or African Americans. CRHR1 (corticotrophin releasing hormone receptor 1) and MAPT (microtubule-associated protein tau) are two of the six genes within the region deleted as a result of the segmental duplication. Both genes are associated with many neurological disorders. Because the duplication lies close to these genes, it is important to determine whether it affects their expression, which could produce a neurological phenotype.
As of the preparation of this manuscript, there are no other reports of the production of a CNV map in African Americans. The creation of this map is an important first step in determining the presence of CNV admixture in African Americans. Since many studies are now identifying CNVs as underlying causes in disease subsets, the African American CNV map will also be important for identifying cross-ethnic and ethnic-specific disease-associated CNVs.
We are grateful to the individuals who participated in this study. We thank Robin Lincoln for specimen management and Refujia Gomez for support with sample recruitment.
This work was funded by grants from the National Institutes of Health (RO1 NS046297) and the National Multiple Sclerosis Society (RG3060C8). JPM is a National Multiple Sclerosis Society post-doctoral fellow. We thank Karen King at GlaxoSmithKline for managing the POPRES Affymetrix data used in this study.