A copy number variation in human NCF1 and its pseudogenes
© Brunson et al. 2010
Received: 7 August 2009
Accepted: 23 February 2010
Published: 23 February 2010
Neutrophil cytosolic factor-1 (NCF1) is a component of NADPH oxidase. The NCF1 gene colocalizes with two pseudogenes (NCF1B and NCF1C). These two pseudogenes have a GT deletion in exon 2, resulting in a frameshift and an early stop codon. Here, we report a copy number variation (CNV) of the NCF1 pseudogenes and their alternatively spliced transcripts.
We examined three normal populations (86 individuals) and observed the 2:2:2 pattern (NCF1B:NCF1:NCF1C) in only 26 individuals. On average, each African-American has 1.4 ± 0.8 (Mean ± SD) copies of NCF1B and 2.3 ± 0.6 copies of NCF1C; each Caucasian has 1.8 ± 0.7 copies of NCF1B and 1.9 ± 0.4 copies of NCF1C; and each Mexican has 1.6 ± 0.6 copies of NCF1B and 1.0 ± 0.4 copies of NCF1C. Mexicans have significantly fewer NCF1C copies than African-Americans (p = 6e-15) and Caucasians (p = 3e-11). Mendelian transmission of this CNV was observed in two CEPH pedigrees. Moreover, we cloned two alternatively spliced transcripts generated from these two pseudogenes that use an alternative exon-2 instead of their defective exon 2. NCF1 pseudogene expression responded robustly to PMA induction during macrophage differentiation. The NCF1B share of the cDNA pool transcribed from the three gene copies decreased from 32.9% to 8.3%. The NCF1 pseudogenes (NCF1Ψs) also displayed distinct expression patterns in different human tissues.
Our results suggest that these two pseudogenes may use an alternative exon-2 in different tissues and in response to external stimuli. The GT deletion alone is insufficient to define them as functionless pseudogenes, and this CNV may have biological relevance.
Recent genomic studies have suggested that gene duplication occurred frequently, and in variable numbers, during the recent history of human populations, leading to the de novo formation of copy number variations (CNVs). Presumably because of positive selection, genes encoding certain protein categories, such as those involved in responses to the environment, are particularly enriched in CNVs [1–8]. In this process, duplicated genes are thought to be the "successful" copies, whereas pseudogenes are the "unsuccessful" duplicates retained in the genome.
Neutrophil cytosolic factor 1 (NCF1, also called p47phox, for phagocyte oxidase) is a crucial component of NADPH oxidase. This enzyme catalyzes the production of microbicidal superoxide in phagocytes such as neutrophils and plays a vital role in host defense against microbial pathogens [10, 11]. A 2-bp GT deletion in exon 2 of the NCF1 gene causes chronic granulomatous disease (CGD) in humans [12, 13]. NCF1 is expressed in many cell types and may play a role in many other diseases [14–19].
The human NCF1 gene is located at 7q11.23, in the Williams-Beuren syndrome region [20–22], together with two nearly identical (>99.5%) pseudogenes (NCF1B and NCF1C) that presumably arose by gene duplication. These two pseudogenes carry the same signature sequence as the CGD-causing NCF1 allele: the 2-bp GT deletion in exon 2 [13, 23, 24]. This mutation leads to a frameshift and a premature stop codon, and these two gene duplicates were therefore categorized as pseudogenes. The NCF1Ψ/NCF1 ratio has been noted to vary among individuals, and some NCF1Ψ gene copies were believed to contain the wild-type NCF1 exon 2 sequence [22, 25].
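The frameshift logic can be made concrete with a small sketch. The sequence below is a hypothetical toy, not the actual NCF1 exon-2 sequence, and the helper names (`first_stop_codon`, `delete_dinucleotide`) are illustrative: deleting one GT from a GTGT repeat shifts the reading frame by two bases, so a downstream TGA that was out of frame becomes an in-frame premature stop codon.

```python
from typing import Optional

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq: str) -> Optional[int]:
    """Return the 0-based index of the first in-frame stop codon, reading
    codons from the start of `seq`, or None if no stop codon is found."""
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


def delete_dinucleotide(seq: str, pos: int) -> str:
    """Remove the two bases starting at 0-based position `pos`."""
    return seq[:pos] + seq[pos + 2:]


# Hypothetical toy sequence (NOT the NCF1 exon-2 sequence): a GTGT repeat
# follows the start codon, and the intact reading frame ends in TAA.
wild_type = "ATGGTGTCTGAAGCCTAA"            # ATG GTG TCT GAA GCC TAA
mutant = delete_dinucleotide(wild_type, 3)  # drop one GT of the GTGT repeat

print(first_stop_codon(wild_type))  # 5: the intended terminal stop codon
print(first_stop_codon(mutant))     # 2: premature TGA after the frameshift
```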
In this study, we examined the copy numbers of NCF1 pseudogenes in human populations and found a copy number variation (CNV). Our additional data revealed that these two pseudogenes can generate RNA transcripts that skip over their defective exon 2 by alternative splicing.
To determine the relative copy numbers of NCF1B, NCF1, and NCF1C, we genotyped the genomic DNA of human subjects at two positions: the signature 2-bp GT deletion in exon 2, and an A/G substitution in exon 9, at which NCF1B and NCF1 carry an A allele and NCF1C carries a G allele (Figure 1 and Additional file 1). Pyrosequencing is a high-throughput technology that can accurately determine allele frequencies in pooled DNA. Based on the pyrogram peak heights, we assessed the allele composition of each individual with the PyroMarkID Software v1.0 (Additional file 2).
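As a simplified illustration of turning peak heights into allele fractions, the sketch below divides each allele's peak height by the sum of the allele peaks at that position. This is an assumption made for clarity, not the algorithm implemented in PyroMarkID, which may apply its own corrections; the peak heights and the `allele_fractions` helper are hypothetical.

```python
from typing import Dict


def allele_fractions(peak_heights: Dict[str, float]) -> Dict[str, float]:
    """Convert allele-specific pyrogram peak heights into fractions.

    Simplified model (an assumption, not the PyroMarkID algorithm): each
    allele's fraction is its peak height divided by the sum of all allele
    peak heights at that position.
    """
    total = sum(peak_heights.values())
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return {allele: h / total for allele, h in peak_heights.items()}


# Hypothetical exon-9 peak heights for a 2:2:2 genome (NCF1B:NCF1:NCF1C):
# NCF1B and NCF1 contribute the A allele, NCF1C contributes the G allele.
exon9 = allele_fractions({"A": 40.0, "G": 20.0})
print(exon9)
```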
Using the genotyping pyrograms at exon 2 (GT/GTGT/GT for NCF1B/NCF1/NCF1C) and exon 9 (A/A/G), we further dissected the copy numbers of the two pseudogenes in each genome (see Additional file 3). On average, each African-American (AA) individual has 1.4 ± 0.8 (Mean ± SD) copies of NCF1B, 2.1 ± 0.7 copies of NCF1, and 2.3 ± 0.6 copies of NCF1C; each Caucasian (Cau) has 1.8 ± 0.7 copies of NCF1B, 2.1 ± 0.3 copies of NCF1, and 1.9 ± 0.4 copies of NCF1C; and each Mexican (Mex) genome has 1.6 ± 0.6 copies of NCF1B, 2.1 ± 0.3 copies of NCF1, and 1.0 ± 0.4 copies of NCF1C. The copy number of NCF1C differs significantly among these three populations (Figure 3c and 3d): Mexicans have significantly fewer copies of NCF1C than AA (p = 6e-15) and Cau (p = 3e-11) individuals. The copy number of NCF1B also differs significantly between Cau and AA (p = 0.033).
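Because the GTGT allele at exon 2 comes only from NCF1 and the G allele at exon 9 comes only from NCF1C, two allele fractions are enough to split the signal among the three genes once one absolute count is fixed. The text does not state how the fractions were scaled to copy numbers, so the sketch below assumes a known NCF1 copy number (default 2); `dissect_copy_numbers` is a hypothetical helper, and in practice the estimates would be rounded to the nearest integer.

```python
from typing import Dict


def dissect_copy_numbers(frac_gtgt: float, frac_g: float,
                         ncf1_copies: float = 2.0) -> Dict[str, float]:
    """Estimate NCF1B, NCF1 and NCF1C copy numbers from two allele fractions.

    frac_gtgt:   fraction of the exon-2 signal from the GTGT allele (NCF1 only)
    frac_g:      fraction of the exon-9 signal from the G allele (NCF1C only)
    ncf1_copies: ASSUMED absolute NCF1 copy number used to scale the
                 fractions (hypothetical default of 2, not stated in the text)
    """
    if not (0.0 < frac_gtgt <= 1.0 and 0.0 <= frac_g <= 1.0):
        raise ValueError("fractions must lie in (0, 1] and [0, 1]")
    total = ncf1_copies / frac_gtgt      # all NCF1-family copies in the genome
    ncf1c = frac_g * total               # the G allele comes only from NCF1C
    ncf1b = total - ncf1_copies - ncf1c  # remaining copies (GT at exon 2, A at exon 9)
    return {"NCF1B": ncf1b, "NCF1": ncf1_copies, "NCF1C": ncf1c}


# Hypothetical fractions for a 2:2:2 genome and for a 2:2:1 genome.
print(dissect_copy_numbers(1 / 3, 1 / 3))
print(dissect_copy_numbers(0.4, 0.2))
```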
Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines are widely used as a genomic resource in human genetic studies. However, viral integration increases chromosomal instability, which can cause duplications or deletions in the host or viral genome sequences flanking the integration sites. To exclude the possibility that this CNV is an artifact of lymphoblastoid cell lines, we analyzed 48 genomic DNA samples extracted directly from human peripheral white blood cells (Additional file 4). Collectively, 4 individuals had 5:2 ratios (2.41 ± 0.058), 18 had 2:1 ratios (1.97 ± 0.141), 23 had 3:2 ratios (1.53 ± 0.141), and 3 had 1:1 ratios. These data confirm the presence of this CNV.
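Assigning a measured ratio to one of the integer ratio classes reported for the blood-derived samples can be sketched as a nearest-class rule. The four classes come from the text; the nearest-distance rule, the `nearest_ratio` helper, and the sample values are assumptions for illustration only.

```python
from collections import Counter
from fractions import Fraction
from typing import Iterable, List

# Integer ratio classes reported in the text.
RATIO_CLASSES: List[Fraction] = [Fraction(5, 2), Fraction(2, 1),
                                 Fraction(3, 2), Fraction(1, 1)]


def nearest_ratio(observed: float,
                  classes: Iterable[Fraction] = RATIO_CLASSES) -> Fraction:
    """Assign an observed allele ratio to the closest integer ratio class."""
    return min(classes, key=lambda r: abs(observed - float(r)))


# Hypothetical per-sample ratios (illustrative only, not study data).
samples = [2.41, 1.97, 1.53, 1.02, 1.55]
print(Counter(str(nearest_ratio(x)) for x in samples))
```

A nearest-class rule is the simplest choice; with noisier data one might instead leave samples that fall midway between classes unassigned.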
In this study, we report the existence of an NCF1 pseudogene CNV in humans. Pseudogene copy numbers differ among three human populations (African-Americans, Caucasians, and Mexicans). The CNV was validated in genomic DNA extracted directly from human peripheral white blood cells. Its inheritance in the two family pedigrees suggests that chromosomal instability at this locus may not be high. Our phylogenetic analysis implies that the NCF1 gene existed before the divergence of the two pseudogene duplicates, and that the NCF1 pseudogenes may have emerged after the divergence of the human and chimpanzee lineages. These data suggest that this NCF1 pseudogene CNV may be a consequence of recent gene duplications in human history.
Pseudogenes have long been considered 'dead', nonfunctional byproducts of genome evolution. They were defined as genomic sequences that are similar to a functional gene but contain genetic defects that preclude the generation of functional products [30–32]. Recent findings have prompted a revision of this definition: pseudogenes are now defined as genomic sequences that arise from functional genes but cannot encode the same type of functional product (i.e. protein, tRNA or rRNA) as the original genes. Because the human genome is estimated to contain ~20,000 pseudogenes, it will be important to know how many, and which, pseudogenes are functional. Recently, emphasis has been placed on polymorphisms such as CNVs, which have been documented to play a role in disease pathogenesis [34–39]. However, a biologically relevant CNV of a pseudogene has not been reported. Historically, the NCF1 pseudogenes have been considered "pseudo" because of their 2-bp GT deletion in exon 2, which is predicted to cause a frameshift and an early stop codon during protein synthesis. In our analysis of GT-containing transcripts, the pseudogenes were far less active. However, our data revealed for the first time that a portion of NCF1 pseudogene transcripts do not use their defective exon-2; instead, they may use alternative exons to skip over the mutant exon-2 (Figure 6). When we measured transcripts exclusive of exon-2, the NCF1 pseudogenes showed varying transcriptional capacities that responded robustly to PMA-induced macrophage differentiation (Figure 7) and distinct expression patterns in different human tissues (Figure 8). These observations led us to ask whether these NCF1 pseudogenes are "functional" or "unsuccessful duplicates". Recent studies have shown that ~95% of multi-exon genes undergo tissue-specific alternative splicing [40–42]; moreover, genes can also function by making regulatory non-coding RNAs in addition to proteins.
These results certainly challenge the current perception of the NCF1 pseudogenes.
We observed differential expression of NCF1 and its pseudogenes and a varying contribution of each gene to the total transcript pool (Figures 5, 7, and 8). This may be caused by differential alternative splicing, different promoter activities, and/or differential mRNA degradation. Sequence alignment with ClustalW2 indicated that the 5' untranslated regions of the NCF1 pseudogenes are nearly identical to the equivalent region of NCF1 (Additional file 7). Collectively, putative binding sites (TFBS) for 45 transcription factors were predicted with rVISTA (Additional file 7). The number and exact locations of these TFBS are also nearly identical among the three genes (Additional file 7), except that the NCF1 gene contains 19 AML1 and 4 CREB binding sites, compared with 18 AML1 and 3 CREB sites in both pseudogenes. Our data suggest that a portion of pseudogene transcripts do not use the GT-containing exon-2, which may explain why the NCF1 pseudogenes displayed fewer GT-containing transcripts than the true NCF1 gene. However, it remains unclear whether differential promoter activities and mRNA degradation also contribute to this observation.
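Counting binding sites in the promoters of the three genes can be illustrated with exact consensus matching. rVISTA scores sites with position weight matrices, so the sketch below is a deliberately simplified stand-in: it counts overlapping exact matches of a consensus string on both strands, counting palindromic motifs only once. The toy promoter fragments and the helper names are hypothetical.

```python
import re

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count exact matches of `motif` on one strand; a zero-width lookahead
    lets overlapping matches be counted separately."""
    pattern = re.compile(f"(?={re.escape(motif.upper())})")
    return sum(1 for _ in pattern.finditer(seq.upper()))


def count_both_strands(seq: str, motif: str) -> int:
    """Count matches on both strands; palindromic motifs are counted once."""
    motif = motif.upper()
    rc = reverse_complement(motif)
    forward = count_motif(seq, motif)
    return forward if rc == motif else forward + count_motif(seq, rc)


# TGACGTCA is the palindromic consensus cAMP response element (CRE).
# The two promoter fragments below are hypothetical toy sequences.
promoter_a = "ccTGACGTCAggTGACGTCAtt"
promoter_b = "ccTGACGTCAgg"
print(count_both_strands(promoter_a, "TGACGTCA"),
      count_both_strands(promoter_b, "TGACGTCA"))
```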
CGD patients have shown that NCF1 is essential for neutrophil function in the first line of host defense against many pathogenic bacteria and fungi. About 93% of patients with CGD caused by NCF1 mutations are homozygous for the 2-bp GT deletion [12, 13]. The NCF1 pseudogenes are evidently not simple replacement duplicates; otherwise they might have compensated for the loss of functional NCF1 gene copies in these CGD patients. Two recent studies have reported that fewer copies of NCF1 pseudogenes may lead to greater production of reactive oxygen intermediates and may exacerbate certain diseases involving inflammatory processes, such as inflammatory bowel disease. These two NCF1 pseudogenes may therefore produce protein isoforms or small RNAs that inhibit normal NCF1 function. Superoxide is a double-edged sword: it is essential for the bactericidal function of phagocytes, but in excess it is also toxic to the host's own cells. Cells in other tissues would benefit from these two pseudogenes if they could recognize different stimuli and reduce NCF1/NADPH oxidase activity and superoxide production accordingly. For example, they might help to reduce inflammation during atherosclerosis.
Taken together, this study reports the existence of an NCF1 pseudogene CNV in three normal populations. These NCF1 pseudogenes are actively transcribed in human tissues and can produce transcripts that use an alternative exon-2 instead of their defective exon-2. These results prompt us to reconsider their non-functional status in pseudogene classification. The functional contribution of the NCF1 pseudogenes to NCF1 function and the pathological relevance of this copy number variation merit further investigation.
Genomic DNA samples from 86 unrelated individuals (The Human Variation Panels: 32 African-Americans [AA], 30 Caucasians [Cau], and 24 Mexicans [Mex]) and from 2 CEPH/UTAH pedigrees (1331 and 1362) were obtained from The Coriell Cell Repositories. Single-donor Epstein-Barr virus (EBV)-immortalized lymphoblastoid cell lines (LCLs) were obtained from two sources: The Coriell Cell Repositories and The Emory Zafari Collection (kindly provided by Dr. Zafari of Emory Cardiology). Genomic DNA samples from 48 unrelated healthy individuals, extracted directly from peripheral blood cells rather than LCLs, were kindly provided by Dr. Kittner of the University of Maryland. A human monocyte cell line, THP-1, was purchased from ATCC (TIB-202). Total RNAs from human tissues were purchased from Clontech and US Biological. Phorbol 12-myristate 13-acetate (PMA) was purchased from Krackler Scientific (Cat. No. P1585). All subjects provided informed consent, and the study was approved by the Institutional Review Board of Morehouse School of Medicine.
Lymphoblastoid cells and THP-1 cells were grown in RPMI-1640 supplemented with 10% fetal bovine serum (FBS), 2 mM L-glutamine, and 1% penicillin/streptomycin. Cells were maintained in a humidified atmosphere containing 5% CO2 at 37°C. Differentiation of THP-1 monocytes into macrophages was induced by incubation with 800 ng/ml PMA for 12 hours. Non-adherent cells were then removed by aspiration, and adherent cells were allowed to differentiate for 3-4 days.
Genomic DNA was extracted from lymphoblastoid cells following a standard protocol described previously. Total RNA was isolated using the RNeasy Mini kit (Qiagen). First-strand cDNA was generated from 1 μg of total RNA using the Superscript III-RT system (Invitrogen).
Genomic DNA and cDNA were subjected to genotyping at the 2-bp GT deletion in Exon-2 and a 1-bp difference in Exon-9 (Figure 1). PCR and pyrosequencing primers are provided in Additional file 8. These primers were designed to recognize both NCF1 and its pseudogenes without discrimination. The PCR products were examined by agarose gel electrophoresis to ensure PCR specificity. Genotyping was carried out with a pyrosequencer (PSQ96MA, Pyrosequencing, Uppsala, Sweden) following the manufacturer's protocol. Briefly, a 20 μL PCR reaction was carried out with either genomic DNA or cDNA. PCR products were then denatured and single-strand DNA templates were collected with streptavidin-coated Dynabeads (Dynal, Oslo, Norway). A pyrosequencing primer was added and pyrosequencing was performed in an automated PSQ96MA instrument. The pyrogram peak heights were evaluated per individual to give an accurate count of allele ratios using the PyroMarkID Software v1.0 provided by the manufacturer.
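Because the same primers amplify all three genes from both genomic DNA and cDNA, one way to compare transcriptional output is to divide each gene's share of the cDNA pool by its share of the genomic copies. The sketch below assumes both sets of fractions are estimated the same way; `expression_per_copy` and the input values are hypothetical, not data from this study.

```python
from typing import Dict


def expression_per_copy(gdna_copies: Dict[str, float],
                        cdna_fractions: Dict[str, float]) -> Dict[str, float]:
    """Ratio of each gene's share of the cDNA pool to its share of gene copies.

    A value of 1.0 means the gene contributes transcripts in proportion to its
    copy number; values below 1.0 mean fewer transcripts per copy.
    Genes with zero genomic copies are skipped.
    """
    total_copies = sum(gdna_copies.values())
    return {
        gene: cdna_fractions[gene] / (copies / total_copies)
        for gene, copies in gdna_copies.items()
        if copies > 0
    }


# Hypothetical inputs: a 2:2:2 genome and made-up cDNA fractions.
copies = {"NCF1B": 2, "NCF1": 2, "NCF1C": 2}
cdna = {"NCF1B": 0.10, "NCF1": 0.60, "NCF1C": 0.30}
print(expression_per_copy(copies, cdna))
```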
RT-qPCR was carried out using a LightCycler thermocycler (Roche) and a SYBR green kit (Roche). Oligo-dT primers were used for reverse transcription; primer sequences are provided in Additional file 8. Cycle numbers obtained in the log-linear phase of the reaction were plotted against a standard curve prepared with serially diluted control samples. Expression of target genes was normalized to GAPDH and β-actin levels. The RT-qPCR amplification specificity of the NCF1-specific primer set is shown in Additional file 9.
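The standard-curve quantification described above can be sketched in a few functions: a least-squares fit of Cq against log10 of the input quantity, interpolation of unknowns, and normalization to reference genes. The text does not say how the GAPDH and β-actin levels were combined; dividing by their geometric mean is an assumed, commonly used choice. All numbers below are hypothetical.

```python
import math
from typing import Sequence, Tuple


def fit_standard_curve(log10_quantities: Sequence[float],
                       cq_values: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of Cq = slope * log10(quantity) + intercept."""
    n = len(log10_quantities)
    mean_x = sum(log10_quantities) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in log10_quantities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_quantities, cq_values))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantity_from_cq(cq: float, slope: float, intercept: float) -> float:
    """Interpolate a relative quantity from a Cq value on the standard curve."""
    return 10 ** ((cq - intercept) / slope)


def amplification_efficiency(slope: float) -> float:
    """E = 10^(-1/slope) - 1; a slope near -3.32 corresponds to E near 1.0."""
    return 10 ** (-1.0 / slope) - 1.0


def normalize_to_references(target: float, references: Sequence[float]) -> float:
    """Divide by the geometric mean of reference-gene quantities (an assumed
    scheme, e.g. for GAPDH and beta-actin)."""
    log_mean = sum(math.log(q) for q in references) / len(references)
    return target / math.exp(log_mean)


# Hypothetical 10-fold dilution series with a slope of exactly -3.32.
slope, intercept = fit_standard_curve([0, 1, 2, 3], [30.0, 26.68, 23.36, 20.04])
target = quantity_from_cq(25.0, slope, intercept)
print(round(slope, 2), round(amplification_efficiency(slope), 3),
      normalize_to_references(target, [4.0, 25.0]))
```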
All data are expressed as Mean ± Standard Deviation (SD). Student's t test was used where appropriate for the multiple pairwise comparisons, and statistical significance was defined as p ≤ 0.05.
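For reference, the pooled-variance Student's t statistic can be computed from summary statistics as below. This is a sketch: the published p-values were presumably computed from individual-level data, and the Mean ± SD values used in the example are rounded, so the statistic is only approximate. Converting t to a two-sided p-value needs the t distribution, for example `2 * scipy.stats.t.sf(abs(t), df)` with SciPy.

```python
import math
from typing import Tuple


def student_t_from_summary(mean1: float, sd1: float, n1: int,
                           mean2: float, sd2: float, n2: int) -> Tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) from summary stats."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    return (mean1 - mean2) / se, df


# NCF1C copies, AA (2.3 ± 0.6, n = 32) vs Mex (1.0 ± 0.4, n = 24), using the
# rounded summaries from the Results; this only approximates the raw-data test.
t, df = student_t_from_summary(2.3, 0.6, 32, 1.0, 0.4, 24)
print(round(t, 2), df)
```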
Nucleotide sequences were retrieved from The UCSC Genome Browser (human assembly, March 2006). Sequence alignments were performed using multiple sequence alignment with MAP (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Phylogenetic trees were constructed using MegAlign ClustalW (DNAStar software).
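Percent identity between two already-aligned sequences, as in the >99.5% figure quoted for NCF1 and its pseudogenes, can be computed as below. Several definitions exist; this sketch assumes identical non-gap columns divided by the full alignment length, and the aligned fragments are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns in which both sequences carry the same
    (non-gap) base; the denominator is the full alignment length."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
                  if a == b and a != gap)
    return 100.0 * matches / len(aligned_a)


# Hypothetical aligned fragments (not NCF1 sequence).
print(percent_identity("ACGTAC-GTA", "ACGTACTGTT"))
```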
The authors would like to take the opportunity to thank Gary H. Gibbons, Maziar Zafari, Sandra Harris-Hooker, Mukaila Akinbami, David B. Allison, and Kathy K. Griendling Taylor for their scientific comments. This work was supported by grants from the American Heart Association (09GRNT2300003) and NIH (NIH/NHLBI T32HL067702, HL003676, HL095098, NIH/NCRR RR014758 and RR003034, NIH/NIGMS HL095098). The research was conducted in a facility constructed with support from a Research Facilities Improvement Grant (NIH/NCRR RR07571). This work is also supported in part by the Baltimore Research Enhancement Award Program in Stroke and the Baltimore Geriatrics Research, Education, and Clinical Center of the Department of Veterans Affairs.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.