A copy number variation in human NCF1 and its pseudogenes

Background Neutrophil cytosolic factor-1 (NCF1) is a component of NADPH oxidase. The NCF1 gene colocalizes with two pseudogenes (NCF1B and NCF1C). These two pseudogenes have a GT deletion in exon 2, resulting in a frameshift and an early stop codon. Here, we report a copy number variation (CNV) of the NCF1 pseudogenes and their alternatively spliced expression. Results We examined three normal populations (86 individuals). We observed the 2:2:2 pattern (NCF1B:NCF1:NCF1C) in only 26 individuals. On average, each African-American has 1.4 ± 0.8 (Mean ± SD) copies of NCF1B and 2.3 ± 0.6 copies of NCF1C; each Caucasian has 1.8 ± 0.7 copies of NCF1B and 1.9 ± 0.4 copies of NCF1C; and each Mexican has 1.6 ± 0.6 copies of NCF1B and 1.0 ± 0.4 copies of NCF1C. Mexicans have significantly fewer NCF1C copies than African-Americans (p = 6e-15) and Caucasians (p = 3e-11). Mendelian transmission of this CNV was observed in two CEPH pedigrees. Moreover, we cloned two alternatively spliced transcripts generated from these two pseudogenes that adopt an alternative exon-2 instead of their defective exon 2. The NCF1 pseudogene expression responded robustly to PMA induction during macrophage differentiation. NCF1B decreased from 32.9% to 8.3% in the cDNA pool transcribed from 3 gene copies. NCF1Ψs also displayed distinct expression patterns in different human tissues. Conclusions Our results suggest that these two pseudogenes may adopt an alternative exon-2 in different tissues and in response to external stimuli. The GT deletion is insufficient to define them as functionless pseudogenes; this CNV may have biological relevance.


Background
Recent genomic studies suggested that gene duplication occurred frequently and in variable numbers during the recent history of human populations, which has led to de novo formations of copy number variation (CNV) [1]. Presumably due to positive selection, genes encoding certain protein categories are particularly enriched in CNVs, such as those involved in processes related to environmental responses [1][2][3][4][5][6][7][8]. In this process, duplicated genes are thought to be the "successful" copies; pseudogenes are those "unsuccessful" duplicates retained in the genome [1].
Neutrophil cytosolic factor 1 (NCF1, also called p47phox, for phagocyte oxidase) is a crucial component of NADPH oxidase [9]. This enzyme catalyzes the production of microbicidal superoxide in phagocytes such as neutrophils and plays a vital role in host defense against microbial pathogens [10,11]. A 2-bp GT deletion in exon 2 of the NCF1 gene causes chronic granulomatous disease (CGD) in humans [12,13]. NCF1 is expressed in many cell types and may play a role in many other diseases [14][15][16][17][18][19].
The human NCF1 gene is located at 7q11.23, the Williams-Beuren syndrome region [20][21][22], accompanied by two nearly identical (>99.5%) pseudogenes (NCF1B and NCF1C) which presumably arose by gene duplication [23]. These two pseudogenes carry the same signature sequence as the NCF1 mutation responsible for CGD, the 2-bp GT deletion in exon 2 [13,23,24]. This mutation leads to a frameshift and a premature stop codon, and thus these two gene duplicates were categorized as pseudogenes. It has been noticed that the NCF1Ψ/NCF1 ratio varies among human individuals, and it was believed that some NCF1Ψ gene copies contain the wild-type NCF1 exon 2 sequence [22,25].
In this study, we examined the copy numbers of NCF1 pseudogenes in human populations and found a copy number variation (CNV). Our additional data revealed that these two pseudogenes can generate RNA transcripts that skip over their defective exon 2 by alternative splicing.

Copy Number Variation of NCF1 and NCF1Ψ
There are three large, highly homologous duplicons at the NCF1 locus (Figure 1). According to their chromosomal positions, we designated the two NCF1 pseudogene duplicons as NCF1B and NCF1C, respectively. These duplicons share a 106-kb sequence with >99.5% similarity, spanning from -45 kb at the 5'-end to +46 kb at the 3'-end relative to the NCF1 coding region. NCF1 or NCF1C duplicons share an additional 3'-flanking sequence until +82 kb (Figure 1). In contrast to the human genome, NCF1 is a single-copy gene in the reference genomes of all other species, such as chimpanzee, rhesus monkey, rat and mouse (UCSC Genome Assemblies). The phylogenetic tree revealed that NCF1B and NCF1C duplicated after they arose from the NCF1 gene (Figure 2).
To determine the relative copy numbers of NCF1B, NCF1, and NCF1C, we genotyped the genomic DNA of human subjects at two positions: the signature 2-bp GT deletion in exon 2, and an A/G substitution in exon-9, at which NCF1B and NCF1 have an A allele and NCF1C has a G allele (Figure 1 and Additional file 1). Pyrosequencing is a high-throughput technology that can be used for accurate determination of the allele frequency in pooled DNA [26]. Based on the pyrogram peak heights, we assessed the allele composition of each individual with the PyroMarkID Software v1.0 (Additional file 2).
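The two genotyped positions suffice to separate the three duplicons, because NCF1 is the only copy carrying GTGT in exon 2 and NCF1C is the only copy carrying the exon-9 G allele. The arithmetic can be sketched in Python as follows. The function name, the use of allele fractions as inputs, and the scaling to two NCF1 copies per diploid genome are illustrative assumptions, not part of the PyroMarkID workflow.

```python
def duplicon_copies(frac_delta_gt, frac_g, ncf1_copies=2.0):
    """Estimate (NCF1B, NCF1, NCF1C) copy numbers from two allele fractions.

    frac_delta_gt: fraction of the exon-2 signal carrying the GT deletion
                   (contributed by NCF1B and NCF1C).
    frac_g:        fraction of the exon-9 signal carrying the G allele
                   (contributed by NCF1C only).
    ncf1_copies:   ASSUMED number of NCF1 copies used for scaling
                   (two per diploid genome; an illustrative assumption).
    """
    frac_ncf1 = 1.0 - frac_delta_gt          # GTGT fraction = NCF1 only
    if frac_ncf1 <= 0:
        raise ValueError("no GTGT signal: cannot scale to NCF1 copies")
    frac_ncf1b = frac_delta_gt - frac_g      # deleted copies that are not NCF1C
    scale = ncf1_copies / frac_ncf1          # copies per unit of allele fraction
    return (frac_ncf1b * scale, ncf1_copies, frac_g * scale)


# Example: a 4:2 deleted:GTGT ratio with half of the exon-9 signal as G
b, n, c = duplicon_copies(4 / 6, 3 / 6)
print(round(b), round(n), round(c))
```

With these inputs the sketch recovers a 1:2:3 (NCF1B:NCF1:NCF1C) pattern; individuals whose NCF1 copy number differs from two would need a different scaling assumption.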
Epstein-Barr virus (EBV)-transformed lymphoblastoid cell lines are widely used as a genomic resource for many human genetic studies. However, viral integration increases chromosomal instability, which can cause a duplication or deletion in the host or viral genome sequences flanking the integration sites [27]. To eliminate the possibility that this CNV is merely an artifact of lymphoblastoid cell lines, we analyzed 48 genomic DNA samples directly extracted from human peripheral blood cells.

Figure 1 Genomic organization of the NCF1 gene locus at 7q11.23 in the human genome. There are 3 large duplicons (arrow blocks) at this locus; the arrow directions illustrate the NCF1 or NCF1Ψ transcription orientations. According to their genomic positions, we designated the two pseudogene duplicons as NCF1B and NCF1C. These duplicons share a 106-kb sequence with >99.5% similarity, spanning from -45 kb at the 5'-end to +46 kb at the 3'-end relative to the NCF1/NCF1Ψ coding region. NCF1 and NCF1B duplicons share an additional 3'-flanking sequence until +82 kb. The signature difference between NCF1 and its pseudogenes, the 2-bp GT deletion, is shown by GTGT at NCF1 and GT at NCF1B and NCF1C. Another nucleotide difference between duplicons, an A/G substitution in exon-9, is also indicated above the arrow blocks. The transposons at the boundaries of these duplicons are indicated.

Heritability of NCF1 Copy Number Variation
Two large CEPH families were used to examine the heritability of the CNV detected at the NCF1 locus (Figure 4). The majority of the members of family 1331 have a 2:1 ΔGT/GTGT ratio, represented by a 4:2 ratio in a diploid genome, which was further resolved as a 1:2:3 proportion (NCF1B:NCF1:NCF1C); one family member, the paternal grandmother, has a 0:2:3 ratio. Family 1362 is particularly interesting because the maternal and paternal lineages have very different NCF1B:NCF1:NCF1C ratios. The paternal lineage has an unvarying 4:2 ratio (1:2:3), whereas the maternal lineage has a continuous 2:2 (0:2:2) ratio. Based on the potential haplogenotypes deduced from the pedigrees, we observed a clear allele transmission pattern of this CNV in both families, suggesting that the CNV is heritable; however, our data cannot exclude the de novo formation of new copy numbers of this CNV in other families. The clear inheritance of the copy numbers in these two large pedigrees also suggests that there is no large experimental bias in the copy number measurement.
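The transmission check described above amounts to asking whether a child's copy numbers can be written as the sum of one paternal and one maternal haplotype. A minimal Python sketch of that test is given below. The function names and the haplotypes in the example are hypothetical and chosen only so that the parental totals match the 1:2:3 and 0:2:2 ratios mentioned in the text; the actual haplogenotypes are those shown in Figure 4.

```python
from itertools import product


def possible_offspring(father_haps, mother_haps):
    """Return every (NCF1B, NCF1, NCF1C) genotype obtainable by combining
    one paternal haplotype with one maternal haplotype."""
    return {
        tuple(f + m for f, m in zip(fh, mh))
        for fh, mh in product(father_haps, mother_haps)
    }


def is_mendelian(child, father_haps, mother_haps):
    """True if the child's copy numbers are consistent with transmission
    of one intact haplotype from each parent (no de novo change)."""
    return tuple(child) in possible_offspring(father_haps, mother_haps)


# HYPOTHETICAL haplotypes, each written as (NCF1B, NCF1, NCF1C) copies
father = [(1, 1, 2), (0, 1, 1)]  # diploid total 1:2:3
mother = [(0, 1, 1), (0, 1, 1)]  # diploid total 0:2:2

print(is_mendelian((1, 2, 3), father, mother))
print(is_mendelian((2, 2, 2), father, mother))
```

A child genotype that fails this test would point either to a de novo copy number change or to a measurement error, which is why consistent transmission across large pedigrees also supports the accuracy of the assay.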

Transcription and Alternative Splicing of NCF1Ψ and NCF1 Genes
We explored whether these two NCF1 pseudogenes are transcriptionally active. Pyrosequencing was used to quantify the NCF1/NCF1Ψ composition in the mRNAs of 14 single-donor lymphoblastoid cell lines by genotyping the signature 2-bp GT deletion in cDNA (Figure 5). Interestingly, although NCF1Ψs have more copies than NCF1 in each individual, they produced far fewer GT-containing transcripts than NCF1. For example, in individual 6, who has 4 copies of NCF1Ψs and 2 copies of NCF1, the pseudogenes collectively contributed only half the amount of transcripts produced by the NCF1 gene copies (as revealed by the 0.5:1 ratio in cDNA).
By PCR, cloning and direct DNA sequencing, we experimentally discovered two novel alternative exons (GenBank: GU215077, GU215078) located in intron-1 (Additional file 5). Neither of these two new alternatively spliced transcripts used the GT-containing exon-2 (Figure 6b), and both were made from the NCF1Ψ copies, as revealed by their nucleotide sequences (Figure 6c). We searched for putative open reading frames with the NCBI ORF Finder. Sub1 showed a continuous ORF without a stop codon; thus the full-length transcript containing this alternative splicing pattern may produce a long ORF. Sub2 contains three predicted ORFs, but all have a stop codon before the last nucleotide (Additional file 6).

Figure 4 Inheritance of the NCF1/NCF1Ψ copy numbers in two CEPH families. The NCF1Ψ/NCF1 copy number ratio is indicated within each shape; haplogenotypes are indicated below each shape. In haplogenotypes, blue circles represent the NCF1 duplicons, grey circles represent the NCF1Ψ duplicons (NCF1B on the left arm of the NCF1 blue circle on the haplogenotype, NCF1C on the right arm of NCF1), and an X indicates the absence of NCF1 and NCF1Ψ copies.
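The comparison for individual 6 can be made explicit by normalizing transcript contribution to gene copy number. The short Python sketch below does this; the function name is hypothetical, and the calculation assumes that the cDNA ratio reflects steady-state GT-containing transcript abundance and that all copies within a class contribute equally.

```python
def per_copy_output(pseudo_copies, ncf1_copies, pseudo_cdna, ncf1_cdna):
    """Ratio of transcript output per pseudogene copy to output per NCF1 copy.

    pseudo_copies, ncf1_copies: genomic copy numbers.
    pseudo_cdna, ncf1_cdna:     relative contributions to the cDNA pool.
    """
    pseudo_per_copy = pseudo_cdna / pseudo_copies
    ncf1_per_copy = ncf1_cdna / ncf1_copies
    return pseudo_per_copy / ncf1_per_copy


# Individual 6: 4 pseudogene copies, 2 NCF1 copies, 0.5:1 ratio in cDNA
print(per_copy_output(4, 2, 0.5, 1.0))  # 0.25
```

Under these assumptions, each pseudogene copy yields about one quarter of the GT-containing transcript produced by each NCF1 copy.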

NCF1Ψ and NCF1 Transcription in Monocyte Differentiation
Monocytes and macrophages have been implicated in atherosclerosis [28]. After treatment with 800 ng/ml PMA for 12 hrs, the monocytes became adherent and acquired a macrophage-like phenotype. Both monocytes and macrophages have NCF1B:NCF1:NCF1C genomic copy number ratios of 2:2:2 (Figure 7a). After macrophage differentiation, quantitative RT-PCR (RT-qPCR) experiments were performed to measure the transcripts with the "GT-containing" exon 2 using primers that recognize all three NCF1 genes. The results showed that total NCF1/NCF1Ψ GT-containing expression was slightly upregulated, by 1.34-fold (GAPDH, p = 0.024) to 1.82-fold (β-actin, p = 0.006) (Figure 7b). However, the relative contributions of each pseudogene and the NCF1 gene copy to the GT-containing transcript pool were altered dramatically. NCF1B decreased from 32.9% to 8.3%, whereas NCF1 increased from 52.7% to 80.4% (Figure 7c).
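Because the total GT-containing pool and each gene's share of it both changed, the change in a single gene's transcript level is the product of the two effects. The Python sketch below combines the reported fold changes and pool shares; the function name is hypothetical, and the derived values are illustrative arithmetic on the numbers above rather than independently measured quantities.

```python
def gene_fold_change(total_fold, share_before, share_after):
    """Fold change of one gene's transcript level, given the fold change of
    the total pool and the gene's share of the pool before and after."""
    return total_fold * share_after / share_before


# Shares of the GT-containing pool (percent) before and after differentiation
shares = {"NCF1B": (32.9, 8.3), "NCF1": (52.7, 80.4)}

# Total-pool fold changes reported with each reference gene
for reference, total_fold in (("GAPDH", 1.34), ("beta-actin", 1.82)):
    for gene, (before, after) in shares.items():
        change = gene_fold_change(total_fold, before, after)
        print(f"{reference:10s} {gene:6s} {change:.2f}-fold")
```

On this reading, NCF1B transcript levels fall in absolute terms while NCF1 levels roughly double or more, regardless of which reference gene is used.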

NCF1Ψ Expressions in Human Tissues
We measured the NCF1B:NCF1:NCF1C transcript ratios by pyrosequencing the signature GT deletion in cDNAs generated from different human tissues. We observed that NCF1B and NCF1C expression (GT-containing transcripts) varied dramatically across human tissues (Figure 8). For example, skin showed the highest contribution of NCF1B relative to NCF1 expression, whereas pancreas showed the lowest. Spleen and lymph node showed the highest contribution of NCF1C, whereas lung and brain showed the lowest.

Discussion
In this study, we report the existence of an NCF1 pseudogene CNV in humans. The pseudogene copy numbers apparently differ among three human populations (African-Americans, Caucasians, and Mexicans). The existence of the CNV was validated by its observation in genomic DNA extracted directly from human peripheral white blood cells. The inheritance of the NCF1 CNV in the two family pedigrees suggests that chromosomal instability at this locus may not be high. Our phylogenetic analysis implies that the NCF1 gene existed prior to the divergence of the two pseudogene duplicates; the NCF1 pseudogenes may have emerged after the divergence of the human and chimpanzee lineages. These data suggest that this NCF1 pseudogene CNV may be a consequence of recent gene duplications in human history [1].

Figure 6 Alternative splicing of the NCF1 pseudogenes. a) RT-PCR detection of two novel pseudogene-specific exons. b) The alternative splicing patterns (sub1 and sub2) using these two new exons; both splicing isoforms skip over the defective exon 2. c) Sequencing electropherograms of sub1 and sub2; the single nucleotide substitutions between NCF1 and its pseudogenes revealed that these two alternative exons were transcribed from the pseudogene copies. HEK, human embryonic kidney cell line; THP-1, a monocyte cell line; HASM, human aortic smooth muscle cells; LCL, lymphoblastoid cell line.

Pseudogenes have long been considered to be 'dead' nonfunctional byproducts of genome evolution [29]. They were defined as genomic sequences that are similar to a functional gene but contain genetic defects that preclude the generation of functional products [30][31][32]. Recent findings have prompted a revision of the definition of pseudogenes, which are now defined as genomic sequences that arise from functional genes but cannot encode the same type of functional products (i.e. protein, tRNA or rRNA) as the original genes [29]. The human genome is estimated to contain ~20,000 pseudogenes [33]; it will be important to know how many and which pseudogenes are functional. Recently, emphasis has been placed on polymorphisms such as CNVs that have been documented to play a role in disease pathogenesis [34][35][36][37][38][39]. A CNV of a pseudogene has not previously been reported to be biologically relevant. Historically, the NCF1 pseudogenes have been considered "pseudo" because of their 2-bp GT deletion in exon 2, which is predicted to cause a frameshift and an early stop codon in protein synthesis. In our analysis of GT-containing transcripts, the pseudogenes were far less active. However, our data revealed for the first time that a portion of NCF1 pseudogene transcripts do not utilize their defective exon-2; instead, they may use alternative exons to skip over their mutant exon-2 (Figure 6). When we measured transcripts exclusive to exon-2, the NCF1 pseudogenes had varying transcriptional capacities that responded robustly to PMA-induced macrophage differentiation (Figure 7) and showed distinct expression patterns in different human tissues (Figure 8). These observations prompted us to wonder whether these NCF1 pseudogenes are "functional" or "unsuccessful duplicates". Recent studies have shown that ~95% of multi-exon genes undergo tissue-specific alternative splicing [40][41][42]; on the other hand, genes can also function by making regulatory non-coding RNAs in addition to making proteins [43]. These results certainly challenge the current perception of the NCF1 pseudogenes.

Figure 7 The response of NCF1Ψ/NCF1 gene expression ratio to macrophage differentiation. a) Pyrosequencing was carried out to measure the ratio of NCF1B, NCF1 and NCF1C copies in the genomic DNA of THP-1 and differentiated macrophages. b) RT-qPCR was performed to measure the NCF1Ψ/NCF1 gene expression level using primers that recognize the GT or GTGT containing transcripts from all three gene copies. Five independent experiments were performed in duplicate using both GAPDH and β-actin as a reference. P values are indicated. c) Pyrosequencing was carried out to measure the relative contributions of NCF1 and its pseudogenes to the GTGT or GT containing transcript pool during macrophage differentiation.
We observed differential expression of NCF1 and its pseudogenes and a varying contribution of the genes to the total transcript pool (Figures 5, 7, and 8). This may be caused by differential alternative splicing patterns, different promoter activities, and/or differential mRNA degradation. As indicated by sequence alignment with ClustalW2, the 5' untranslated regions of the NCF1 pseudogenes are nearly identical to the equivalent region of NCF1 (Additional file 7). Collectively, putative binding sites of 45 transcription factors (TFBS) were predicted with rVISTA (Additional file 7). The number and exact locations of these TFBS are also nearly identical among the three genes (Additional file 7), except that the NCF1 gene contains 19 AML1 TFBS and 4 CREB TFBS instead of the 18 AML1 sites and 3 CREB sites in both pseudogenes. Our data suggest that a portion of pseudogene transcripts did not adopt the GT-containing exon-2, which may explain our observation that NCF1 pseudogenes displayed fewer GT-containing transcripts relative to the true NCF1 gene. However, it remains unclear whether differential promoter activities and mRNA degradation contribute to this observation.
Studies of CGD patients have shown that NCF1 is essential for neutrophil function in the first line of host defense against many pathogenic bacteria and fungi. About 93% of human CGD cases caused by NCF1 mutations are homozygous for the 2-bp GT deletion [12,13]. Evidently, the NCF1 pseudogenes are not simply replacement duplicates; otherwise they might have compensated for the loss of NCF1 gene copies in CGD patients. Two recent studies have reported that fewer copies of NCF1 pseudogenes may produce more reactive oxygen intermediates [44] and may exacerbate certain diseases involving inflammatory processes, such as inflammatory bowel disease [45]. Therefore, these two NCF1 pseudogenes may produce protein isoforms or small RNAs that act as inhibitors of normal NCF1 function. Superoxide is a double-edged sword: it is essential for phagocytes to exert their bactericidal function, but in excess it is also toxic to our own cells. Cells in other tissues would benefit from the existence of these two pseudogenes if they could recognize different stimuli and then reduce NCF1/NADPH oxidase activity and superoxide production accordingly. For example, they may help to reduce the inflammatory process during atherosclerosis.

Conclusions
Taken together, this study reports the existence of an NCF1 pseudogene CNV in three normal populations. These NCF1 pseudogenes are actively transcribed in human tissues. They can produce transcripts using an alternative exon-2 instead of their defective exon-2. These results prompt us to reconsider their non-functional status in pseudogene classification [29]. The functional contribution of the NCF1 pseudogenes to NCF1 function and the pathological relevance of this copy number variation merit further investigation.

[46]. Genomic DNAs of 48 unrelated healthy individuals directly extracted from peripheral blood cells instead of LCLs were kindly provided by Dr. Kittner of University of Maryland [47]. A human monocyte cell line THP-1 was purchased from ATCC (TIB-202). Total RNAs from human tissues were purchased from Clontech and US Biological. Phorbol 12-myristate 13-acetate (PMA) was purchased from Krackler Scientific (Cat. No. P1585). All subjects provided informed consent, and the study was approved by the Institutional Review Board of Morehouse School of Medicine.

Cell Culture
Lymphoblastoid cells and THP-1 cells were grown in RPMI-1640 supplemented with 10% fetal bovine serum (FBS), 2 mM L-glutamine, and 1% penicillin/streptomycin. Cells were maintained in a humidified atmosphere containing 5% CO2 at 37°C. The differentiation of THP-1 monocytes into macrophages was induced by incubation in 800 ng/ml of PMA for 12 hours. The non-adherent cells were removed by aspiration; adherent cells were allowed to differentiate for 3-4 days.

Genomic DNA and Total RNA Extraction
Genomic DNA was extracted from lymphoblastoid cells following a standard protocol described previously [48]. Total RNA was isolated using the RNeasy Mini kit (Qiagen). First strand cDNA was generated with 1 μg of total RNA using the Superscript III-RT system (Invitrogen).

Genotyping
Genomic DNA and cDNA were subjected to genotyping at the 2-bp GT deletion in Exon-2 and a 1-bp difference in Exon-9 (Figure 1). PCR and pyrosequencing primers are provided in Additional file 8. These primers were designed to recognize both NCF1 and its pseudogenes without discrimination. The PCR products were examined by agarose gel electrophoresis to ensure PCR specificity. Genotyping was carried out with a pyrosequencer (PSQ96MA, Pyrosequencing, Uppsala, Sweden) following the manufacturer's protocol. Briefly, a 20 μL PCR reaction was carried out with either genomic DNA or cDNA. PCR products were then denatured and single-stranded DNA templates were collected with streptavidin-coated Dynabeads (Dynal, Oslo, Norway). A pyrosequencing primer was added and pyrosequencing was performed in an automated PSQ96MA instrument. The pyrogram peak heights were evaluated per individual to give an accurate count of allele ratios using the PyroMarkID Software v1.0 provided by the manufacturer.

Quantitative Real-Time PCR (RT-qPCR)
RT-qPCR was carried out using a LightCycler thermocycler (Roche) and a SYBR green kit (Roche). Oligo-dT primers were used in the reverse transcription; their sequences are provided in Additional file 8. Cycle numbers obtained at the log-linear phase of the reaction were plotted against a standard curve prepared with serially diluted control samples. Expression of target genes was normalized to GAPDH and β-actin levels. The RT-qPCR amplification specificity of the NCF1-specific primer set is shown in Additional file 9.
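Standard-curve quantification of this kind can be sketched in Python as below: a line is fitted to cycle number versus log10 of the dilution, sample cycle numbers are converted back to relative quantities, and the target is divided by the reference gene. All function names and the dilution series in the example are hypothetical; the instrument software performs the actual fit, and this is only an illustration of the principle.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least-squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantity_from_ct(ct, slope, intercept):
    """Invert the standard curve to obtain a relative quantity."""
    return 10 ** ((ct - intercept) / slope)


def normalized_expression(target_qty, reference_qty):
    """Target quantity relative to a reference gene (e.g. GAPDH)."""
    return target_qty / reference_qty


# HYPOTHETICAL ten-fold dilution series and cycle numbers
slope, intercept = fit_standard_curve([1, 0.1, 0.01, 0.001],
                                      [20.0, 23.3, 26.6, 29.9])
target = quantity_from_ct(24.0, slope, intercept)
reference = quantity_from_ct(22.0, slope, intercept)
print(normalized_expression(target, reference))
```

In practice a separate standard curve would be fitted for each primer pair, since amplification efficiency (and therefore the slope) can differ between the target and the reference gene.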

Statistical Analysis
All data are expressed as Mean ± Standard Deviation (SD). Student's t test was used, in the setting of multiple comparisons where appropriate, and statistical significance was defined as p ≤ 0.05.

Bioinformatic analysis
Nucleotide sequences were retrieved from the UCSC Genome Browser (Human Assembly 2006 March). Sequence alignments were performed using multiple sequence alignment with MAP (http://www.ebi.ac.uk/Tools/clustalw2/index.html). Phylogenetic trees were constructed using MegAlign ClustalW of DNAStar Software