Gene cassette transcription in a large integron-associated array
© Michael and Labbate. 2010
Received: 25 April 2010
Accepted: 15 September 2010
Published: 15 September 2010
The integron/gene cassette system is a diverse and effective adaptive resource for prokaryotes. Short cassette arrays, with fewer than 10 cassettes adjacent to an integron, provide this resource through the expression of cassette-associated genes by an integron-borne promoter. However, the advantage provided by large arrays containing hundreds of cassettes is less obvious. In this work, using the 116-cassette array of Vibrio sp. DAT722 as a model, we investigated the hypothesis that the majority of genes contained within large cassette arrays are widely expressed by intra-array promoters in addition to the integron-borne promoter.
We demonstrated that the majority of the cassette-associated genes in the subject array were expressed. We further showed that cassette expression was conditional and that the conditionality varied across the array. We finally showed that this expression was mediated by a diversity of cassette-borne promoters within the array capable of responding to environmental stressors.
Widespread expression within large gene cassette arrays could provide an adaptive advantage to the host in proportion to the size of the array. Our findings explained the existence and maintenance of large cassette arrays within many prokaryotes. Further, we suggested that repeated rearrangement of cassettes containing genes and/or promoters within large arrays could result in the assembly of operon-like groups of co-expressed cassettes within an array. These findings add to our understanding of the adaptive repertoire of the integron/gene cassette system in prokaryotes and consequently, the evolutionary impact of this system.
Integrons are genetic elements capable of mobilising and rearranging genes packaged as mobile gene cassettes in a site-specific manner. In concert with other mechanisms capable of mobilising DNA between cells, the integron/gene cassette system contributes to the overall process of lateral gene transfer (LGT). LGT is a major contributor to genetic diversity amongst prokaryotes and hence a significant force in prokaryote evolution. The ability of the integron/gene cassette system, in particular, to influence the evolution of prokaryote strains is graphically shown in the rapid dissemination of antibiotic resistance genes, both geographically and amongst different prokaryotes. However, cassette-associated genes are not limited to the provision of antibiotic resistance phenotypes, with a plethora of novel ORFs (open reading frames or putative genes) present in the gene cassette metagenome. While the majority of the ORFs contained within gene cassettes have no analogue in sequencing databases, those few to which a function has been attributed have been adaptive in nature [5–7].
An integron typically consists of an integrase gene (intI); its associated promoter, Pint; an attI site, which acts both as a recognition site for the integrase produced by intI and as an insertion site for gene cassettes; and a second promoter, Pc, located within the intI gene but oriented towards attC and any adjacent gene cassette array. The DNA segments mobilised by integrons are termed gene cassettes. Gene cassettes typically consist of an ORF closely bounded by a multifunction site termed attC. AttC, analogous to attI in the integron, serves as both an integrase recognition site and a recombination site. AttC, through its imperfect symmetry, also serves to orient inserted cassettes and their contained ORFs uniformly with respect to the adjacent integron and, consequently, the integron-borne promoter Pc.
Site-specific recombination catalysed by the integrase, IntI, causes gene cassettes to be inserted at either attI or attC sites. Successive rounds of recombination can introduce new cassettes and so generate tandem cassette arrays adjacent to the integron. In addition, repeated recombination may rearrange cassettes within an array. Cassette arrays can vary in length from lone integrons and the typically short (1-8 cassettes) antibiotic resistance arrays associated with class 1 integrons, to over 200 tandem cassettes in a single array, as seen in the vibrios [9, 10].
Class 1 integrons were initially characterised through the antibiotic resistance phenotypes conferred by the expression of cassette-associated genes. The expression of these genes has been shown to be due to the integron-associated Pc. However, it is very unlikely that Pc could mediate the expression of all cassettes present in arrays containing significantly more than the 7-10 cassettes typically seen in class 1 integrons, due to the extreme length of the mRNA transcript required. Therefore, Pc-mediated expression alone may not account for the selective advantage provided by the presence of large gene cassette arrays containing hundreds of cassettes. It has been hypothesised that in such large arrays, only those cassettes proximal to the integron are expressed, and that the remainder of the cassettes in the array are 'banked', forming an accessible population resource of mobile genes. This hypothesis may be supported by the observation that, under stress, the integrase gene intI is up-regulated. Such increased integrase activity could not only introduce cassettes to or excise them from the array, but also rearrange existing cassettes, so bringing previously distal cassettes closer to Pc and hence facilitating their expression. We, however, speculated that cassettes throughout large arrays were generally expressed through the presence of promoters within the array, as suggested by micro-array data.
Resolving the question of the expression of cassette-associated genes in large arrays is important in extending our understanding of the adaptive potential of gene cassette arrays and, by extension, the way in which LGT can provide varied phenotypes in prokaryote populations. In order to investigate cassette expression in large arrays, we used Vibrio sp. DAT722 as a model system. Vibrio sp. DAT722 is a weak pathogen of crustaceans and contains an integron with an attendant array of 116 gene cassettes. Because this array had previously been sequenced and annotated, a detailed examination of the differential expression of cassette-associated genes could be undertaken. Therefore, in this work, the following questions were addressed:
1/ Are only cassettes proximal to the integron in Vibrio sp. DAT722 expressed?
2/ If cassettes throughout the Vibrio sp. DAT722 array are expressed, is this due to the presence of a single long transcript or else due to multiple promoters producing many shorter transcripts?
3/ Is cassette expression, if present, conditional or constitutive and does this vary across the array?
4/ If multiple promoters are present within the Vibrio sp. DAT722 array, are they all the same?
It has been shown that transcription of DNA into mRNA is the most limiting step in the expression of prokaryote genes, because mRNA is generally translated, either through the presence of Shine-Dalgarno type sequences or through leaderless translation [16, 17]. Accordingly, in order to address questions regarding the expression of cassette-associated genes, in this work we examined the gene cassette transcriptome of Vibrio sp. DAT722.
Monoclonal stocks of Vibrio sp. DAT722 were grown overnight at 28°C on vibrio media plates (per 400 ml: casein peptone, 4 g; NaCl, 4 g; MgCl2.6H2O, 1.6 g; KCl, 0.4 g; pH 7.5) and streaked to purity. The resultant single colonies were inoculated into 5 ml vibrio media liquid cultures and then incubated overnight, under an individual stressor, in the dark with moderate shaking, before nucleic acid collection. In this work, different types of stressor likely to be experienced by this marine organism, and which incite different cellular responses (thermal and oxidative stress), were applied to examine the possibility of a stressor-specific response within the array. Additionally, the response to a single stressor (oxidative stress) was measured after different time periods from application, to evaluate the possibility of temporal variations in the stressor response of the cassette array.
Individual treatment cultures were grown at 4°C, 14°C, or 28°C, in the dark, overnight with mild shaking. Care was taken to hold each culture at its growing temperature and in the dark until immediately before the lysis step in the nucleic acid extraction.
Individual treatment cultures were grown for 12 hours at 28°C in the dark with mild shaking and then inoculated with 3% hydrogen peroxide solution (H2O2) to give final concentrations of 0, 0.9, 1.8 and 3.6 mM H2O2 and allowed to grow for a further 30 minutes at 28°C in the dark with shaking before being harvested for nucleic acid.
A 3% hydrogen peroxide solution (H2O2) was added to 5 ml liquid cultures of Vibrio sp. DAT722 to give final concentrations of 0, 0.9, 1.8 and 3.6 mM H2O2. These cultures were then incubated for 18 hours in the dark at 28°C with mild shaking before recovery of nucleic acid.
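As a rough check on the volumes involved in the oxidative stress treatments, the sketch below estimates how much 3% stock would be needed to reach each final concentration in a 5 ml culture. It assumes, which the text does not state, that "3%" means 3% w/v (30 g/L) and it uses a molar mass of 34.01 g/mol for H2O2. The function name is illustrative only.

```python
# Assumptions (not from the source): "3%" is read as 3% w/v (30 g/L),
# and the molar mass of H2O2 is taken as 34.01 g/mol.
H2O2_MOLAR_MASS_G_PER_MOL = 34.01
STOCK_G_PER_L = 30.0


def stock_volume_ul(final_mm, culture_ml, stock_mm):
    """Volume of stock (in microlitres) needed to reach final_mm.

    Solves C_stock * V = C_final * (V_culture + V) for V, so the added
    volume is itself counted in the final volume.
    """
    if final_mm >= stock_mm:
        raise ValueError("final concentration must be below the stock concentration")
    v_ml = final_mm * culture_ml / (stock_mm - final_mm)
    return v_ml * 1000.0


# Stock concentration in mM under the w/v assumption (roughly 880 mM).
stock_mm = STOCK_G_PER_L / H2O2_MOLAR_MASS_G_PER_MOL * 1000.0

for final_mm in (0.0, 0.9, 1.8, 3.6):
    volume = stock_volume_ul(final_mm, 5.0, stock_mm)
    print(f"{final_mm:.1f} mM final -> {volume:.1f} ul of stock")
```

Under these assumptions the three non-zero treatments correspond to roughly 5, 10 and 20 μl of stock per 5 ml culture.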
The SV RNA extraction kit (Promega, Wisconsin USA) was used exclusively. For RNA extraction, the manufacturer's methodology was used with the following exceptions: Dual one-hour on-column DNase digests were substituted and 0.75 ml of the final culture (OD 1.1, measured at 600 nm) was used as a sample. In parallel to RNA extraction, control DNA was recovered by using the same extraction methodology with the exception that the DNase digest was replaced with an RNase digest step.
PCR primers (Sigma Aldrich, Castle Hill, Australia)
DAT722 attC rev
DAT722 attC fwd
Cassette 21 ORF fwd
Method validation, Promoter isolation
Cassette 21 ORF rev
Method validation, Promoter isolation
Cassette 21 ORF fwd
Cassette 21 ORF rev
Cassette 89 ORF fwd
Cassette 89 ORF rev
Cassette 16 ORF fwd
Cassette 16 ORF rev
Cassette 17 ORF fwd
Cassette 17 ORF rev
Cassette 18 ORF fwd
Cassette 18 ORF rev
Cassette 19 ORF fwd
Cassette 19 ORF rev
Triplicate PCRs were performed on each extracted nucleic acid sample (cDNA/gDNA/RNA) as follows:
Per 50 μl reaction: 2.0 μl of sample nucleic acid (i.e. cDNA/gDNA/RNA), 50 nM MgCl2, 10 nM dNTP, 1.0 μl of 1 mg/ml RNase, 50 pM YB3 primer, 50 pM YB4 primer, 0.3 μl Red Hot Taq (Abgene, Surrey, UK), 5.0 μl of 10× PCR buffer (Abgene, Surrey, UK). Primers are detailed in Table 1.
80°C hot start, 94°C for 10 minutes initial denaturation, followed by 27 to 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min 30 sec, with a terminal 72°C step for 10 minutes.
Quantitative PCR was performed using the Roche Light Cycler® (Roche Applied Sciences, Mannheim, Germany).
Per 10 μl reaction: 1.0 μl of sample cDNA, 40 nM MgCl2, 50 pM of each primer and 1.0 μl of SYBR Green™ master mix (Roche Applied Science, Mannheim, Germany). Primers used are detailed in Table 1.
The temperature profile was a 10-minute initial Taq activation step at 95°C, followed by 35 cycles of 15 seconds at 94°C, 15 seconds at 50°C and 30 seconds at 72°C, then a final cool-down step of 10 minutes at 20°C.
Sequencing of selected gene cassettes (cassettes 21 and 89) for method validation was conducted using the manufacturer's recommended protocols on an ABI 377 instrument.
The fluorescent products of the attC PCR, generated with 6-carboxyfluorescein (6FAM) labelled primers, were analysed on an ABI 377 DNA sequencer by loading 1 μl aliquots of each PCR reaction. These were run on denaturing gels prepared according to the manufacturer's protocols, at approximately 2500 V for 1.5 hours. The results provided single base pair resolution up to 1000 base pairs (bp) and information on the relative abundance of amplicon size classes in each sample.
Samples were run on 2% TAE gels at 100 V for 1 hour. Gels were immersed in 20 μg/ml ethidium bromide solution for 15-20 minutes prior to visualisation with UV light at 260 nm.
Individual sets of size class data from high-resolution electrophoresis were initially examined for DNA contamination, indicated by the presence of amplicons in the RNA and H2O samples. Entire data sets were discarded if contamination was evident. The resulting 'clean' cDNA data were calibrated to size (bp) and then assigned to a bp size continuum in an Excel spreadsheet. The data were normalised to allow different levels of applied stressor to be compared within an experimental series. Normalisation was accomplished using the sum of all amplicon peak heights above 90 bp in size within a PCR as an inter-PCR standard. Replicates were used to calculate the mean and standard error, establishing the variability in each size class and hence the significance levels for subsequent comparisons. The published sequence of the Vibrio sp. DAT722 gene cassette array was analysed for expected PCR fragment lengths by identifying the YB3/YB4 primer binding sites within each attC site and then measuring the length of the included sequence. PCR peaks were then correlated with expected gene cassette amplicon sizes, with PCR biases such as poly-A tailing corrected, and the peaks representing gene cassettes were rearranged into gene cassette array order. Expression values were then calculated using the known stoichiometric relationship of cassettes in the gDNA sample as a quantitative standard, and the resulting expression data were presented as the ratio of cDNA/gDNA peak heights for each detectable cassette size class within a treatment (Figure 3).
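The normalisation and ratio steps described above can be sketched in a few lines. This is a minimal illustration and not the spreadsheet the authors used: the function names and the peak heights are invented for the example, and only the 90 bp cut-off and the cDNA/gDNA ratio come from the text.

```python
def normalise(peaks, min_size_bp=90):
    """Scale every peak by the summed height of all peaks above min_size_bp.

    peaks maps amplicon size (bp) to raw peak height for one PCR. The
    cut-off defines the inter-PCR standard only; smaller peaks are still
    scaled and returned.
    """
    total = sum(height for size, height in peaks.items() if size > min_size_bp)
    if total == 0:
        raise ValueError("no signal above the size cut-off")
    return {size: height / total for size, height in peaks.items()}


def expression_ratios(cdna_peaks, gdna_peaks):
    """cDNA/gDNA ratio for each size class that is detectable in gDNA.

    Cassettes are equimolar in gDNA, so dividing by the gDNA peak cancels
    amplification bias that is specific to a size class.
    """
    cdna = normalise(cdna_peaks)
    gdna = normalise(gdna_peaks)
    return {size: cdna.get(size, 0.0) / g for size, g in gdna.items() if g > 0}


# Hypothetical peak heights (arbitrary fluorescence units), keyed by size in bp.
cdna_example = {85: 500, 320: 200, 410: 600, 655: 200}
gdna_example = {85: 100, 320: 250, 410: 250, 655: 500}

ratios = expression_ratios(cdna_example, gdna_example)
for size in sorted(ratios):
    print(f"{size} bp: cDNA/gDNA = {ratios[size]:.2f}")
```

A size class that is present in gDNA but absent from cDNA gets a ratio of 0, which corresponds to a cassette that is detectable but not detectably expressed.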
Because the methodology used in this work was novel, it was validated by comparison with established methods as follows:
The identification of amplicon size classes with individual gene cassette species was confirmed by gel extraction and sequencing of individual amplicon size classes (cassettes 21 and 89). These sequences were then confirmed as being the target cassette.
Replicate studies of each stage of the methodology showed that the PCR stage contributed the majority of the variability. Consequently, triplicate PCRs were included in the methodology to assess reproducibility and set significance levels for each cassette size class. Variability in the abundance of cassette-specific amplicons averaged a 12% standard error (%SE). Conservatively, a value of 50% SE was then adopted as the maximal variability inherent in replicate amplification of a single amplicon. Accordingly, a significant change in expression at the 95% confidence level was defined as an observed two-fold change in expression (i.e. 1.96 × 50% SE, or approximately a 100% change in expression, which was taken as a minimum 95% confidence interval).
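The arithmetic behind the two-fold threshold can be written out explicitly. The sketch below assumes the usual definition of standard error (sample standard deviation divided by the square root of the number of replicates); the replicate values and function names are illustrative and do not come from the study.

```python
import statistics


def percent_se(replicates):
    """Standard error of the mean, expressed as a percentage of the mean."""
    mean = statistics.mean(replicates)
    se = statistics.stdev(replicates) / len(replicates) ** 0.5
    return 100.0 * se / mean


def fold_threshold(percent_se_ceiling=50.0, z=1.96):
    """Fold change implied by z * %SE.

    With the 50% SE ceiling adopted in the text, 1.96 * 50% = 98%, i.e. a
    1.98-fold change, which the text rounds to two-fold.
    """
    return 1.0 + z * percent_se_ceiling / 100.0


# Hypothetical triplicate peak heights for a single cassette size class.
triplicate = [100.0, 112.0, 124.0]
print(f"%SE of triplicate: {percent_se(triplicate):.1f}")
print(f"fold threshold:    {fold_threshold():.2f}")
```

Using the 12% average SE reported above instead of the 50% ceiling would give a much lower threshold (about 1.24-fold), which shows how conservative the adopted two-fold criterion is.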
Within-treatment quantitation was validated by relative QPCR of a single RNA sample for cassettes 21 and 89. These cassettes were selected for their large difference in observed expression and also for their wide spacing within the array. Results of the QPCR study were found to be comparable with those achieved with the methodology used, within the 95% confidence limit.
Between-treatment quantitation was confirmed by relative QPCR of cassette 21 expression across all treatments of a single stressor, and the results were found to be comparable with those achieved with the methodology used, within the 95% confidence limit.
Per 50 μl reaction: 2.0 μl of sample DNA, 50 nM MgCl2, 10 nM dNTP, 1.0 μl of 1 mg/ml RNase, 50 pM of each primer, 0.3 μl Red Hot Taq (Abgene, Surrey, UK), 5.0 μl of 10× PCR buffer (Abgene, Surrey, UK). Primers are listed in Table 1.
80°C hot start, 94°C for 10 minutes initial denaturation, then 27 to 35 cycles of 94°C for 30 sec, 55°C for 30 sec and 72°C for 1 min 30 sec, followed by a terminal 72°C step for 10 minutes.
Quantitative PCR was performed using the Roche Light Cycler® instrumentation (Roche Applied Sciences, Mannheim, Germany).
Per 10 μl reaction: 1.0 μl of sample DNA, 40 nM MgCl2, 50 pM of each primer and 1.0 μl of SYBR Green™ master mix (Roche Applied Science, Mannheim, Germany). PCRs using primer pair YB4/mazGr were compared with those using mazGf/mazGr. Primers are detailed in Table 1.
The temperature profile included a 10-minute initial Taq activation step at 95°C per the manufacturer's recommendations, followed by 35 cycles of 15 seconds at 94°C, 15 seconds at 50°C and 30 seconds at 72°C followed by a final cool down step of 10 minutes at 20°C. PCR products were run on a 2% TAE gel to verify the presence of amplicons of expected size and the absence of non-specific amplifications.
The Vibrio sp. DAT722 gene cassette array contained 116 cassettes in 90 different size classes. Of these 90 size classes, 75 represented only one type of cassette, and so these size classes uniquely defined cassettes and hence their positions within the array. The remaining 15 size classes represented multiple copies of the same cassette and examples of different cassette sequences having the same length. Consequently, these 15 size classes ambiguously defined positions within the array. The methodology used in this work identified 62 of the 90 possible size classes in genomic DNA (gDNA). Of the 62 identified size classes, 50 represented unique cassettes within the array and 12 represented cassettes from amongst the 15 ambiguously located size classes. In short, 74 of the 116 cassette positions within the array were readily accessible with the techniques used in this work.
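The distinction between unique and ambiguous size classes amounts to grouping array positions by expected amplicon length. A minimal sketch follows; the positions and lengths are invented for illustration and are not taken from the DAT722 annotation. On the figures given above, 116 − 75 = 41 cassette positions must share the 15 ambiguous size classes.

```python
from collections import defaultdict


def classify_size_classes(cassette_lengths):
    """Split size classes into unique and ambiguous ones.

    cassette_lengths maps array position -> expected amplicon length (bp).
    Returns (unique, ambiguous): unique maps length -> the single position
    it identifies; ambiguous maps length -> all positions sharing it.
    """
    by_size = defaultdict(list)
    for position, length in cassette_lengths.items():
        by_size[length].append(position)
    unique = {size: pos[0] for size, pos in by_size.items() if len(pos) == 1}
    ambiguous = {size: sorted(pos) for size, pos in by_size.items() if len(pos) > 1}
    return unique, ambiguous


# Hypothetical five-cassette array: positions 1 and 3 share a length.
toy_array = {1: 512, 2: 640, 3: 512, 4: 701, 5: 388}
unique, ambiguous = classify_size_classes(toy_array)
print("unique size classes:   ", unique)
print("ambiguous size classes:", ambiguous)
```

A peak in a unique size class maps directly to one array position, whereas a peak in an ambiguous class can only be attributed to the set of positions that share that length.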
The Vibrio sp. DAT722 array contained 91 coding and 25 non-coding (ORF-less) cassette positions. Additionally, eight of the coding cassettes were oriented so that their genes were located on the complementary strand. In this work, both coding cassettes and non-coding cassettes were detectably expressed. However, no expression could be specifically attributed to those cassettes with genes on the complementary strand.
The technique used provided relative quantitation of expression amongst the individual gene cassettes in the DAT722 array by using the stoichiometric relationship between cassette species in gDNA as a quantitative standard. This quantitation showed at least a 100-fold difference in the intensity of expression between the most and least detectably expressed cassettes (cassettes 95 and 11 within the array respectively). Additionally, 'blocs' of adjacent expressed cassettes were expressed at similar intensities, with less than three-fold difference amongst cassettes within the bloc (for example, cassettes 31-35 in Figure 5).
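One way to make the notion of a 'bloc' concrete is to scan the array in order and extend a group while cassettes are adjacent, expressed, and within three-fold of every other member. This is a sketch of that reading, not the authors' procedure: the greedy left-to-right grouping, the treatment of undetected positions as breaks, and the example values are all assumptions.

```python
def find_blocs(expression, max_fold=3.0):
    """Group adjacent expressed cassettes into blocs.

    expression maps array position -> cDNA/gDNA ratio, with None (or 0) for
    cassettes that are detectable in gDNA but not detectably expressed.
    Assumption: positions missing from the mapping also end a bloc.
    Only groups of two or more cassettes are returned.
    """
    blocs, current = [], []
    for pos in sorted(expression):
        value = expression[pos]
        if not value:
            # An unexpressed cassette closes any open bloc.
            if current:
                blocs.append(current)
            current = []
            continue
        levels = [expression[p] for p in current] + [value]
        adjacent = bool(current) and pos == current[-1] + 1
        if adjacent and max(levels) / min(levels) < max_fold:
            current.append(pos)
        else:
            if current:
                blocs.append(current)
            current = [pos]
    if current:
        blocs.append(current)
    return [bloc for bloc in blocs if len(bloc) > 1]


# Hypothetical expression ratios by array position.
toy_expression = {1: 1.0, 2: 1.5, 3: 2.0, 4: None, 5: 10.0, 6: 12.0, 7: 0.5, 9: 0.6}
print(find_blocs(toy_expression))
```

In this toy input, position 4 is unexpressed and splits the first two groups, while position 7 is expressed but differs more than three-fold from positions 5 and 6, so it starts a new group of its own.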
The hypothesis that the integron-associated promoter, Pc, mediates expression of the entire array implies the presence of large contiguous transcripts of the array amongst cDNA species. In this work, we detected cassettes whose expression was undetectable interspersed amongst detectably expressed cassettes, indicating that not all of the expression seen within the array was due to Pc. The presence of additional promoters within the array was therefore implied. Additionally, the observation that different 'blocs' of similarly expressed adjacent cassettes were expressed at levels significantly different from other 'blocs' (for example, cassettes 21-23 were significantly more expressed than the bloc containing cassettes 31-35 (Figure 5)) suggested that these additional promoters differed in their ability to drive transcription. The presence of detectable but unexpressed cassettes between 'blocs' indicated areas of the array where some of these 'intra-array' promoters might be located (i.e. adjacent to the blocs of cassettes 1-18, 20-25, 31-44, 52-59 and 88-95).
A number of gene cassettes detectable in gDNA were not detectably expressed under any stressor (arrowed in red in Figure 6). These cassettes were the same as those nominated as potential promoter locations in the previous section. Of the detectably expressed cassettes, all but two (cassettes 23 and 70) were conditionally expressed under at least one stressor at the two-fold level of significance. The largest measured increase in expression was 11.4-fold, seen in cassette 57 under 18 hour oxidative stress, with other cassettes (e.g. cassettes 10, 20 and 104 in Figure 6) showing similar levels of increase, though not always at the same level of applied stress, or even under the same stressor. Additionally, in many cases cassettes were not detectably expressed in one or more of the experimental treatments (e.g. cassette 11 under both 30 minute and 18 hour oxidative stress). Consequently, the actual increase in expression of these cassettes under these stressors from undetectable levels may have been larger than that measured.
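The classification used in this section can be sketched as a comparison of each cassette's expression across the treatments of one stressor. The two-fold threshold comes from the text; the function name, the return format, the treatment labels and the example values are illustrative, and treating a cassette that is undetectable in only some treatments as conditional is an interpretation of the text, not a stated rule.

```python
def classify_cassette(levels, fold_threshold=2.0):
    """Classify one cassette from its expression across treatments.

    levels maps treatment label -> cDNA/gDNA ratio (None if undetectable).
    Returns (status, fold, is_lower_bound), where fold is the largest ratio
    divided by the smallest among the detectable treatments.
    """
    detected = [value for value in levels.values() if value]
    if not detected:
        return ("not detectably expressed", None, False)
    fold = max(detected) / min(detected)
    if len(detected) < len(levels):
        # Undetectable in at least one treatment: the true change may be
        # larger than the measured one, so fold is only a lower bound.
        return ("conditional", fold, True)
    status = "conditional" if fold >= fold_threshold else "not conditional"
    return (status, fold, False)


# Hypothetical cassettes under increasing H2O2 concentration.
steady = {"0 mM": 1.0, "0.9 mM": 1.2, "1.8 mM": 1.5}
induced = {"0 mM": 0.5, "0.9 mM": 2.0, "1.8 mM": 5.0}
partial = {"0 mM": None, "0.9 mM": 1.0, "1.8 mM": 1.5}

for name, levels in (("steady", steady), ("induced", induced), ("partial", partial)):
    print(name, classify_cassette(levels))
```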
Amongst this widespread conditional expression, the following patterns were noted:
-Gene cassettes were similarly expressed within blocs. That is, within a bloc, the level of expression was largely consistent irrespective of stressor. This observation supported the suggestion that individual promoters were associated with these 'expression blocs'.
The cassette 21 promoter appeared to bridge the attC junction between cassettes 20 and 21 (Figure 8). The putative -35 site was contained within the cassette 20 side of the attC site, while the -10 site was immediately adjacent to the cassette 21 section of attC. Sequence examination of other attC junctions within the DAT722 array showed that the -35 site, in both position and sequence, was present in a number of other attC sites within the array. However, the corresponding -10 site within cassette 21 (Figure 8) was not present in any of the other cassettes in the DAT722 array. This indicated firstly that this potential promoter could remain functional if cassette 21 were mobilised to a location with an attC site containing the appropriately located -35 sequence. Secondly, the -10 site, being unique to cassette 21, indicated that this promoter was unique within the array. These observations indicated that the remainder of the expression seen within the DAT722 array was due to other types of intra-array promoter. The observation of detectable but unexpressed gene cassettes adjacent to expressed cassettes in other areas of the Vibrio sp. DAT722 array suggested that additional intra-array promoters might also be located in the vicinities of cassettes 19, 35, 60, 96, 99, 106 and 108-109.
We have found that, in the Vibrio sp. DAT722 gene cassette array, the majority of cassette-associated genes were expressed, that this expression was largely conditional, and that the expression was facilitated by multiple, different, intra-array promoters. These findings have a significant impact on our understanding of the utility of the integron/gene cassette system in prokaryotes:
Firstly, the widespread expression of cassette-associated genes within the 116-cassette array indicated that a wide range of the phenotypes implied by the cassette array was available to the Vibrio sp. DAT722 host. So, rather than being restricted to only those phenotypes that may be provided by cassettes proximal to the integron, this prokaryote lineage has the potential to benefit from all cassettes present, irrespective of their location within the array. Further, because the widespread expression in DAT722 was due to cassette-borne promoters that are themselves mobile genetic elements, it is likely that promoter-containing cassettes are ubiquitous in the gene cassette metagenome. Therefore, we concluded that cassette-associated genes within all large arrays may be routinely expressed, and so cassette arrays in general are able to confer phenotypes in proportion to their size. Consequently, the presence of larger cassette arrays can provide distinct selective advantages to the host organism, and this may well account for the observed prevalence of large arrays in the environment.
Secondly, the presence of cassette-borne promoters indicates that these promoters, as well as cassette-borne ORFs, may be rearranged within the array by the action of the IntI integrase. Consequently, given the observation of polycistronic cDNA transcripts in this work and elsewhere, repeated rounds of rearrangement may result in the assembly of a number of tandem genes of related function within a gene cassette array, in association with an appropriate cassette-borne promoter. Such 'gene cassette operons' could result in the co-ordinated expression of multiple cassette-associated genes to produce complex phenotypes. The existence of such hypothetical 'gene cassette operons' is supported by observations that differences amongst the cassette arrays of the vibrio pandemic strains were largely confined to contiguous multi-cassette indels rather than single cassette indels. Similarly, the observation that a large proportion of environmental integrons have an inactive integrase gene may reflect that the existence of advantageous gene cassette operons necessitates the preservation of not only the gene cassette complement but the intra-array cassette order as well. Further, where a functional integron-associated integrase gene is associated with a cassette array, it has been observed that the integrase gene may be induced by cellular stress. This induction, enabling the recruitment of novel cassettes or groups of cassettes to the array, further underscores the adaptive role of cassette arrays.
We have established here a link between environmental stress and the differential expression of cassette-associated genes. It has also been established that lateral gene transfer involving gene cassettes can rapidly and randomly produce new phenotypes in prokaryote communities. However, because of the random nature of the new arrangements of cassette-borne genes and promoters produced by LGT, the resulting novel phenotypes may not necessarily be 'finely-tuned' to the stressor that causes them to be produced. Similarly, evidence for markedly decreased translation of widely spaced genes on polycistronic cassette transcripts may indicate that the expression of individual cassettes shown in this work does not necessarily result in an advantageous phenotype. Consequently, it remains to be demonstrated that the conditional expression of gene cassettes, as seen in this work, produces phenotypes that appropriately address the applied stressor.
In this work, we have demonstrated that the majority of gene cassettes in large integron-associated arrays are expressed conditionally in response to environmental stressors and that this expression is facilitated by the presence of different intra-array promoters. These findings demonstrate that large cassette arrays may produce diverse and complex phenotypes that are reactive to environmental changes, and so demonstrate an increased repertoire of the adaptive capabilities of the integron/gene cassette system.
The authors would like to thank the following colleagues for discussions:
Y. Boucher, M. Gillings, A. Holmes, and H. Stokes.
This work was facilitated by an Australian post-graduate award to C.M. through Macquarie University.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.