- Research article
- Open Access
Conserved genomic organisation of Group B Sox genes in insects.
© McKimmie et al; licensee BioMed Central Ltd. 2005
- Received: 27 January 2005
- Accepted: 19 May 2005
- Published: 19 May 2005
Sox domain-containing genes are important metazoan transcriptional regulators implicated in a wide range of developmental processes. The vertebrate B subgroup contains the Sox1, Sox2 and Sox3 genes, which have early functions in neural development. Previous studies show that Drosophila Group B genes have been functionally conserved: they play essential roles in early neural specification, and mutations in the Drosophila Dichaete and SoxN genes can be rescued with mammalian Sox genes. Despite their importance, the extent and organisation of the Group B family in Drosophila has not been fully characterised, an important step in using Drosophila to examine conserved aspects of Group B Sox gene function.
We have used directed cDNA sequencing along with the output from the publicly available genome sequencing projects to examine the structure of Group B Sox domain genes in Drosophila melanogaster, Drosophila pseudoobscura, Anopheles gambiae and Apis mellifera. All of the insect genomes contain four genes encoding Group B proteins, two of which are intronless, as is the case with vertebrate group B genes. As has been previously reported, and unusually for Group B genes, two of the insect group B genes, Sox21a and Sox21b, contain introns within their DNA-binding domains. We find that the highly unusual multi-exon structure of the Sox21b gene is common to the insects. In addition, we find that three of the group B Sox genes are organised in a linked cluster in the insect genomes. By in situ hybridisation we show that the pattern of expression of each of the four group B genes during embryogenesis is conserved between D. melanogaster and D. pseudoobscura.
The DNA-binding domain sequences and genomic organisation of the group B genes have been conserved over 300 My of evolution since the last common ancestor of the Hymenoptera and the Diptera. Our analysis suggests insects have two Group B1 genes, SoxN and Dichaete, and two Group B2 genes. The genomic organisation of Dichaete and another two Group B genes in a cluster suggests they may be under concerted regulatory control. Our analysis suggests a simple model for the evolution of group B Sox genes in insects that differs from the proposed evolution of vertebrate Group B genes.
- Drosophila Species
- Sox Gene
- Vertebrate Group
- Insect Genome
- Sox Protein
The family of Sox-domain containing proteins encompasses a group of metazoan transcriptional regulators first identified by their similarity with the mammalian testis-determining factor SRY. Membership of the Sox family is conferred by the presence of an HMG1-type DNA-binding domain sharing greater than 60% amino-acid sequence identity to that of SRY. Mammalian genome sequencing projects indicate that in humans and mice there are twenty Sox genes, divided into eight subgroups (A-H) on the basis of sequence identity within and outwith the HMG-domain. Aside from mammals, Sox genes have been identified in all metazoans examined to date, including birds, fish, amphibians, basal chordates, insects and nematodes.
The B subgroup is of particular interest since members of this group are most closely related to SRY and appear to be functionally conserved during evolution. Sequence analysis and functional studies suggest that, in vertebrates, the five members of the B subgroup can be subdivided into two further groups: B1 (Sox1, Sox2 and Sox3) and B2 (Sox14 and Sox21). It has been suggested from studies in the chick that the three group B1 proteins act as gene activators whereas the B2 proteins act as gene repressors. In terms of genomic organization, all five of the group B genes are devoid of introns. Sox3 is located on the mammalian X chromosome and is believed to be the ancestor of Sry [7, 8]. In humans, the remaining four autosomal group B genes are arranged in two pairs, each comprising one B1 gene and one B2 gene: Sox2 and Sox14 map together on chromosome 3 [9, 10] and Sox1 and Sox21 map together on chromosome 13 [5, 11]. This organization is conserved, at least in part, in other vertebrates, with Sox2-Sox14 mapping together in the chick and the monotreme O. anatinus, and Sox1-Sox21 mapping together in the chick [12, 13]. There is, however, no linkage of Group B Sox genes in the mouse genome [14, 15]. A model for the evolution of group B genes and Sry from a single ancestor has been proposed, which suggests that pairs of B1 and B2 genes arose by a tandem duplication followed by a chromosomal duplication.
The fruitfly, Drosophila melanogaster, has proved to be a tractable system for studying conserved aspects of eukaryotic gene function and, with the production of other insect genome sequences, a useful baseline for evolutionary studies of gene organisation. Whole-genome sequence is now available for three insects: Drosophila melanogaster, Drosophila pseudoobscura (which diverged from melanogaster some 46 million years ago) and Anopheles gambiae, which diverged from melanogaster approximately 250 million years ago [17, 18]. Sequencing and assembly of a further ten Drosophila species is currently underway, promising an unparalleled data source for evolutionary studies. In addition to the Diptera, the sequencing of the hymenopteran Apis mellifera (honey bee, ~280 million years diverged from Drosophila) is now well underway, allowing fragments of a fourth insect genome to be assessed. In functional terms, Drosophila is a useful model for studying Sox gene function due to its genetic tractability. For example, we have previously shown that, in the case of the Drosophila group B gene Dichaete, there is functional conservation between insect and mammalian genes. In addition, we, and others, have demonstrated a degree of in vivo functional redundancy between Dichaete and SoxN [21, 22], as had been proposed for the mammalian group B genes. Of particular interest is the fact that the expression patterns and functional studies of group B genes suggest that they participate in the earliest events of CNS differentiation in all organisms that have been studied to date, including Drosophila, Xenopus, chick, mouse, ascidians and hemichordates.
To further explore the relationship between group B Sox genes we examined the extent and organization of the family in insects. Our studies show that group B Sox gene organisation is similar in four different insects. We find conservation in the sequence and genome organization of the group B genes in D. melanogaster, D. pseudoobscura, A. gambiae and A. mellifera. In contrast to mammals, and in agreement with a previous report, we find that two group B2 genes contain introns and are organized as a single genomic cluster along with the intronless Dichaete gene. Our studies indicate a potentially different evolutionary path for members of the group B family in insects and vertebrates.
To explore the structure of the group B Sox genes in insects we first accurately determined the extent and structure of the family in Drosophila melanogaster. The group B genes Dichaete and SoxNeuro (SoxN) have already been well described in the literature [26–28]. Two other group B gene fragments have been identified, Sox21a and Sox21b, but their structure and genomic organisation have not been reported. Using a combination of database searching and DNA sequencing we characterised both of these genes in detail. We find no evidence for any other group B genes in Release 3.2 or Release 4 of the Drosophila genome sequence, indicating that there are a total of four in the D. melanogaster genome.
Blast searches of the Drosophila genome identified a group B HMG-domain interrupted by a 1655-bp intron in the 70D region of chromosome arm 3L. Using primers designed against each of these predicted exons we amplified a fragment of 1238 bp from the LD cDNA library produced by the BDGP. For reasons that are as yet unclear, we have been unable to recover a clone from this or any of the several other cDNA libraries that we have screened. The fragment amplified from the library was sequenced in its entirety and found to contain a long open reading frame encoding a 389 amino acid Sox domain protein. The predicted polypeptide initiates with a methionine and probably contains the entire coding sequence for the gene. When aligned with the genome sequence we predict a gene with two exons spanning 2.8 kb. Blast searches with the predicted protein find over 90% identity with a range of group B Sox proteins in the HMG DNA-binding domain. The best scores are with the DNA-binding domains of the vertebrate Sox21 and Sox14 proteins; however, there is little significant similarity outside of the DNA-binding domain. The Sox21a gene has previously been reported as SoxB2-3 (CG7345) and it has been suggested that it may represent a pseudogene. As we show below, RT-PCR and in situ hybridisation studies indicate that Sox21a is expressed in both D. melanogaster and D. pseudoobscura, indicating that it is not a pseudogene.
Table: Representation of Sox expression during Drosophila development assayed by RT-PCR. Surviving stage labels: early 3rd instar, late 3rd instar, 12 h pupa, 36 h pupa.
Group B genes in other insects
Our findings show that Drosophila melanogaster has four group B Sox genes compared to the five found in vertebrates and, unlike vertebrates, two of the genes contain introns. To investigate whether this particular organization is unique to D. melanogaster we searched the available genome sequence of other insects to find potential Sox domain genes. Using the Dichaete DNA-binding domain as a query, we searched the Drosophila pseudoobscura, Anopheles gambiae and Apis mellifera genome and EST sequence databases using Blast-P and Blast-N (see materials and methods for EST and genome scaffold accessions). In all three cases we found evidence for four Group B genes and were able to build gene models, from the genome sequence alone or with the addition of EST data where available. The initial characterization of the insect group B genes, based on the HMG-domain sequence, suggests that there is a single orthologue of each Drosophila gene in the other three species.
The alignment presented in Figure 1a shows the similarity between the insect SoxN proteins and mouse Sox1. As previously reported, conservation between vertebrate and invertebrate Sox proteins is mostly restricted to the DNA-binding domains [26, 27]. Between the insect proteins there are more extensive regions of homology outwith the DNA-binding domain. The Drosophila SoxN sequences show over 90% sequence identity over their entire length and, as expected from the phylogenies based on rDNA and protein coding sequences, the other insect sequences are more diverged [18, 30]. A. gambiae is overall 64% identical to the melanogaster sequence, with particularly well conserved regions in the N-terminal 50 amino acids and more patchy conservation C-terminal to the DNA-binding domain. A. mellifera is further diverged (52% identity with Drosophila). Conserved regions outside the DNA-binding domains among all four sequences are restricted to a C-terminal stretch of amino acids that may represent conserved functional motifs important in transcriptional regulation.
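The identity figures quoted above depend on how aligned columns are counted. As a minimal sketch (not the authors' procedure), the Python function below computes percent identity from two rows of a pre-computed alignment. The function name, the choice to count every column in which at least one sequence has a residue, and the toy sequences are all assumptions made for illustration.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of an existing alignment.

    Assumption: columns where both rows are gaps are ignored, and a
    residue aligned against a gap counts as a mismatch.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy rows, not real Sox sequences.
print(percent_identity("MKRP-NAFMV", "MKRPMNAF-V"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different percentages, so a figure such as 64% is only comparable between studies that use the same denominator.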
The situation with Dichaete is similar to that observed with SoxN, and the figures for amino acid identity are virtually identical (Figure 1b). Outside of the DNA-binding domain the Dichaete sequences show even less similarity comparing the Drosophila species and the other two insects; conservation between all four being restricted to limited regions C-terminal to the DNA-binding domain. Interestingly, we have shown that the C-terminal region of D. melanogaster Dichaete contains sequences required for activity in a context-specific manner, and C-terminal regions of the mouse and chicken Sox2 protein are believed to be involved in aspects of correct Sox2 function.
Sox21a is the least conserved gene between the four species, and outside of the DNA-binding domain the insect proteins show little similarity with vertebrate group B2 proteins (Figure 1c). There is extensive homology between the two Drosophila species; however, the Anopheles and Apis sequences are very diverged outside of the DNA-binding domain. As with D. melanogaster, there are no EST sequences available that support the structure of Sox21a in the other insects.
The predicted Drosophila Sox21b proteins are again very similar, over 88% identical over their length. The other insect sequences are less well conserved, although the Anopheles sequence has a block of conservation C-terminal to the DNA-binding domain, including a glutamic acid-rich domain (Figure 1d). The predicted Apis sequence is less well conserved; we note, however, that all four proteins are identical at the extreme C-terminus. With both the Anopheles and Apis proteins we cannot confidently predict the N-terminal exons, and we are unable to find any regions with amino acid similarity to the first two coding exons of the Drosophila sequences in the Anopheles or Apis genomic sequence between the end of Dichaete and the Sox21b Sox-domain encoding exons. Our current models are, however, supported by the available EST sequences for both species, although the EST sequences are not full-length. Therefore, the definitive structure of these two insect Sox21b genes will require further investigation. Nevertheless, it is clear from the available sequence that orthologues of Sox21b are present in other insects.
To confirm the identification of four group B genes in both D. melanogaster and D. pseudoobscura, we performed whole-mount in situ hybridization to embryos of both species using exon-specific probes generated by PCR from genomic DNA. In all four cases we find very similar patterns of expression during embryogenesis. In the case of Dichaete, we find blastoderm expression including a broad central domain and a region of expression in the cephalic neuroectoderm (Figure 2A and 2A'). After gastrulation there is extensive expression in the developing CNS (Figure 2B and 2B') including the midline (not shown). With SoxN we find conserved blastoderm expression, including an identical restriction from the ventral region of the embryo, followed by extensive expression throughout the developing CNS (Figure 2C to 2D'). With Sox21a, we identified conserved expression in the anlage of the foregut and hindgut at stage 12 (Figure 2E and 2E') with later expression in specific cells of the midline after stage 14 (Figure 2F and 2F'). Sox21b shows conserved expression in abdominal epidermal stripes from stage 13 (Figure 2G to 2H'). These observations indicate that all four group B genes have conserved expression patterns during embryogenesis.
Genomic organisation of group B genes in Drosophila: the Dichaete complex
In some vertebrates the two classes of group B genes, B1 and B2, are linked on the same chromosome. In contrast, with Drosophila a single gene, SoxN, maps to the second chromosome and the remaining three all map to chromosome 3. We examined the organisation of the group B genes in the other insect genomes and found that the situation was very similar to that observed in Drosophila. In melanogaster, SoxN is intronless and sits alone in the middle of an 80 kb island with no flanking genes for 35 kb proximal and 45 kb distal, an unusual organisation for a Drosophila gene. We have previously shown that Dichaete is controlled by extensive 3' regulatory sequences, suggesting that perhaps the paucity of genes flanking SoxN may also indicate the presence of extensive regulatory sequences. In support of this, we find several clusters of predicted transcription factor binding sites from 35 kb upstream to 20 kb downstream of SoxN when we use stringent search criteria with Cis Analyst analysis software [32, 33] (data not shown). Similar searches with Dichaete find previously identified regulatory sequences, suggesting that SoxN may indeed be subject to complex regulation. Comparative analysis of the melanogaster and pseudoobscura genomes with the Vista genome alignment viewer [34, 35] indicates that the genomic organization is very similar in the two species. The Ensembl annotation of the Anopheles genome indicates that the region around SoxN is also sparsely populated, with only two short stretches of EST homology in the 150 kb flanking SoxN. Therefore, it is possible that SoxN is subject to complex regulatory control in Anopheles. There is currently insufficient contiguous genomic sequence from Apis to assess the organization of the SoxN region.
The organization of the Dichaete region in the Anopheles genome is very similar to that in the Drosophila species, with three genes found in a 190 kb region of chromosome arm 3L. Dichaete is intronless and Sox21b is located approximately 110 kb downstream of this. There are no other predicted genes in the region. The Sox21b gene has a similar structure to those of the Drosophilids; however, it is not identical. We have been unable to find a 5' non-coding exon and, as we note above, the second intron found in the DNA-binding domains of the Drosophila Sox21b genes is absent in Anopheles, with exons 4 and 5 fused. The other introns are, however, conserved in position (Figure 1d). With the Anopheles Sox21a gene, the single intron position is conserved with the Drosophila species; however, the intron is considerably larger and contains an insertion of a Q-class retrotransposon in the sequenced strain. We find no evidence for an Fbp1 orthologue in the vicinity, the nearest similar sequence being some 5 Mb away on the same chromosome arm.
The available sequence in the region is more fragmentary in the case of Apis. Here we find an intronless Dichaete gene and can define two sets of exons corresponding to the split DNA-binding domains of Sox21a and Sox21b. Overall, the organization is similar to the other insects: like Anopheles, the intergenic region between Dichaete and Sox21b is large (~90 kb); however, unlike the other insects, the distance between Sox21b and Sox21a is also large (~80 kb). In the case of Sox21b we have used EST sequence to support the gene model we have derived. The EST confirms the first four exons and we predict the terminal exon on the basis of homology with the other species, particularly the terminal 30 amino acids. As with Anopheles, the Apis Sox21b gene has a single DNA-binding domain intron in the same position as the first Drosophila DNA-binding domain intron. The intron immediately downstream of the DNA-binding domain is also conserved in all four insects; however, the remaining two intron positions differ between Apis and the other insects. Although the Apis assembly is preliminary in this region, with several gaps still present in the sequence, the fact that the gene models are very similar to the other insects and that the Dichaete and Sox21b predictions are supported by EST data suggests that the gene models we propose are likely to be accurate for the majority of the coding sequence.
We compared the Dichaete to Sox21b intergenic regions of Anopheles and Apis to the melanogaster sequence with the OWEN alignment tool and failed to detect any significant stretches of similarity, even at relatively low stringency. This suggests that if there is conservation in gene regulatory sequences between these diverged insects it may be difficult to detect or have undergone extensive rearrangement.
Evolutionary perspective on insect group B genes
Taken together, the analysis presented here shows that the genomic organization and sequence of group B Sox genes have been conserved during insect evolution. Particularly striking is the clustering of three genes in a small region of the genome. The structure of these genes and their relationship with vertebrate Group B genes suggest that SoxN and Sox21a are homologous to vertebrate group B1 and B2 genes respectively, whereas Dichaete and Sox21b may represent insect-specific group B genes.
The sequence alignments of the HMG DNA-binding domains from insect and mammalian group B Sox proteins suggest that the insect proteins may be separated into three distinct groups. The first, containing SoxN, aligns with the vertebrate Sox1, 2 and 3 proteins and most likely represents an orthologue of the vertebrate group B1 class. This conclusion, based on sequence, is supported by the functional analysis of group B1 proteins in vertebrates and Drosophila. In both cases, group B1 genes are expressed from the earliest stages of CNS development and are implicated in regulating early neural specification [21, 22, 38, 39]. In addition, we have evidence that mammalian Sox1 genes can rescue SoxN phenotypes in the Drosophila CNS, supporting the view that these proteins are functionally conserved (P. Overton and S.R. unpublished observations). The group B sequences isolated from the basal chordates, acorn worm and sea squirt, have also been shown to be expressed early in the specification of the CNS [40, 41]. Thus, it appears that all metazoans studied to date have at least one group B gene with expression marking neural lineages early in development. Further studies of primitive invertebrates will determine whether group B Sox expression is a universal marker for CNS development.
In a previously published phylogenetic study it was suggested that Dichaete be classified as a Group B2 protein. However, while the analysis clearly differentiates between the group B proteins and other fly Sox proteins, it could not unambiguously resolve the relationship between each of the group B proteins. In terms of function and expression, the Dichaete gene behaves very much like a group B1 gene: it is expressed early during CNS development and is required for neural differentiation [20, 42]. We have previously shown that the mouse Sox2 gene efficiently rescues Dichaete phenotypes, further supporting a functional similarity between Dichaete and vertebrate group B1 genes [20, 42]. In contrast to the conclusion based on functional studies, the sequence analysis suggests that insect Dichaete DNA-binding domain sequences are markedly different from other group B1 proteins and are more similar to group B2 proteins. The conservation of the insect sequences indicates that a Dichaete-like sequence was present at least 300 My ago, when Apis and the Diptera last shared a common ancestor. We believe that the functional evidence is more convincing than the arguments based on sequence alignments and therefore suggest that Dichaete represents a group B1 function that has diverged from the canonical group B1 sequence, presumably due to selection for insect-specific functions. For example, Dichaete is required for early segmentation in the Drosophila embryo, a highly derived function, and it may be that sequence changes in the HMG-domain have been selected for such a function while still allowing a role in CNS-specification. As with Drosophila, both Anopheles and Apis are long germ insects that share some aspects of early development, such as the early appearance of striped domains of even skipped expression [43, 44]. Thus it is possible that insect Dichaete genes have a common role in early patterning events.
It will be of considerable interest to examine the complement of group B Sox genes in Coleoptera, Homoptera or Orthoptera to see if the HMG domain sequence and gene organisation is the same as in the insects so far sequenced. As a first step, we used the Dichaete DNA-binding domain to search the available sequence of the silk moth Bombyx mori and found a single Group B gene that was clearly an orthologue of the Dichaete genes discussed here, containing the diagnostic Leucine and Isoleucine residues described here.
As with vertebrate group B1 genes, SoxN and Dichaete are expressed in broadly overlapping domains and act partially redundantly in CNS specification [21, 22]. The close similarity between the expression and function of SoxN and Dichaete in the CNS raises the possibility that they arose from a common ancestor by a duplication event and may thus share some common regulatory sequences. However, when we compared the sequences 5' or 3' to SoxN with the Dichaete 3' sequence we could not detect any sequence similarity indicating that any conservation in regulatory sequences is not visible at a large scale; this is not entirely surprising since we cannot detect any sequence similarity between the Dichaete regulatory sequences from Drosophila and Anopheles, while our analysis indicates the divergence of SoxN and Dichaete predates the Drosophila-Anopheles divergence.
Based on the sequence alignment of insect Sox21a DNA-binding domains with those of vertebrate Sox14 proteins, it is possible that Sox21a may be an orthologue of the group B2 class. It has been suggested that in chicken Sox14 and Sox21 act as antagonists of group B1 function in a subset of the developing CNS. The function of Sox21a in Drosophila is not known at present; however, Sox21a is expressed late in the development of the embryonic CNS midline, a site of SoxN and Dichaete expression, indicating there is the potential for the type of antagonistic interaction proposed for vertebrates. The Sox21b DNA-binding domain sequence indicates that it is closely related to Dichaete. Both these proteins have a set of unique residues in their DNA-binding domains that are not found in any other group B proteins identified to date. The Sox21b gene is conserved between the insects and its close similarity to Dichaete suggests that both genes arose from a common origin in the ancestor of the arthropods after their divergence from the nematodes, since there is no close sequence in C. elegans or its relatives. In terms of expression, Sox21b is expressed in the large hindgut along with Dichaete, supporting the possibility that it may also antagonise the activity of Dichaete. In this respect, then, Sox21b may represent a group B2 function. It is therefore possible that insects contain two group B1 class activities, involved in early CNS development, and two B2 class genes. Again we emphasise that the functional assignment of the insect genes may contrast with the data derived from sequence analysis, which predicts a single group B1 gene and three group B2 genes. We suggest that the separation of group B Sox domains into a B1 class and B2 class based solely on sequence does not reflect meaningful functional differences in insects. We have initiated a functional analysis of Sox21a and Sox21b in the hope that we can clarify this issue.
The genome organisation of the Dichaete cluster is unusual: not only are three genes clustered together in the genome, but two of them, Sox21a and Sox21b, have introns within the HMG-domain. The single Sox21a intron is conserved in all four of the insect genes, suggesting that it is ancestral to the insects. Sox21b is more complex: there are six introns in melanogaster and pseudoobscura, four of these are conserved in Anopheles and two are conserved in Apis. In the Drosophila species, there are two introns in the DNA-binding domain, the first of which is present in all four insects. The second intron, in an identical location to the Sox21a intron, is only found in the two Drosophila species. A simple model of a single intron loss is therefore unlikely to account for this, since neither Apis nor Anopheles has the intron. It is possible that Apis and Anopheles lost the intron independently or, alternatively, that the common ancestor of the Drosophila species gained the intron, perhaps via a gene conversion event with Sox21a. Interestingly, the two group B genes from C. elegans also contain introns in the DNA-binding domain, in identical positions in both genes, but they are in different positions to the Sox21a and Sox21b introns. This suggests that the common ancestor of insects and nematodes did not contain DNA-binding domain introns and that these have been acquired independently in both lineages.
The conservation of genome structure within the insect Dichaete cluster suggests that there may be functional constraints on the organisation. We suggest that this is likely to be a reflection of shared regulatory sequence, since the region between Dichaete and Sox21b in melanogaster contains extensive regulatory sequences essential for correct Dichaete expression. We note that both Sox21a and Sox21b have expression domains that overlap with Dichaete, in the midline for Sox21a and the hindgut for Sox21b. These expression domains may therefore be controlled by common regulatory sequences, and the need to maintain coordinated regulation of the three genes may have maintained the integrity of the cluster in the insects. The conservation in expression between D. melanogaster and D. pseudoobscura is consistent with this view; it will be of interest to examine the expression of all of the Sox genes in Anopheles to further explore this hypothesis.
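The two scenarios for the second Sox21b DNA-binding domain intron can be compared by counting the changes each one requires. The sketch below is an illustration only, not an analysis from the paper: it applies small-parsimony (Sankoff) counting with equal costs for gain and loss, and it assumes a species tree with Apis as the outgroup to the Diptera. The names `INTRON`, `TREE` and `sankoff` are hypothetical.

```python
from math import inf

# Presence (1) or absence (0) of the second Sox21b DNA-binding domain
# intron in each species, as described in the text.
INTRON = {
    "D. melanogaster": 1,
    "D. pseudoobscura": 1,
    "A. gambiae": 0,
    "A. mellifera": 0,
}

# Assumed topology: Apis is the outgroup, Anopheles is sister to Drosophila.
TREE = ("A. mellifera", ("A. gambiae", ("D. melanogaster", "D. pseudoobscura")))


def sankoff(node):
    """Return [min changes if node lacks the intron, min changes if it has it]."""
    if isinstance(node, str):
        observed = INTRON[node]
        return [0 if state == observed else inf for state in (0, 1)]
    child_costs = [sankoff(child) for child in node]
    # For each state at this node, add the cheapest state for every child,
    # paying one unit whenever parent and child states differ.
    return [
        sum(min(cost[t] + (s != t) for t in (0, 1)) for cost in child_costs)
        for s in (0, 1)
    ]


ancestor_absent, ancestor_present = sankoff(TREE)
print("changes if ancestor lacked the intron:", ancestor_absent)
print("changes if ancestor had the intron:", ancestor_present)
```

Under these assumptions an intron-less ancestor needs one change (a gain in the Drosophila lineage) and an intron-bearing ancestor needs two (independent losses in Apis and Anopheles). Equal weighting of gains and losses is itself an assumption, so the count alone does not settle which history occurred.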
The following sources were used to obtain genome sequence: D. melanogaster (Release 3.2, [47, 48]) from FlyBase, with the following scaffolds used: AE003535 for the Dichaete region and AE003622 for the SoxN region. D. pseudoobscura (Freeze_1 assembly) was obtained from the Human Genome Sequencing Center, Baylor College of Medicine (HGSC-BCM), with the following scaffolds used: Contig5946_Contig6670 for the Dichaete region and Contig1741_Contig5707 for the SoxN region. Anopheles gambiae genome sequence release 19.2a.1, compiled by the International Anopheles Genome Project, was obtained from the Ensembl server at the Wellcome Trust Sanger Institute. In the Ensembl annotation the Sox genes have the following accessions: SoxN (ENSANGG00000019842), Dichaete (ENSANGG00000010137), Sox21a (ENSANGG00000010002) and Sox21b (ENSANGG00000009947). Anopheles EST sequences representing Dichaete (TC44994) and Sox21b (TC45155) were obtained from The Institute for Genomic Research. Apis mellifera genome assembly Amel_1.1 was obtained from HGSC-BCM and the following scaffolds used: for the Dichaete region, Group8.12 (Dichaete and Sox21b) was found to overlap by 4.5 kb with GroupUn.570 containing Sox21a and the sequences were combined into a single contig. SoxN was contained within Group17.6. In addition, a search of the Honey Bee Brain EST project [55, 56] uncovered two EST sequences corresponding to Dichaete (BB170009A10D01) and Sox21b (BB170011B20A11). These were used to verify the exon predictions from the genome sequence. Vertebrate group B sequences were obtained from Uniprot. Nematode sequences were recovered by Blast searches of the EST collections at Nematode.net (Genome Sequencing Center, Washington University, St Louis).
Homology searching was performed using the Blast algorithm at the Sanger, HGSC-BCM and Berkeley Drosophila Genome Project web sites. Genomic sequences were imported into Artemis v5 [60, 61] and annotated manually using the Blast output as a guide. Multiple sequence alignments were performed locally using ClustalX v1.8 and graphically represented with BoxShade. The alignment of intergenic regions was performed with OWEN.
A cDNA clone for Sox21b (GH07353) was obtained from the Drosophila gene collection and sequenced on both strands using an ABI prism kit in the Genetics Department sequencing core. PCR and RT-PCR amplifications were carried out using minor modifications to standard techniques with the following primer combinations:
Melanogaster primers for RT-PCR:
Dichaete F ACAATCCATTCCATCAACTACC
Dichaete R TTGGTGTTCCCTCCTTACTC
Sox21B F AGTCTCATGAACAGCGGAAG
Sox21B R GGAGTTGCTCAGATACGACG
SoxN F CAGCAGCAACAGCAACACTAC
SoxN R TTTCATCGCCTCGCCACAAC
Pseudoobscura primers for in situ probes:
Dp-Dichaete F CGAACTACGGATTCCACCT
Dp-Dichaete R CATTCCGTTGGCCTGCAT
Dp-SoxN F AGCTGAGTCACCATAACCAC
Dp-SoxN R GTCATGTGATGGCTACCAA
Dp-Sox21A Exon1 F GAGCATCTCGACGCTACTAC
Dp-Sox21A Exon 1 R GGAATTGGAGTGGCTATGAT
Dp-Sox21A Exon 2 F CTAAGGACATGCAGTCACAG
Dp-Sox21A Exon 2 R GACTTCACGCAGCCGTAGGAT
Dp-Sox21B F CGTCTATCCACACACCTGTC
Dp-Sox21B R GACGATGTCTGCTGCTGTT
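As a quick sanity check on primer lists such as the one above, basic properties can be computed in a few lines of Python. The sketch below is illustrative and not part of the original methods: the helper names are hypothetical, only the three D. melanogaster RT-PCR pairs are included, and the melting temperature uses the Wallace rule of thumb (2 °C per A/T, 4 °C per G/C), which is only a rough estimate for short oligonucleotides.

```python
# Primer sequences copied from the D. melanogaster RT-PCR list above (5' to 3').
PRIMERS = {
    "Dichaete F": "ACAATCCATTCCATCAACTACC",
    "Dichaete R": "TTGGTGTTCCCTCCTTACTC",
    "Sox21B F": "AGTCTCATGAACAGCGGAAG",
    "Sox21B R": "GGAGTTGCTCAGATACGACG",
    "SoxN F": "CAGCAGCAACAGCAACACTAC",
    "SoxN R": "TTTCATCGCCTCGCCACAAC",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in seq) / len(seq)


def wallace_tm(seq: str) -> int:
    """Rough melting temperature in degrees C by the Wallace rule (an approximation)."""
    gc = sum(base in "GC" for base in seq)
    at = len(seq) - gc
    return 2 * at + 4 * gc


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


for name, seq in PRIMERS.items():
    print(f"{name}\t{len(seq)} nt\tGC={gc_fraction(seq):.2f}\tTm~{wallace_tm(seq)} C")
```

The reverse complement of a reverse primer gives the sequence expected on the annotated strand, which is convenient when locating primer sites in a genomic scaffold.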
Whole-mount in situ hybridisation to Drosophila embryos was performed using minor modifications to a standard protocol.
All genetic nomenclature is according to FlyBase.
This work was supported by a UK Medical Research Council programme grant to S.R., M. Ashburner and D. Gubb. We are grateful to M. Ashburner for comments on the manuscript and to S. Marcellini and J. Roote for assistance with the D. pseudoobscura husbandry.
- Sinclair AH, Berta P, Palmer MS, Hawkins JR, Griffiths BL, Smith MJ, Foster JW, Frischauf AM, Lovell-Badge R, Goodfellow PN: A gene from the human sex determining region encodes a protein with homology to a conserved DNA binding motif. Nature. 1990, 346: 240-244. 10.1038/346240a0.
- Schepers GE, Teasdale RD, Koopman P: Twenty pairs of Sox: extent, homology, and nomenclature of the mouse and human Sox transcription factor gene families. Dev Cell. 2002, 3: 167-170. 10.1016/S1534-5807(02)00223-X.
- Bowles J, Schepers G, Koopman P: Phylogeny of the SOX family of developmental transcription factors based on sequence and structural indicators. Dev Biol. 2000, 227: 239-255. 10.1006/dbio.2000.9883.
- Collignon J, Sockanathan S, Hacker A, Cohen-Tannoudji M, Norris D, Rastan S, Stevanovic M, Goodfellow PN, Lovell-Badge R: A comparison of the properties of SOX3 with SRY and two related genes SOX1 and SOX2. Development. 1996, 122: 509-520.
- Malas S, Duthies S, Deloukas P, Episkopou V: The isolation and high-resolution mapping of human SOX14 and SOX21; two members of the SOX gene family related to SOX1, SOX2 and SOX3. Mamm Genome. 1999, 10: 934-937. 10.1007/s003359901118.
- Uchikawa M, Kamachi Y, Kondoh H: Two distinct group B SOX genes for transcriptional activators and repressors: their expression during embryonic organogenesis of the chicken. Mech Dev. 1999, 84: 103-120. 10.1016/S0925-4773(99)00083-0.
- Stevanovic M, Lovell-Badge R, Collignon J, Goodfellow PN: SOX3 is an X-linked gene related to SRY. Hum Mol Genet. 1993, 2: 2013-2018.
- Foster JW, Graves JA: An SRY-related sequence on the marsupial X chromosome: implications for the evolution of the mammalian testis-determining gene. Proc Nat Acad Sci USA. 1994, 91: 1927-1931.
- Arsic N, Rajic T, Stanojcic S, Goodfellow PN, Stevanovic M: Characterisation and mapping of the human SOX14 gene. Cytogenet Cell Genet. 1998, 83: 139-146. 10.1159/000015149.
- Stevanovic M, Zuffardi O, Collignon J, Lovell-Badge R, Goodfellow PN: The cDNA sequence and chromosomal location of the human SOX2 gene. Mamm Genome. 1994, 5: 640-642. 10.1007/BF00411460.
- Malas S, Duthie SM, Mohri F, Lovell-Badge R, Episkopou V: Cloning and mapping of human SOX1: a highly conserved gene expressed in the developing brain. Mamm Genome. 1997, 8: 866-868. 10.1007/s003359900597.
- Kuroiwa A, Uchikawa M, Kamachi Y, Kondoh H, Nishida-Umehara C, Masabanda J, Griffin DK, Matsuda Y: Chromosome assignment of eight SOX family genes in chicken. Cytogenet. Genome Res. 2002, 98: 189-193.
- Kirby PJ, Waters PD, Delbridge M, Svartman M, Stewart AN, Nagai K, Graves JAM: Cloning and mapping of platypus SOX2 and SOX14: Insights into SOX group B evolution. Cytogenetic and Genome Research. 2002, 98: 96-100. 10.1159/000068539.
- Mouse Genome Informatics: http://www.informatics.jax.org/.
- Blake JA, Richardson JE, Bult CJ, Kadin JA, Eppig JT: MGD: The mouse genome database. Nucleic Acids Res. 2003, 31: 193-195. 10.1093/nar/gkg047.
- Bergman CM, Pfeiffer BD, Rincon-Limas DE, Hoskins RA, Gnirke A, Mungal CJ, Wang AM, Kronmiller B, Pacleb J, Park S, Stapleton M, Wan K, George R, Jong PJ, Botas J, Rubin GM, Celniker SE: Assessing the impact of comparative genomic sequence data on the functional annotation of the Drosophila genome. Genome Biology. 2002, 3: RESEARCH0086. 10.1186/gb-2002-3-12-research0086.
- Powell JR: Progress and Prospects in Evolutionary Biology: The Drosophila Model. 1997, Oxford, Oxford University Press.
- Gaunt MW, Miles MA: An insect molecular clock dates the origin of the insects and accords with palaeontological and biogeographic landmarks. Mol Biol Evol. 2002, 19: 748-761.
- Annotation of 12 Drosophila Genomes: http://rana.lbl.gov/drosophila/multipleflies.html.
- Sanchez-Soriano N, Russell S: The Drosophila Sox-domain protein Dichaete is required for the development of the central nervous system midline. Development. 1998, 125: 3989-3996.
- Buescher M, Hing FS, Chia W: Formation of neuroblasts in the embryonic central nervous system of Drosophila melanogaster is controlled by SoxNeuro. Development. 2002, 129: 4193-4203.
- Overton PM, Meadows LA, Urban J, Russell S: Evidence for differential and redundant function of the Sox genes Dichaete and SoxN during CNS development in Drosophila. Development. 2002, 129: 4219-4228.
- Nishiguchi S, Wood H, Kondoh H, Lovell-Badge R, Episkopou V: Sox1 directly regulates the γ-crystallin genes and is essential for lens development in mice. Genes & Development. 1998, 12: 776-781.
- Sasai Y: Roles of Sox factors in neural determination: conserved signaling in evolution?. Int J Dev Biol. 2001, 45: 321-326.
- Cremazy F, Berta P, Girard F: Genome-wide analysis of Sox genes in Drosophila melanogaster. Mech Dev. 2001, 109: 371-375. 10.1016/S0925-4773(01)00529-9.
- Russell SRH, Sanchez-Soriano N, Wright CR, Ashburner M: The Dichaete gene of Drosophila melanogaster encodes a Sox-domain protein required for embryonic segmentation. Development. 1996, 122: 3669-3676.
- Cremazy F, Berta P, Girard F: SoxNeuro, a new Drosophila Sox gene expressed in the developing central nervous system. Mech Dev. 2000, 93: 215-219. 10.1016/S0925-4773(00)00268-9.
- Nambu PA, Nambu JR: The Drosophila fish-hook gene encodes an HMG domain protein essential for segmentation and CNS development. Development. 1996, 122:
- Stapleton M, Liao G, Brokstein P, Hong L, Carninci P, Shiraki T, Hayashizaki Y, Champe M, Pacleb J, Wan K, Yu C, Carlson J, George R, Celniker SE, Rubin GM: The Drosophila gene collection: identification of putative full-length cDNAs for 70% of D. melanogaster genes. Genome Res. 2002, 12: 1294-1300. 10.1101/gr.269102.
- Friedrich M, Tautz D: Evolution and phylogeny of the Diptera: a molecular phylogenetic analysis using 28S rDNA sequences. Syst Biol. 1997, 46: 674-698.
- Yuan H, Corbi N, Basilico C, Dailey L: Developmental-specific activity of the FGF-4 enhancer requires the synergistic action of Sox2 and Oct-3. Genes & Dev. 1995, 9: 2635-2645.
- Cis-analyst: http://rana.lbl.gov/cis-analyst/.
- Berman BP, Nibu Y, Pfeiffer BD, Tomancak P, Celniker SE, Levine M, Rubin GM, Eisen MB: Exploiting transcription factor binding site clustering to identify cis-regulatory modules involved in pattern formation in the Drosophila genome. Proc Natl Acad Sci U S A. 2002, 99: 757-762. 10.1073/pnas.231608898.
- Vista Genome Browser: http://pipeline.lbl.gov/pseudo.
- Couronne O, Poliakov A, Bray N, Ishkhanov T, Ryaboy D, Rubin E, Pachter L, Dubchak I: Strategies and Tools for Whole-Genome Alignments. Genome Res. 2003, 13: 73-80. 10.1101/gr.762503.
- Sanchez-Soriano N, Russell S: Regulatory mutations of the Drosophila Sox gene Dichaete reveal new functions in embryonic brain and hindgut development. Dev Biol. 2000, 129: 1165-1174.
- Besansky NJ, Bedell JA, Mukabayire O: Q: a new retrotransposon from the mosquito Anopheles gambiae. Insect Mol Biol. 1994, 3: 49-56.
- Avilion AA, Nicolis SK, Pevny LH, Perez L, Vivian N, Lovell-Badge R: Multipotent cell lineages in early mouse development depend on SOX2 function. Genes & Development. 2003, 17: 126-140. 10.1101/gad.224503.
- Graham V, Khudyakov J, Ellis P, Pevny L: SOX2 functions to maintain neural progenitor identity. Neuron. 2003, 39: 749-765. 10.1016/S0896-6273(03)00497-5.
- Taguchi S, Tagawa K, Humphreys T, Satoh N: Group B Sox genes that contribute to specification of the vertebrate brain are expressed in the apical organ and ciliary bands of hemichordate larvae. Zool Sci. 2002, 19: 57-66. 10.2108/zsj.19.57.
- Miya T, Nishida H: Expression pattern and transcriptional control of SoxB1 in embryos of the ascidian Halocynthia roretzi. Zool Sci. 2003, 20: 59-67. 10.2108/zsj.20.59.
- Zhao G, Skeath JB: The Sox-domain containing gene Dichaete/fish-hook acts in concert with vnd and ind to regulate cell fate in the Drosophila neuroectoderm. Development. 2002, 129: 1165-1174.
- Davis GK, Patel NH: Short, long and beyond: Molecular and embryological approaches to insect segmentation. Ann Rev Entomol. 2002, 47: 669-699. 10.1146/annurev.ento.47.091201.145251.
- Goltsev Y, Hsiong W, Lanzaro G, Levine M: Different combinations of gap repressors for common stripes in Anopheles and Drosophila embryos. Dev Biol. 2004, 275:
- Xia Q, Zhou Z, Lu C, Cheng D, Dai F, Li B, Zhao P, Zha X, Cheng T, Chai C, Pan G, Xu J, Liu C, Lin Y, Qian J, Hou Y, Wu Z, Li G, Pan M, Li C, Shen Y, Lan X, Yuan L, Li T, Xu H, Yang G, Wan Y, Zhu Y, M MY, Shen W, Wu D, Xiang Z, Yu J, Wang J, Li R, Shi J, Li H, Li G, Su J, Wang X, Li G, Zhang Z, Wu QW, Li J, Zhang Q, Wei N, Xu J, Sun H, Dong L, Liu D, Zhao S, Zhao X, Meng Q, Lan F, Huang X, Li Y, Fang L, Li C, D DL, Sun Y, Zhang Z, Yang Z, Huang Y, Xi Y, Qi Q, He D, Huang H, Zhang X, Wang Z, Li W, Cao Y, Yu Y, Yu H, Li J, Ye J, Chen H, Zhou Y, Liu B, Wang J, Ye J, Ji H, Li S, Ni P, Zhang J, Zhang Y, Zheng H, Mao B, Wang W, Ye C, Li S, Wang J, Wong GK, Yang H: A draft sequence for the genome of the domesticated silkworm (Bombyx mori). Science. 2004, 306: 1937-1940. 10.1126/science.1102210.
- Ayala FJ, Rzhetsky A, Ayala FJ: Origin of the metazoan phyla: Molecular clocks confirm paleontological estimates. Proc Nat Acad Sci USA. 1998, 95: 606-611. 10.1073/pnas.95.2.606.
- Misra S, Crosby MA, Mungall CJ, Matthews BB, Campbell KS, Hradecky P, Huang Y, Kaminker JS, Millburn GH, Prochnik SE, Smith CD, Tupy JL, Whitfied EJ, Bayraktaroglu L, Berman BP, Bettencourt BR, Celniker SE, AD ADG, Drysdale RA, Harris NL, Richter J, Russo S, Schroeder AJ, Shu SQ, Stapleton M, Yamada C, Ashburner M, Gelbart WM, Rubin GM, Lewis SE: Annotation of the Drosophila melanogaster euchromatic genome sequence: a systematic review. Genome Biol. 2002, 3: RESEARCH0083. 10.1186/gb-2002-3-12-research0083.
- Celniker SE, Wheeler DA, Kronmiller B, Carlson JW, Halpern A, Patel S, Adams M, Champe M, Dugan SP, Frise E, Hodgson A, George RA, Hoskins RA, Laverty T, Muzny DM, Nelson CR, Pacleb JM, Park S, Pfeiffer BD, Richards S, Sodergren EJ, Svirskas R, Tabor PE, Wan K, Stapleton M, Sutton GG, Venter C, Weinstock G, Scherer SE, Myers EW, Gibbs RA, Rubin CM: Finishing a whole-genome shotgun: release 3 of the Drosophila melanogaster euchromatic genome sequence. Genome Biol. 2002, 3: RESEARCH0079. 10.1186/gb-2002-3-12-research0079.
- Drysdale RA, Crosby MA, The FlyBase Consortium: FlyBase: genes and gene models. Nucleic Acids Research. 2005, 33: 390-395. 10.1093/nar/gki046.
- D. pseudoobscura sequencing project: http://www.hgsc.bcm.tmc.edu/projects/drosophila/.
- Holt RA, Subramanian GM, Halpern A, Sutton GG, Charlab R, Nusskern DR, Wincker P, Clark AG, Ribeiro JM, Wides R, Salzberg SL, Loftus B, Yandell M, Majoros WH, Rusch DB, Lai Z, Kraft CL, Abril JF, Anthouard V, Arensburger P, Atkinson PW, Baden H, Berardinis V, Baldwin D, Benes V, Biedler J, Blass C, Bolanos R, Boscus D, Barnstead M, Cai S, Center A, Chaturverdi K, Christophides GK, Chrystal MA, Clamp M, Cravchik A, Curwen V, Dana A, Delcher A, Dew I, Evans CA, Flanigan M, Grundschober-Freimoser A, Friedli L, Gu Z, Guan P, Guigo R, Hillenmeyer ME, Hladun SL, Hogan JR, Hong YS, Hoover J, Jaillon O, Ke Z, Kodira C, Kokoza E, Koutsos A, Letunic I, Levitsky A, Liang Y, Lin JJ, Lobo NF, Lopez JR, Malek JA, McIntosh TC, Meister S, Miller J, Mobarry C, Mongin E, Murphy SD, O'Brochta DA, Pfannkoch C, Qi R, Regier MA, Remington K, H HS, Sharakhova MV, Sitter CD, Shetty J, Smith TJ, Strong R, Sun J, Thomasova D, Ton LQ, Topalis P, Tu Z, Unger MF, Walenz B, Wang A, Wang J, Wang M, Wang X, Woodford KJ, Wortman JR, Wu M, Yao A, Zdobnov EM, Zhang H, Zhao Q, Zhao S, Zhu SC, Zhimulev I, Coluzzi M, Torre A, Roth CW, Louis C, Kalush F, Mural RJ, Myers EW, Adams MD, Smith HO, Broder S, Gardner MJ, Fraser CM, Birney E, Bork P, Brey PT, Venter JC, Weissenbach J, Kafatos FC, Collins FH, Hoffman SL: The genome sequence of the malaria mosquito Anopheles gambiae. Science. 2002, 298: 129-149. 10.1126/science.1076181.
- Mosquito genome browser: http://www.ensembl.org/Anopheles_gambiae/.
- Mosquito gene index: http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=mosquito.
- A. mellifera sequencing project: http://www.hgsc.bcm.tmc.edu/projects/honeybee/.
- Honey Bee EST Project: http://titan.biotec.uiuc.edu/bee/honeybee_project.htm.
- Whitfield CW, Band MR, Bonaldo MF, Kumar CG, Lui L, Pardinas JR, Robertson HM, Soares MB, Robinson GE: Annotated expressed sequence tags and cDNA microarrays for studies of brain and behaviour in the Honey Bee. Genome Res. 2002, 4: 555-566. 10.1101/gr.5302.
- Nematode.net: http://www.nematode.net/BLAST.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410. 10.1006/jmbi.1990.9999.
- Berkeley Drosophila Genome Project: http://www.fruitfly.org.
- Artemis: http://www.sanger.ac.uk/Software/Artemis/.
- Rutherford K, Parkhill J, Crook J, Horsnell T, Rice P, Rajandream MA, Barrell B: Artemis: sequence visualisation and annotation. Bioinformatics. 2000, 16: 944-945. 10.1093/bioinformatics/16.10.944.
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucl Acids Res. 2003, 31: 3497-3500. 10.1093/nar/gkg500.
- Boxshade: http://www.ch.embnet.org/software/BOX_form.html.
- Ogurtsov AY, Roytberg MA, Shabalina SA, Kondrashov AS: OWEN: aligning long collinear regions of genomes. Bioinformatics. 2002, 18: 1703-1704. 10.1093/bioinformatics/18.12.1703.
- Sambrook J, Russell DW: Molecular Cloning: a laboratory manual. 2001, New York, Cold Spring Harbor Laboratory Press.
- Tautz D, Pfeifle C: A non-radioactive in situ hybridisation method for the localisation of specific RNAs in Drosophila embryos reveals translational control of the segmentation gene hunchback. Chromosoma. 1989, 98: 81-85. 10.1007/BF00291041.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.