Polymorphism of MHC class IIB in an acheilognathid species, Rhodeus sinensis shaped by historical selection and recombination

Background Rhodeus sinensis is a bitterling species occurring throughout the numerous freshwater systems on the East Asia. Here, we analyzed the diversity of the MHC class IIB (DAB) genes from this species, which may offer meaningful insights into evolutionary processes in this species as well as other bitterlings. Results Using cDNA and gDNA samples from 50 individuals, we discovered classical 140 allelic sequences that could be allocated into either DAB1 (Rhsi-DAB1) or DAB3 (Rhsi-DAB3). DAB sequences completely lacking the intron, but identical or similar to Rhsi-DAB1, were also discovered from our gDNA samples, and this intron loss likely originated from the retrotransposition events of processed mDNA. The β1 domain was the most polymorphic in both Rhsi-DAB1 and -DAB3. Putative peptide biding residues (PBRs) in Rhsi-DAB1, but not in Rhsi-DAB3, exhibited a significant dN/dS, presumably indicating that different selection pressures have acted on those two DABs. Recombination between different alleles seemed to have contributed to the increase of diversity in Rhsi-DABs. Upon phylogenetic analysis, Rhsi-DAB1 and -DAB3 formed independent clusters. Several alleles from other species of Cypriniformes were embedded in the clade of Rhsi-DAB1, whereas Rhsi-DAB3 clustered with alleles from the wider range of taxa (Cyprinodontiformes), indicating that these two Rhsi-DABs have taken different historical paths. Conclusions A great deal of MHC class IIB allelic diversity was found in R. sinensis, and gene duplication, selection and recombination may have contributed to this diversity. Based on our data, it is presumed that such historical processes have commonly or differently acted on the polymorphism of Rhsi-DAB1 and -DAB3. Electronic supplementary material The online version of this article (10.1186/s12863-019-0775-3) contains supplementary material, which is available to authorized users.


Background
The major histocompatibility complex (MHC) is a set of genes encoding cell membrane glycoproteins responsible for the initiation of adaptive immune response by displaying antigenic peptide to T lymphocytes in vertebrates [1]. MHC genes are classified into class I and II; class II genes are only expressed on professional antigen-presenting cells (e.g., dendritic cells, macrophages, B lymphocytes), whereas class I genes are expressed on all nucleated cells [1,2]. A peptide fragment loaded onto an MHC class II molecule is derived from antigens endocytosed, digested within lysosomes and presented to the specific receptors on the surface of CD4 + helper T cells [1,2]. MHC class II molecules are heterodimers consisting of two noncovalently associated homogeneous α (α1 and α2 domains) and β chains (β1 and β2 domains; [1,2]). A peptide fragment loaded onto an MHC class I molecule is derived from cytosolic proteins of infected cells, and is presented to specific receptors on the surface of CD8 + cytotoxic T cells [1,2]. MHC class I molecules are also heterodimers composed of three α domains and β 2 -microglobulin [1,2].
Classical MHC class II genes are known to be among the most variable in vertebrate genomes [3][4][5]. The greatest polymorphism can be found in the β1 domain, where peptide-binding residues (PBRs) are located [3,4,6]. In contrast, the level of polymorphism in the α1 domain is relatively low, with a few exceptions [7,8], even though this domain also plays a role in binding with the antigenic peptide. Three major evolutionary forces are known to contribute to the enormous levels of polymorphism observed in the β1 domain [9]: (i) negative frequency-dependent selection [10][11][12], (ii) heterozygote advantage [13,14] and (iii) preferences for MHC dissimilar mates [15][16][17]. Another well-known evolutionary signature of MHC genes is transspecies polymorphism (TSP), which refers to genetic variants whose origin predates speciation, resulting in the occurrence of shared or similar alleles between different, but related, taxa [18][19][20][21][22]. The existence of TSP also means that there must be common alleles to secure survival or high adaptability even in different species under a specific environment [23]. To learn if TSP exist in a family or order and to infer the related evolutionary factors, however, information about a very certain phylogenetic structure among species in the taxa should be available.
Since the first analysis was attempted in carp [24], MHC genes have been characterized in a wide variety of teleost species [3,[25][26][27][28]. Teleost MHC class II can be divided into three major groups, namely, DA, DB and DE, based on their sequence features and phylogenetic clustering patterns [28]. Classical MHC class II genes are only found in DA (DAA: MHC IIα chain; DAB: MHC IIβ chain), whereas DB and DE generally comprise non-classical MHC genes [28,29]. Therefore, MHC class II genes found in DA show tremendous polymorphisms among individuals, and conservative residues that are thought to form hydrogen bonds with antigen peptides [28]. These features do not perfectly appear in MHC class II genes belonging to DB and DE [28].
The primary aim of this study was to identify the signature of evolutionary forces that have acted on the MHC class IIB (DAB) sequences of Korean Rhodeus sinensis, one of the most widely distributed bitterling species (Acheilognathidae). Analyzing MHC sequences can offer meaningful insights into evolutionary processes in this or other bitterling species. First, because bitterlings spawn on freshwater mussels, which serve as intermediate host of many infectious organisms in freshwater ecosystems, response to pathogens and immunity may have played an important role in evolutionary processes in this species [30][31][32]. Second, bitterlings are a representative fish group in which the diversity and evolutionary patterns of DAB genes have not been properly characterized. In fact, only a partial investigation has been conducted in R. ocellatus [33] and Pseudorhodeus tanago [34]. Third, the population sizes of several bitterling species have been declining or at the blink of extinction, due to the introduction of exotic species, climate change and devastation of many natural habitats [35][36][37], which serves as a good opportunity to investigate how size changes in the population and the resulting genetic drift affects MHC allelic diversity. Finally, R. sinensis is found in a variety of rivers with very different environmental features [38][39][40], providing an excellent opportunity to study the differences in selection pressures acting on MHC genes.
This study consisted of three stages. First, the nearlycomplete sequences of DAB genes were identified from 50 individuals of R. sinensis collected from five different drainages, and the structural and functional characteristics were examined. Second, individual and inter-locus variabilities were examined to discover a signature of evolutionary processes acting on DAB diversities in this species. Finally, phylogenetic analyses were conducted to infer the evolutionary history of DAB genes in this species upon comparison with other vertebrates.

Structure and diversity of MHC class IIB
Among primers used (Table 1), only a single pair, SP-F1 and TM-R1, successfully amplified all 50 individual cDNA samples. This primer pair was designed to anneal the sequences of signal peptide (exon 1) and transmembrane region (exon 6). The intronic sequences linking the six exons were identified from gDNA isolated from 20 Nakdong River samples. The amplification with the SP-F1 and TM-R1 yielded 293 sequences, and a total of 140 novel DAB alleles of Rhodeus sinensis were detected. Based on the blasting search and comparison with DAB sequences of other cypriniform species, the alleles were allocated into either DAB1 (Rhsi-DAB1; N = 104) or DAB3 (Rhsi-DAB3; N = 36; Additional file 8: Figure S1 and Additional file 9: Figure S2). All allelic sequences identified in this study were deposited to NCBI GenBank with the accession numbers of MG989278 to MG989423.
The sequence of exon 2 was confirmed to be β1 domain region with PBRs through cDNA and gDNA sequencing (Additional file 8: Figure S1 and Additional file 9: Figure S2). Several conserved residues were found and may be related to the function of β1 domain region based on the comparison with mammalian classical DRB structure [42]. For example, H81 and N82 are predicted to form hydrogen bonds with antigen peptides (Additional file 8: Figure S1 and Additional file 9: Figure S2). N38, S39 and T40 seem to be responsible for N-linked glycosylation, and two conserved cysteine residues, 11C and 76C were thought to form disulfide bridges (Additional file 8: Figure S1 and Additional file 9: Figure S2). In addition, N30 and N59 are residues that are found without exception in all jawed vertebrates, and G46 and Y47 are known to be ray-finned fish specific ( [28]; Additional file 8: Figure  S1 and Additional file 9: Figure S2). Rhsi-DAB1*04:02 and Rhsi-DAB3*06:01-03 showed a single codon insertion and deletion, respectively, in exon 2 (Additional file 8: Figure S1 and Additional file 9: Figure S2). From exon 3 to 6, overall amino acid sequences were highly conserved with no length variation (Additional file 10: Figure S3, Additional file 11: Figure S4, Additional file 12: Figure  S5 and Additional file 13: Figure S6). Exon 3 contained a conserved amino acid motif (from 49 to 65) that seemed to be responsible for binding to CD4 molecule (Additional file 10: Figure S3 and Additional file 11: Figure S4). Two conserved cystein residues (C23 and C29) were predicted to form disulfide bridges in this domain (Additional file 10: Figure S3 and Additional file 11: Figure S4).

Signature of recombination
The result of RDP analysis showed that five Rhsi-DAB alleles likely occurred by recombination events in β1 domain region ( Table 2). All seven algorithms used in this analysis supported that Rhsi-DAB1*04:01 and -DAB3*07:01 were formed from the recombination between two different alleles ( Table 2). Rhsi-DAB1*03:07, −DAB1*03:17 and -DAB1*10:02 could be considered as the recombinant, but not supported by all algorithms. The signature of recombinantion in Rhsi-DAB1*04:01 and -DAB3*07:01 could also be viewed in the network tree analysis, because they did not form clusters with the same allelic groups but instead were located in the middle of the recombination origins (Fig. 1).

Discussion
Our analysis of MHC class IIB variation in Rhodeus sinensis revealed very large numbers of DAB alleles, which was only from 50 individuals. Characterization of DAB was attempted in two other bitterling species, R. ocellatus [33] and Pseudorhodeus tanago [34], as mentioned in Introduction, and only 17 [33] and 16 [34] sequence variants were identified, respectively. However, direct comparison of diversity with these species are not likely to be possible, since Pseudorhodeus tanago was listed as critically endangered species [34] and the MHC class IIB of R. ocellatus was analyzed simply because of the need to gather information on MHC allele diversity in behavioral experiments [33]. Our study aimed to analyze nearly complete DAB sequences, and all 140 alleles characterized included a signal peptide, two extra-cellular domains (β1 and β2), a connecting peptide, a trans-membrane region and a cytoplasmic domain. Upon comparison with other teleost or vertebrate species at the codon level, major residues predicted to be responsible for the processes in adaptive immune response were completely identified, indicating that all of the Rhsi-DAB sequences could be regarded as functional and classical alleles. One of the surprising observations from this study was that SEGs (MHC class IIB lacking the introns) were discovered from gDNA samples. Intron loss has been reported across many vertebral genes (e.g., hsp70 in zebrafish [43], the histone gene family in vertebrates [44], claudin and olfactory receptor genes in teleosts [45]). Intron loss in MHC class IIB was reported in two teleost species, Gasterosteus aculeatus and Tetraodon nigroviridis [45]. It has been proposed that intron loss could be mediated by random or homologous recombination of ancestral genome with the DNA fragment reverse-transcribed from processed mRNA [46][47][48]. Considering that partial intron loss was not observed in our study, these SEGs likely originated from the insertion of processed cDNA into a nonhomologous genome location (retrotransposition event; see [45]). Nine variants of SEGs were found to be identical or similar to Rhsi-DAB1, while no SEG similar to Rhsi-DAB3 was found, indicating that this event has probably occurred since the  divergence between DAB1 and DAB3 were completed. Given that intron loss is more common in highly expressed genes, such as housekeeping genes [46][47][48], the reason why intron loss was observed only in Rhsi-DAB1 can be deduced from the difference in expression level. However, it is premature to evaluate the difference in expression level without comparing mRNA levels. It remains to assure that intron loss is prevalent in all R. sinensis populations, becaue all of the gDNA samples used were obtained from the Nakdong River. It is thus necessary to investigate the existence of SEG from other drainages in future study. The phylogenetic separation between DAB1 and DAB3 was evident in our BI tree reconstructed using the β1 domain as shown in a study of the Cypriniformes species [49]. Multiple loci may exist within Rhsi-DAB1, given that more than three alleles were frequently found in a single individual sample. Assumption based only on the BI tree revealed that Rhsi-DAB1*05 is the most prominent locus candidate, and the remaining allelic groups were likely divided into five; (1) Rhsi-DAB1*01, −*02 and -*08; (2) Rhsi-DAB1*06, −*07, −*11, −*12, −*13, −*14 and -*15; (3) Rhsi-DAB1*03; (4) Rhsi-DAB1*04 and -*09; (5) Rhsi-DAB1*10. No individuals with all six locus-candidates were observed, and the number of Rhsi-DAB1 loci possessed by each individual appears to be different, given that the individuals showed considerably different composition of locus-candidates. Although its diversity was not as high as Rhsi-DAB1, there were roughly three locuscandidates in Rhsi-DAB3, if speculating based solely on the BI tree. In contrast to the enormous diversity observed in our results, no evidence of gene duplication was reported in bitterling species to dates [17,33]. Although gene duplication of DABs may be thought of as a speciesspecific evolutionary event that occurred in R. sinensis, it may be wise to suspend judgment until tests using various primers are conducted. It has often been found that slightly different primer sets, which were designed for almost the same region, exhibited amplification bias yielding different sequences or different number of alleles.
In both Rhsi-DAB1 and -DAB3, the β1 domain region was the most polymorphic, and the signature of positive selection was clearly detected. The amino acid sequences of the remaining regions were highly conservative with no clear sign of positive selection. In the β1 domain region, the polymorphism level of Rhsi-DAB3 was less than that of Rhsi-DAB1, which was also observed in other studies of Cypriniformes species [50,51]. In addition, Rhsi-DAB1, but not Rhsi-DAB3, showed a large and significant d N /d S , when considering only PBRs. Presumably, different types of selection pressures have acted on Rhsi-DAB1 and -DAB3 (see also [52,53]). Our results suggest that while Rhsi-DAB1 is reponsible for the binding to a wide range of pathogenic peptides, Rhsi-DAB3 may be associated with detection of some specialized antigenic peptides. If so, highly variable Rhsi-DAB1 might have been specialized to interact with many different types of pathogens, which has probably led to the specific pathogen-mediated selections (see also [54,55]). This explanation is highly plausible, considering that a slight difference in amino acid sequence among MHC class IIB alleles may be associated with adaptation to different pathogens (e.g., [56][57][58][59]). If the pathogenic fauna and allelic variation at the population level are studied, it will be possible to examine whether the allelic diversity of Rhsi-DAB1 is the result of adaptation to different types of pathogens.
Recombination between different alleles seemed to have at least partially contributed to the increase or maintenance of diversity in Rhsi-DABs. Several other studies also reported examples of MHC class IIB alleles originating from recombination [3,[60][61][62]. In this study, some of the recombination events occurred between Rhsi-DAB1 and -DAB3. Since Rhsi-DAB1 and -DAB3 were expected to differ in terms of their diversity and fuctional range, as mentioned above, further studies should be conducted to determine if the alleles originating from interlocus recombinations have resulted in adaptive benefits in their habitats.
The existence of the phylogenetic lineages containing alleles of Rhsi-DAB1, Hyam-DAB1, Hymo-DAB and Ctid-DAB probably suggests that the occurrence of Rhsi-DAB1 alleles should predate the ages of diversification into the present species in Cypriniformes. Specifically, the sister relationship between Rhsi-DAB1*03 and Ctid-DAB showed that the Rhsi-DAB1*03 allelic group likely have existed before the divergence of Leuciscidae and Acheilognathidae that could be estimated to be around 66 million years ago (MYA) when speculating based on the divergence time between the representative genera of those two families, Ctenopharyngodon and Rhodeus [63,64]. Rhsi-DAB3 formed a cluster with Cyprinodontiformes, which probably means that the sequence structure of Rhsi-DAB3 has been maintained as more conserved rather than that of Rhsi-DAB1. Previous studies estimated that Cyprinodontiformes diverged from other teleosts at around 229.9 MYA [63,64]. It thus can be predicted that the strength of selection pressure acting on DAB3 might definitely be weaker than that acting on DAB1, considering that the habitat range of Cyprinodontiformes is much wider than Cypriniformes.
Our analysis revealed the evolutionary processes that have at least partially contributed to the formation of DAB diversity in R. sinensis. What needs to be revealed or confirmed in the future can be summarized as follows. First, what adaptive importance does the alleles identified here have? Second, did the alleles were born before or after the birth of this species? These questions can also be resolved at least in part through the investigation of allelic frequency among R. sinensis populations inhabiting different environments or direct comparison among species of the same genus (i.e., Rhodeus) and with other major genus species (i.e., Tanakia or Acheilognathus) in Acheilognathidae. Finally, are the alleles found here products of the same gene or other (duplicated or somewhat distant) genes? Genomic analysis, which has become highly advanced and inexpensive in recent years, will provide a close answer to this question.

Conclusion
Using cDNA and gDNA samples from 50 individuals of R. sinensis, a great deal of MHC class IIB allelic diversity was found, and gene duplication, selection and recombination may have contributed to this diversity. A total of 140 allelic sequences could be allocated into two different loci, Rhsi-DAB1 and -DAB3. Numerous variants of MHC IIB lacking the introns were found from our gDNA samples and these sequences appeared to be historically derived from the retrotransposition events of processed mRNA. Upon robust phylogenetic analysis, Rhsi-DAB1 and -DAB3 formed completely independent clusters. Based on our data, it is presumed that such historical processes have commonly or differently acted on the polymorphism of Rhsi-DAB1 and -DAB3.

Sampling
Ten individuals of Rhodeus sinensis were collected each from five different rivers (Han, Hyeongsan, Mangyeong, Nakdong and Tamjin) on the Korean Peninsula for the extraction RNA samples. Twenty individuals were additionally collected from the Nakdong River and used for the analysis of genomic sequence structure.

Extraction of RNA and DNA
RNA was isolated from the brain tissue of each individual using TRIzol (Invitrogen, Carlsbad, CA, USA) reagent according to the manufacturer's protocol. Before tissue removal for the RNA extraction, each individual fish was euthanized with MS-222 (250 mg/L; Sigma-Aldrich, St. Louis, MO, USA). Brain tissues were used was since they were the samples that should also be analyzed to investigate the expression of non-immune genes. Complementary DNA (cDNA) was synthesized using RNA (800-2000 ng) extracted, oligo-dT and GoScript™ Reverse Transcriptase (Promega, Madison, WI, USA). Genomic DNA (gDNA) was extracted from muscle using a DNeasy Blood and Tissue kit in accordance with the manufacturer's protocol (Qiagen, Dusseldorf, Germany). All individuals used for DNA and RNA extraction are housed in the Department of Life Science at Yeungnam University as a 95% ethanol sample. Our sampling and experimental procedure were approved by the Yeungnam University Institutional Animal Care and Use Committee (Protocol # 2015013). The concentration of genetic samples was measured using Maestro-Nano (Maestrogen, Hsinchu City, Taiwan).

Primers design, PCR, cloning and sequencing
The universal primers designed for cyprinids in previous studies [33,41] were used to obtain the draft DAB sequences of R. sinensis (Table 1). Specific primers were also designed based on the draft sequences obtained (Table 1). Each 50 μL mixture for PCR amplification contained 50-100 ng of DNA (cDNA or gDNA), 1 × PCR buffer, 3 mM MgCl 2 , 0.25 mM of each forward and reverse primer, 0.2 mM dNTP and 0.25 unit of Taq DNA polymerase (Solgent, Daejeon, South Korea). PCR amplification was performed using GenePro (Bioer, Hangzhou, China) under the following program setup: 94°C for 10 min, 35 cycles of 40s at 94°C, 45 s at 52-64°C (depending on the primers) and 50s at 72°C, and 72°C for 10 min. The amplified products were ligated into the pGEM-T Easy vector (Promega) and transformed into E. coli DH5α. Ten to sixteen white colonies were selected from each individual for the amplification by SP6 and T7 primer set (T a = 56°C). Being successfully identified on a 2% agarose gel, the PCR products were purified and sent for the commercial sequencing to Macrogen Inc. (Seoul, South Korea).
The nucleotide sequences obtained from cDNA or gDNA samples were regarded as a valid allelic sequence only when identified in at least two separated clones and two different individuals to avoid the possibility of artifacts. To examine the possibility of cross-over contamination, a negative control tube containing purified water instead of a DNA sample was placed in every amplification sample set. All data used in the analyses included only those with no amplification reaction at the negative control. Analyses of cDNA and gDNA samples were performed in a completely separated state in time and space. The sequences identified as valid were aligned using CLUSTALX [65] implemented in GENEIOUS v.9.1.8 [66]. The DAB alleles were named following the nomenclature (locus*allelic group:protein sequence) [67].

Tests of positive selection
The putative PBRs in β1 domain region were identified based on the comparison with the sequences characterized in the previous studies [42,76,77]. The ratio of non-synonymous (d N ) and synonymous (d S ) substitutions (ω) was estimated by Nei-Gojobori method [78] with 2000 bootstrap replicates and modified under Juke Cantor corrections, which was used to determine the strength of historical selection pressure acting on DAB sequences.
The signature of positive selection was detected for each codon in the DAB allelic sequences using HyPhy package [79] implemented in DataMonkey web server (http:// www.datamonkey.org/), where four different codon-based maximum likelihood tests, namely, SLAC (Singel Likelihood Ancestry Counting), FEL (Fixed Effects Likelihood), MEME (Mixed Effects Model of Evolution) and FUBAR (Fast Unconstrained Bayesian AppRoximation) were used. For this purpose, phylogenetic relationship was reconstructed under the default setting in DataMonkey.
Each codon was also tested for the signature of positive selection using CODEML implemented in PAMLX package [80,81]. For this test, Bayesian posterior probability (BPP) was calculated based on the Bayes empirical Bayes (BEB) method, and the codon was regarded as showing the significant signature of positive selection when the BPP was greater than 95%. Maximum likelihood (ML) phylogenetic tree was reconstructed using rapid bootstrap analyses (1000 replicates) implemented in RAxML-GUI v.1.5 [82], which was used for the input resource in CODEML. The likelihood ratio tests (LRT) were carried out to compare codon-based models, for example, between M1a (nearly neutral) and M2a (positive selection), between M7 (beta distribution) and M8 (beta distribution and positive selection), and between M0 (one-ratio) and M3 (discrete).