Genome-wide identification, structural and gene expression analysis of the bZIP transcription factor family in sweet potato wild relative Ipomoea trifida

Background The basic leucine zipper (bZIP) transcription factor is one of the most abundant and conserved transcription factor families. In addition to being involved in growth and development, bZIP transcription factors also play an important role in plant adaption to abiotic stresses. Results A total of 41 bZIP genes that encode 66 proteins were identified in Ipomoea trifida. They were distributed on 14 chromosomes of Ipomoea trifida. Segmental and tandem duplication analysis showed that segmental duplication played an important role in the ItfbZIP gene amplification. ItfbZIPs were divided into ten groups (A, B, C, D, E, F, G, H, I and S groups) according to their phylogenetic relationships with Solanum lycopersicum and Arabidopsis thaliana. The regularity of the exon/intron numbers and distributions is consistent with the group classification in evolutionary tree. Prediction of the cis-acting elements found that promoter regions of ItfbZIPs harbored several stress responsive cis-acting elements. Protein three-dimensional structural analysis indicated that ItfbZIP proteins mainly consisted of α-helices and random coils. The gene expression pattern from transcriptome data and qRT-PCR analysis showed that ItfbZIP genes expressed with a tissue-specific manner and differently expressed under various abiotic stresses, suggesting that the ItfbZIPs were involved in stress response and adaption in Ipomoea trifida. Conclusions Genome-wide identification, gene structure, phylogeny and expression analysis of bZIP gene in Ipomoea trifida supplied a solid theoretical foundation for the functional study of bZIP gene family and further facilitated the molecular breeding of sweet potato. Electronic supplementary material The online version of this article (10.1186/s12863-019-0743-y) contains supplementary material, which is available to authorized users.


Background
Transcription factors (TFs) are active proteins that recognize and bind to specific sites on a promoter to activate or inhibit gene expression [1,2]. The basic leucine zipper (bZIP) TFs is one of the most diverse families of TFs [3,4]. Structurally, this family contains a highly conserved bZIP domain (60-80 amino acids long) which is divided into two parts: the basic region and the leucine zipper region [5]. The basic region, which is the most conserved core part of the bZIP TF, consists of approximately 16 amino acid residues. This region contains an invariant motif N-X7-R/K-X9, which is mainly involved in nuclear localization and target DNA binding function. Meanwhile, the leucine zipper region is less conserved and composed of heptad repeats of Leu or other bulky hydrophobic amino acids (Ile, Val, Phe or Met) positioning exactly the nine amino acids towards the C terminus [1,2]. Through the interaction of the hydrophobic amino acids in the helical region, the two subunits are tightly bound together to form a coiled-coil dimer structure. This structure affects the binding characteristics, expression diversity and gene regulation of the target gene.
Sweet potato (Ipomoea batatas) of the family Convolvulaceae is an annual or perennial herb with an extremely high economic value. To date, sweet potato is the seventh largest crop in the world. It is an important food crop, forage crop and a new type of bio-energy crop in China. Sweet potato is a hexaploid cultivar with 90 chromosomes. Due to its large genome and high genetic heterogeneity, the whole genome sequencing and assembly as well as the other related genomics research is very complicated. However, diploid Ipomoea trifida G. Don (2n = 2x = 30) which belongs to Batata section B of the genera Ipomoea, is considered as an ancestral species of hexaploid cultivated sweet potato [38][39][40]. The diploid Ipomoea trifida is an ideal model species for studying self-incompatibility, genetic mapping, physical mapping, sweet potato breeding, sweet potato transgenic system construction and whole genome sequencing due to its small size, low ploidy, small chromosome number and simple genetic manipulation. In 2017, the genome data of diploid sweet potato variety Ipomoea trifida were released (http://sweetpotato.plantbiol ogy.msu. edu/), making it possible to identify and analyse important gene families at the whole genome level in Ipomoea trifida [41].
In this study, bZIP gene family members were identified in Ipomoea trifida. Using various bioinformatics tools, we performed ItfbZIP subfamily classification, gene intron/ exon distribution analysis, conserved domain and three-dimensional structural homology modeling prediction. And the ItfbZIP gene expression profiles generated from RNA-seq were confirmed by quantitative RT-PCR in different plant tissues under various abiotic stresses. This study could be helpful for further functional study of bZIP genes and molecular breeding of sweet potato.

Results
Identification and designation of bZIP TFs in the Ipomoea trifida genome A total of 66 ItfbZIP proteins encoded by 41 ItfbZIP genes were identified. They all include at least one bZIP domain (bZIP_1: PF00170.20, bZIP_2: PF07716.14 or bZIP_Maf: PF03131). The ItfbZIP genes were numbered according to its position on the chromosome as ItfbZIP1~ItfbZIP41. Different transcripts encoded by the same gene were given the similar names. For instance, 3 transcripts of ItfbZIP9 gene are ItfbZIP9.1, ItfbZIP9.2 and ItfbZIP9.3 (Additional file 1: Table S1). At the same time, ItfbZIP protein size (aa), MW, pI, subcellular location and phosphorylation site were analyzed ( Table 1). The length of ItfbZIP proteins ranges from 158 AA (ItfbZIP41.1) to 754AA (ItfbZIP40), MW varies from 17,227.12 (ItfbZIP41.1) to 81,305.95 (ItfbZIP40) Da and pI distributes from 5.61 (ItfbZIP14) to 9.69 (ItfbZIP41.1). They are all predicted to be located in the nucleus. We used the online tool P3DB to predict the phosphorylation sites of the ItfbZIPs. As listed in Table 1, the ItfbZIPs contain 4 to 28 phosphorylatio sites, wherein the maximum number of phosphorylation sites is 28 for ItfbZIP34.1 and ItfbZIP34.2. The minimum number of phosphorylation sites is 4 for ItfbZIP25.2 and ItfbZIP27.2. About 60% of the ItfbZIPs contain 10 or more phosphorylation sites. Up to 80% of the ItfbZIPs contain more Ser phosphorylation sites than Thr phosphorylation sites.
Chromosome localization and duplication of the ItfbZIP gene family The ItfbZIP genes were mapped on 15 chromosomes. As shown in Fig. 1, nine ItfbZIP genes on Chr 9; five ItfbZIP genes on Chr 5; four on Chr 10; three on Chr 1, Chr 3, Chr 14 and Chr 15; two on Chr 4, Chr 7, Chr 8 and Chr 12; only one on Chr 2, Chr 6 and Chr 11; and no ItfbZIP gene on Chr 13, indicating that the distribution of Itfb-ZIP genes is disproportionate on chromosomes.

Distribution of bZIP TFs in eukaryotes
To understand the evolutionary relationship of bZIP TFs among different eukaryotes, we performed phylogenetic tree construction on 28 species (including fungi, metazoa and plants) and annotated every species with the number of bZIPTFs identified in previous literatures ( Fig. 2 and Additional file 2: Table S2). The figure shows only a few bZIP homologs exist in Saccharomyces cerevisiae (17) and Ustilaginoidea virens (28). However, plants have a large number of bZIP genes. The number of bZIPTFs in monocotyledonous plants ranges from 89 to 98 with exceptions of 125 for Zea mays and 141 for Hordeum vulgare, which may be due to the common ancestor and similar complete genome duplication of the gramineous plants [45]. The range of bZIPTFs in dicotyledonous plants is 41 to 247, wherein the maximum number of bZIPTFs in tetraploid Brassica napus is 247, probably due to the large number of gene duplications. Although Ipomoea trifida and Solanum lycopersicum belong to Solanales, the number of ItfbZIP gene (41) is 28 less than that of SlbZIPs (69), indicating that the bZIP gene number of Ipomoea trifida has been reduced during evolution. In general, the numbers of bZIPs in Homo sapiens and higher plants are more than that in fungus, lower animals and plants, which may be closely related to the ability to regulate physiological responses to environmental stimuli in eukaryotic species.

Phylogenetic relationship of bZIP proteins in Arabidopsis thaliana, Solanum lycopersicum and Ipomoea trifida
To study the evolutionary relationship of bZIPs among Ipomoea trifida, Arabidopsis thaliana and Solanum lycopersicum, we established phylogenetic tree of bZIPs of these three species using MEGA7 by Maximum Likelihood method with bootstrap analysis (1000 replicates) ( Fig. 3 and Additional file 3: Table S3). The phylogenetic tree consists of 75 Arabidopsis thaliana, 69 Solanum lycopersicum and 66 Ipomoea trifida bZIP protein sequences. The 75 bZIP TFs identified in Arabidopsis thaliana were divided into 10 subgroups of A-I and S [6]. The specific distribution of the ItfbZIP proteins was as follows: A (8), B (1), C (6), D (22), E (7), F (2), G(5), H(3), I(11) and S(1). Among the 66 ItfbZIPs, 18% of the ItfbZIPs is close related AtbZIPs, and 72% of the ItfbZIPs is close related SLbZIPs, which is consistent with that Ipomoea trifida is belonged to Solanales together with Solanum lycopersicum rather than Arabidopsis thaliana.

Gene structure and conserved domain analysis
The intron/exon structure was determined by analysing genomic DNA with full-length ItfbZIP CDS sequences (Fig. 4). Results indicated that the ItfbZIPs contains at least one intron except ItfbZIP26 with no intron. The D  Table S1 group ItfbZIPs contain 7 to 11 introns, followed by the G group with 4 to 6 introns, the C group with 4 to 5 introns, the H group with 2 or 3 introns, the S group with 3 introns, the A, E and I group with 1 to 3 introns, the B group with 1 intron, and finally the F group with 0 or 1 intron. The different ItfbZIP transcripts encoded by same ItfbZIP gene have similar exon/intron structure and intron phase patterns. Conserved domain analysis of the ItfbZIP proteins was performed using the SMART and Pfam databases. The DNA binding ability and the heterozygosity of bZIP TFs are determined by the bZIP conserved domain, which mainly includes the basic domain (N-× 7-R/K) and the leucine zipper domain (L-× 6 -L-× 6-L). The results indicate that the majority of the ItfbZIP family members contain a highly conserved bZIP domain visualized by genedoc software in Additional file 4: Figure S1 replaced by Ile. Interestingly, the conventional 9 amino acid residues are replaced by 12 amino acid residues at R/K-× 9-L region of ItfbZIP11, which is similar to Osb-ZIP34 in rice [3]. The leucine zipper region contains a Leu at 7th amino acid position (9 amino acids after R/K in the C-terminal extension), but sometimes Leu is replaced by other hydrophobic residues (IIe, Val and Met), such as ItfbZIP12, ItfbZIP13, ItfbZIP23.1, ItfbZIP23.2, ItfbZIP27.1, ItfbZIP27.2.

Cis-element prediction of ItfbZIP genes
To understand the potential regulatory mechanisms of ItfbZIP gene responding to abiotic stress, cis-elements in the ItfbZIP promoter were analyzed using Analysis Navigator (PlantPAN 2.0) database and plantCARE (Additional file 5: Table S4). Every promoter of ItfbZIP gene has at least one stress related cis-element. Among these elements, 96% of the ItfbZIP genes contained multiple stress-responsive regulatory elements (HSE stress response element, MBS response element for drought

Interaction network of the ItfbZIP proteins
Understanding the functional relationships of the ItfbZIP proteins is important for studying the family's regulatory pathways. Thus, we used the STRING software to construct an ItfbZIP gene interaction network based on Arabidopsis thaliana orthologous genes, systematically analysing the interactions of ItfbZIP proteins (Fig. 5).

Homology modeling of ItfbZIP proteins
Swiss-Model was used to analyze three-dimensional structural homology modeling of ItfbZIP amino acid sequences. Because SWISS-model does not predict effectively for sequences with low homology, we retained 34 models of ItfbZIP proteins with homology higher than 30%. Figure 6 shows the ItfbZIP three-dimensional structural models in the subfamilies A (ItfbZIP-11, − 19.1,-19.

Gene expression analysis of ItfbZIPs in various tissues of Ipomoea trifida
To gain insights into the role of the ItfbZIP genes in the growth and development, we used RNA-seq data from seven tissues (callus_flower, callus_stem, flower, flower bud, leaf, root and stem) to study the expression patterns of the ItfbZIP genes in different tissues. As shown in Figs

Expression of ItfbZIP genes under abiotic stress and ABA treatment
The bZIP TF family plays an important role in plant stress response. Thus, it is very necessarily to investigate the expression of ItfbZIPs under abiotic stress and hormonal treatment. Figure 9 shows the expression of Itfb-ZIP genes under drought, salt, cold and heat stress.

Discussion
Sweet potato is an important grain, health care and industrial raw material crop that features wide adaptability, high yield and strong resistance to environmental stress. China has the largest planting area and highest yield of sweet potato in the world. However, the genetic background of cultivated sweet potato is complex. The cytogenetics and genomics basis of sweet potato are much weaker than that of other food crops (such as rice, corn and wheat) [50]. However, the genome of Ipomoea trifida as most probably the progenitor of the sweet potato was released recently, which is very helpful for the study of the sweet potato genetic improvement. BZIP is the most widely distributed TF involved in stress resistance in the eukaryote community. In plants, it plays an important role not only in growth and development but also in the response to abiotic and biotic stresses. To date, most of the studies on the physiological and molecular mechanisms of bZIP are focused on Arabidopsis thaliana and rice plants but not on Ipomoea trifida. Whole genome analysis ItfbZIP genes are usually used to screen new varieties of plants with high yield and resistance to stress condition. Genome-wild study of ItfbZIPs in Ipomoea trifida has an important guiding role in the further in-depth study of bZIP gene function and molecular breeding of sweet potato.

Evolution of ItfbZIP genes
In this study, 41 ItfbZIP genes encoding 66 transcripts were identified. The number of ItfbZIP genes is significantly less than that of other Solanales and cereal crops that have been genome sequenced (Fig. 2), indicating that the ItfbZIP gene family shrinks during the long evolutionary process. All the ItfbZIPs were predicted to be localized in the nucleus; this finding is consistent with the reported experiment results that the CAbZIP1 of pepper, the OsABI5 of rice and the TabZIP1 of wheat are all localized in the nucleus [51][52][53]. The number of ItfbZIP genes in every chromosome of Ipomoea trifida is in the range of 1 to 10 (Fig. 1). And ItfbZIPs in Ipomoea trifida showed the similar chromosomal disproportionate distribution pattern with bZIP genes in Hordeum vulgare [49], Cucumis sativus [11] and Medicago truncatula [54]. Segmental and tandem duplication analysis indicates that the contribution of tandem duplication was limited for ItfbZIPs and segmental duplication played a dominant role in the amplification of ItfbZIP gene, which was consistent with the findings in Arabidopsis thaliana, rice and grape [10,28,32]. Based on the phylogenetic relationship, ItfbZIP family are divided into ten subfamilies, namely, A, B, C, D, E, F, G, H, I and S (Fig. 3), which is similar to bZIP subfamily classification reported in Maninot esculenta, Musa acuminata and Vitis vinifera. [10,12,55]. Moreover, the analysis results of intron/exon gene structure also support phylogenetic analysis. That is, the regularity of the number and distribution of introns is consistent with the subfamily classification from evolutionary tree (Fig. 4).  Fig.S1). Protein homology modeling indicated that α-helix is the main structure of ItfbZIP proteins, which is supported by previous research results: the leucine zipper in the conserved region of bZIP forms one amphiphilic α-helices, which is the dimerization of bZIP protein before binding to DNA [57].
Expression and potential functions of ItfbZIP genes bZIP TFs play essential roles in many biological processes, such as plant growth, development and stress responses et al. In Arabidopsis thaliana, AtbZIP16 can regulate early seedling development by integrating light and hormone signalling pathways and promote seed germination and hypocotyl elongation in the early stage of seedling germination [58]. AtbZIP28 binds to the endoplasmic reticulum membrane and plays an important role in resistance to heat stress [59]. AtbZIP10-LSD1 regulates basic defence and cell death in Arabidopsis thaliana after infection [60]. In rice, OsbZIP12 is a positive regulator of ABA signalling, conferring ABA-dependent drought tolerance [61]. And OsbZIP23 and OsbZIP72 also increase drought resistance through the ABA pathway [29,62]. Overexpression of Osb-ZIP66 and OsbZIP71 enhances the drought tolerance in rice plants [63,64]. And overexpression of the OsbZIP60 enhances heat and drought resistance [65]. Some bZIP genes have also been studied in Solanales plants. For example, over-expression of CaHB1 gene in tomato can enhance tomato salt tolerance [66]. Expressing tomato SIAREB in Arabidopsis thaliana triggers AtRd29A, AtCOR47 and SICI7-like dehydrin in drought and salt responses [67]. Overexpressing SlAREB1 in tomato can increase the plant tolerance to salt and water stresses [68]. Tobacco bZIP transcription factors TGA2.2 and TGA2.1 have distinct roles in plant defense and plant development [69]. In our study, ItfbZIPs which have high homology with Arabidopsis thaliana and Solanum lycopersicum bZIPs may play similar roles in specific biological processes. Protein phosphorylation is one of the most important post-translational modifications and plays an essential Fig. 8 The comparison between quantitative RT-PCR data and RNA-seq data. The relative expression of the 14 selected ItfbZIP genes was analyzed by qRT-PCR. The GAPDH transcript levels were used for normalization. The y-axis represents the relative expression of the fold. Characters on the x-axis represent various tissues, Error bars indicate standard deviation. qRT-PCR data represented by gray bars, and black bars represent RNA-seq data role in regulating the activity of TFs. Regulating the TF's entry into the nucleus may also change the DNA-binding ability or activity of the TF. The activation of rice TREB protein and Arabidopsis thaliana ABI5 protein require phosphorylation [70,71]. Activation of threonine/serine casein kinase II (CKII) phosphorylation of G-box-binding factor 1 (GBF1) in Arabidopsis thaliana is associated with plant senescence [72]. bZIP transcription factors AREB1, AREB2 and ABF3 are phosphorylated by SNF1-related protein kinase SRK2D, which activates the cascade response of plants to drought and water stress depending on ABA signaling pathway [36]. The predicted phosphorylation sites of ItfbZIP proteins are shown in Table 1. All ItfbZIP proteins have 4 to 18 phosphorylation sites, suggesting that ItfbZIPs may act through post-transcriptional phosphorylation modification.
bZIP gene is involved in plant tissue and organ development. Investigation of tissue-specific gene expression pattern helps us get some hints about the gene function. RNA-seq data were used to analyse the expression pattern of bZIP genes in the roots, stems, leaves and flowers of Ipomoea trifida. As shown in Figs. 7, 40% of the ItfbZIP transcripts have similar expression patterns, and 12 genes are highly expressed in these four tissues. These findings indicate that these genes may play an important role in the plant growth development. Interestingly, ItfbZIP28.2 is highly expressed only in the leaf, ItfbZIP4 is lowly expressed only in the roots and ItfbZIP17.2 is highly expressed only in the callus flower and callus stem, indicating that these genes function in specific organs during the growth and development of Ipomoea trifida.
To date, increasing evidence shows that bZIP genes are involved in hormonal/abiotic stress and related signal transduction pathways. In A. thaliana, TGA2, TGA5 and TGA6 are essential activators of defence responses induced by salicylic acid (SA)/ethylene [73]. TGA7 is involved in plant drought stress. TGA9 and TGA10 are involved in plant immune responses [74]. Combination analysis of the cis-acting elements and protein interaction networks suggests that ItfbZIPs protein may participate in KEGG signalling pathway and plant hormone signal transduction (ID 4075) ( Fig. 5 and Additional file 5: Table S4). ItfbZIPs play different roles under various abiotic stress conditions ( Fig. 9 and Fig. 10).

Conclusion
Ipomoea trifida genome has 41 ItfbZIP genes. These genes are localized on 14 chromosomes and are uniformly named according to their chromosomal location. According to the phylogenetic analysis of Arabidopsis thaliana and Solanum lycopersicum, ItfbZIP gene family is divided into ten subfamilies which have certain genetic relationships and have similar biological functions. The phylogenetic tree can be used to infer the diversity and conservation of genes during evolution. This finding is supported by intron/exon structures, cis-acting elements and protein interaction networks. The expression patterns of the bZIP gene family members are also analyzed using RNA-Seq data and qRT-RNA. Results reveal that bZIP TFs play an important role in plants abiotic stress responses. This study provides a theoretical basis for the application of ItfbZIP genes in molecular breeding of sweet potato.

Plant materials and stress treatments
Ipomoea trifida plants were collected from the Sweet Potato Research Institute, Xuzhou Academy of Agricultural Sciences, National Sweet Potato Industry System, China. The plants were grown in soil and vermiculite (1:1) under the greenhouse conditions of 28/22°Cday/ night and a photoperiod of 16 h light/8 h dark. And Ipomoea trifida (2x) were watered every 5 days. In order to analyze different tissues expression profiles of ItfbZIP genes, roots, stems and leaves were sampled from 4-week-old Ipomoea trifida plants and flowers were sampled from 2-month-old Ipomoea trifida plants. To study the expression patterns related various stress and hormone treatments, 4 week old Ipomoea trifida plants were subjected to 200 mM NaCl, 300 mM mannitol, 50 uM abscisic acid solution, respectively. For cold and heat treatment, 4 week old Ipomoea trifida plants were subjected to 12°C and 40°C, respectively. Roots, stems and leaves of the above treated plants were sampled after 0 h, 6 h, 12 h, 24 h and 48 h treatments.

Phylogenetic tree construction, domain identification and protein homology modeling
The ClustalX program was used to perform multiple sequence alignments of the bZIPs of Ipomoea trifida, A. thaliana and S. lycopersicum. The Maximum Likelihood method was used to construct the phylogenetic tree by MEGA7 programme [80,81]. The Bootstrap value was set to 1000. The sequence alignment of the conserved domain (bZIP domain) of ItfbZIPs was performed by using the online SMART (http://smart.embl-heidelberg.de/) and Pfam database (http://pfam.xfam.org/search# tabview = tab1), and then visualized using genedoc. STRING software was used to construct functionally interacting protein networks with a confidence parameter set at 0.15 threshold. Online SWISS-MOLD (https://www.swissmodel.expasy. org/) [82] and Pymol software were used for homology modeling of ItfbZIP proteins.

Transcriptome analysis
The ItfbZIP RNA-seq data were downloaded from the Sweetpotato Genomics Resource (http://sweetpotato.plant biology.msu.edu/). The data are shown in Additional file 6: Tables S5 and Additional file 7: Table S6. The ItfbZIP gene expression levels were calculated as fragments per