A genome-wide scan for signatures of selection in Chinese indigenous and commercial pig breeds

Background Modern breeding and artificial selection play critical roles in pig domestication and shape the genetic variation of different breeds. China has many indigenous pig breeds with various characteristics in morphology and production performance that differ from those of foreign commercial pig breeds. However, the signatures of selection on genes underlying economic traits in Chinese indigenous and commercial pigs remain poorly understood. Results We searched for footprints of positive selection at the whole-genome level using 44,652 SNPs genotyped in six Chinese indigenous pig breeds, one developed breed and two commercial breeds. An empirical genome-wide distribution of Fst (F-statistics) was constructed based on estimations of Fst for each SNP across these nine breeds. We detected selection at the genome level using the High-Fst outlier method and found that 81 candidate genes show strong evidence of positive selection. Furthermore, the results of network analyses showed that the genes that displayed evidence of positive selection were mainly involved in the development of tissues and organs, and the immune response. In addition, we calculated the pairwise Fst between Chinese indigenous and commercial breeds (CHN VS EURO) and between Northern and Southern Chinese indigenous breeds (Northern VS Southern). The IGF1R and ESR1 genes showed evidence of positive selection in the CHN VS EURO and Northern VS Southern groups, respectively. Conclusions In this study, we identified the genomic regions that showed evidence of selection between Chinese indigenous and commercial pig breeds using the High-Fst outlier method. These regions were found to be involved in the development of tissues and organs, the immune response, growth and litter size. The results of this study provide new insights into the genetic variation and domestication of pigs.


Background
Pigs and humans have interacted for approximately 10,000 years, and as a major protein source for humans, the pig is one of the most important domestic animals [1]. Domestic pigs originated from the Eurasian wild boar (Sus scrofa) approximately 9000 years ago. European and Asian pigs were domesticated independently and introgression of the Asian domestic pig into the European pig occurred after domestication [2,3]. Most of these breeds (especially commercial breeds) have been subjected to strong artificial selection to improve pork productivity. Different breeds show large differences in morphology and production performance owing to various breeding objectives, selection systems and rearing environments; nevertheless, very little is known about the molecular mechanisms of artificial selection in pigs.
The development of high-throughput sequencing and genotyping technologies makes it possible to investigate the selective pressures on various domestic animal species at the genomic level and to identify candidate genes associated with economic traits in order to better understand the mechanisms of adaptive evolution. For example, several important genes relevant to reproduction and growth such as GHR and MC1R have been identified in cattle [4][5][6][7][8], and Flori et al. [6] implemented a network analysis for the detected genes that have putatively been subjected to selection. Akey et al. [9] identified 155 regions in the canine genome that have likely been subjected to strong artificial selection, including the HAS2 gene, which is involved in skin wrinkling. The thyroid stimulating hormone receptor (TSHR) gene was identified as having undergone strong artificial selection in domestic chickens [10]. The above studies used several types of approaches that were based on either the allele frequency spectrum or the properties of haplotype segregation in populations to detect signals of recent positive selection on a genome-wide scale [8]. For example, Fst (a measure of population differentiation) provides an estimate of the genetic variability between populations: a locus that shows a significantly higher Fst than other loci provides evidence for positive selection [11]. Akey et al. [12] suggested that the loci in the tails of the empirical distribution of Fst be used as candidate targets of selection. Another method of identifying loci under selection is the EHH (Extended Haplotype Homozygosity) test [13], which identifies the genome regions that have unusually high LD and allele frequency.
The advent of the Illumina Porcine SNP60 BeadChip [14] allows for the investigation of selective pressure at the genome-wide level in pigs. The melanocortin 1 receptor gene (MC1R) was identified as a target of artificial selection related to coat colour in Chinese domestic pigs [15]. A missense mutation in the PPARD gene had an effect on ear size in pigs [16]. China has a number of indigenous pig breeds, most of which are fat-type breeds with a low degree of improvement. Using Chinese indigenous breeds would therefore be a better way to obtain meaningful signatures of selection on genes underlying economic traits in the pig at the genomic level. The objective of this study was to identify regions subjected to recent artificial selection using a genome scan for SNP differences. The findings will contribute to the construction of a positive selection map, which could help us to understand the recent breeding history of different pig breeds. Our results will also facilitate the identification of candidate genes that are important for economic traits in breeding practice.

Population structure and genome-wide distribution of Fst
To examine the genetic structure of the studied populations, principal component analysis (PCA) was conducted based on all available SNP information. As shown in Figure 1, the first two components accounted for 42.43% and 8.94% of the variation, respectively. The Luchuan, Bama and Wuzhishan pigs clustered closely, as did the Ningxiang and Tongcheng pigs and the Large White and Landrace pigs, while the Yutai and Laiwu pigs were more distant from the other pig breeds.
We constructed the empirical genome-wide distribution of global Fst estimates based on 44,652 SNPs of the nine breeds (ALLPOP) in order to examine the interlocus variation in allele frequencies (Figure 2). The average Fst of these loci was 0.3717 with a standard deviation of 0.16. Local environmental adaptation and artificial selection can change the allele frequencies of specific loci: the frequency of advantageous alleles at the selected loci will increase, leading to a higher than expected level of population differentiation (Fst) [12]. The genome-wide distribution of Fst revealed selection in the pig genome. To identify specific genomic regions containing signatures of selection, we plotted Fst as a function of chromosome position (Figure 3). The sex chromosomes have a smaller effective population size than the autosomes, which makes them more sensitive to demographic events and/or natural selection [12]. Consistent with this, there was an unexpectedly high Fst level at physical positions 40-80 Mb of the X chromosome. Taking into account the PCA results, the pairwise Fst between Chinese indigenous and European commercial breeds was calculated by merging the Chinese indigenous breeds and the commercial breeds into two groups (CHN VS EURO). In addition, the pairwise Fst between Northern and Southern Chinese indigenous breeds was calculated by taking Laiwu (LW) pigs as the Northern group and merging Luchuan (LC), Wuzhishan (WZS) and Bama (BM) pigs into the Southern group (Northern VS Southern).
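The group-merging step can be illustrated with a minimal Python sketch. It pools allele counts across the breeds assigned to each group and then computes a simple two-group differentiation, (H_T - H_S)/H_T, for one biallelic SNP. This simple estimator is swapped in for illustration only and is not the estimator used in this study; the allele counts and the function names are hypothetical.

```python
def pooled_allele_frequency(counts_by_breed, breeds):
    """Pool (allele_count, total_alleles) pairs over the listed breeds."""
    allele = sum(counts_by_breed[b][0] for b in breeds)
    total = sum(counts_by_breed[b][1] for b in breeds)
    return allele / total


def two_group_fst(p1, p2):
    """Simple (H_T - H_S) / H_T differentiation for one biallelic SNP.

    Illustrative estimator with equal group weights; not the
    analysis-of-variance estimator reported in the paper.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)                      # expected heterozygosity, pooled
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-group heterozygosity
    return 0.0 if h_t == 0 else (h_t - h_s) / h_t


# Hypothetical counts for one SNP: breed -> (copies of allele A, alleles typed).
counts = {"LW": (36, 40), "LC": (10, 40), "WZS": (6, 40), "BM": (4, 20)}
p_north = pooled_allele_frequency(counts, ["LW"])
p_south = pooled_allele_frequency(counts, ["LC", "WZS", "BM"])
fst_north_south = two_group_fst(p_north, p_south)
```

Pooling counts rather than averaging breed frequencies weights each breed by its sample size, which is one possible reading of "merging" breeds into a group.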

Candidate genes under selection
To identify loci subjected to selection, we applied the high-Fst outlier method to the distribution of Fst. According to the empirical distribution of Fst estimates, we selected the high-Fst outlier SNPs that corresponded to the upper 1% of the distribution as the loci under selection. In the ALLPOP group, a total of 446 SNPs were determined to be subjected to natural or artificial selection following this criterion, and these SNPs were from a total of 81 candidate genes (Additional file 1: Table S1). In addition, a total of 84 and 79 candidate genes were identified in the CHN VS EURO group and the Northern VS Southern group, respectively (Additional file 2: Table S2 and Additional file 3: Table S3). Several candidate genes contained contiguous outlier-Fst SNPs; for example, the transient receptor potential melastatin 3 (TRPM3) gene contained five contiguous SNPs with consistently high Fst values, and the nuclear envelope spectrin repeat 2 (Nesprin-2) gene contained two outlier-Fst SNPs in the ALLPOP group.
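As a minimal sketch of the outlier criterion, the following Python function ranks SNPs by Fst and keeps the upper 1% of the empirical distribution. The function name, the input layout (a mapping from SNP identifier to Fst) and the use of integer floor division to size the tail are assumptions made for illustration.

```python
def high_fst_outliers(fst_by_snp, top_percent=1):
    """Return SNP ids in the upper `top_percent` % of the empirical Fst distribution."""
    # Integer arithmetic avoids floating-point rounding when sizing the tail.
    n_keep = len(fst_by_snp) * top_percent // 100
    ranked = sorted(fst_by_snp.items(), key=lambda item: item[1], reverse=True)
    return [snp for snp, _ in ranked[:n_keep]]


# Hypothetical toy data: 200 SNPs with increasing Fst values.
toy_fst = {f"snp{i}": i / 1000 for i in range(200)}
toy_outliers = high_fst_outliers(toy_fst)
```

With 44,652 SNPs, this rule retains 446 loci, which matches the number of outlier SNPs reported for the ALLPOP group.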

Functional analysis of candidate genes under selection
Based on a systems biology approach, we carried out network analysis using IPA software to identify the critical physiological pathways of the genes harbouring footprints of positive selection. The pig breeds selected have obvious differences in both morphology and performance. The Large White and Landrace pigs are well-known commercial breeds with high meat productivity, fast growth, and high adaptability; however, Chinese indigenous breeds vary in morphological and performance phenotypes and in local environmental suitability. For example, the Bama and Wuzhishan pigs from Southern China have a small body size, while the Laiwu pigs from Northern China are larger. First, 75 out of 81 genes in the ALLPOP group were mapped to the IPA database, and then three significant networks, namely N1, N2 and N3, were constructed. N2 and N3 were interconnected and further merged into a single network (N). Networks N and N1 are represented in Figures 4 and 5, respectively. The main hubs of the N network contained genes encoding protein kinases (Akt, Erk, Mapk, JAK2, PKC), transcription factors (NFκB, FOS), and several other signalling molecules (Insulin, CDKN1B, NR3C1, Figure S1) and ESR1, PKC and insulin in the Northern VS Southern group (Additional file 5: Figure S2).

Discussion
In this study, the population structure of the nine pig breeds was analyzed, and the PCA results showed that most of the individuals could be classified into their breeds using the first and second eigenvectors (Figure 1). As with other livestock species such as cattle and sheep [17,18], the combination of PC1 and PC2 separated individuals according to their geographic origin. The indigenous breeds of Southern China (Wuzhishan, Bama, Luchuan) clustered together, as did the breeds of Central China (Ningxiang, Tongcheng); the Northern Chinese breed (Laiwu) and a developed breed (Yutai) formed a separate single cluster, and the two commercial breeds, Large White and Landrace, formed a distinct cluster. There was almost no overlap between the nine different pig breeds. This opens the possibility that an informative SNP panel can be used to assign parentage, which has proven successful in cattle [19].
Pigs have been undergoing selection to enhance performance and productivity during domestication and breed formation. In the present study, global and pairwise Fst were used to detect genetic selection in Chinese indigenous and commercial pig breeds. First, the ALLPOP group showed evidence of selection on chromosome 8 (Figure 3). We identified selection near KIT, which can affect coat colour in pigs when mutated [20] and also shows strong evidence of selection in sheep [18]. In addition, as shown in Figures 4 and 5, the N network contained several hubs that are physiological signalling molecules (NFκB, MAPK, ERK). These data indicate that these genes participate in basic physiological processes, and the N network contained hubs (TNF and beta-estradiol) showing that the genes under selection are involved in the immune response and reproductive traits. The POU3F4 and OTX2 genes are important for the development of the cochlea, and mutations of these two genes in mice cause developmental defects in the inner ear [21,22]. In mouse embryonic stem cells, mutation of the zinc-finger proto-oncogene GFI1B decreases erythropoiesis [23]. The PAX6 gene is necessary and sufficient to trigger the cascade of events required for eye formation [24]. The PAX6 and OTX2 genes also play an important role in the development of the body axis [25,26]. In addition, several identified molecules are involved in the development of organs. The GNAQ gene can regulate cardiac growth and development, and mice lacking both GNAQ and GNA11 [Gaq(−/−); Ga11(−/−)] died at embryonic day 11 due to cardiomyocyte hypoplasia [27]. The TFAP2A gene is a critical transcription factor for epidermal differentiation and interacts with notch signalling molecules [28].
Several candidate genes in the ALLPOP group are also involved in molecular transport. For example, the SLCO1A2 and SERPINA7 genes can increase the transport of thyroid hormone in the serum [29,30]. The SLC16A1 gene plays an important role in the transport of mevalonate and ketone bodies [31,32]. Among the candidate genes under selection, some are associated with genetic disorders and cancer in humans. GWAS results showed that a SNP substitution in BBS9 was associated with amyotrophic lateral sclerosis [33]; furthermore, BANF2, SNX25, SAMD12 and GPR177 were associated with Crohn's disease [34], and inflammatory bowel disease was associated with the upregulation of human CD274 at the cell surface of macrophage-derived dendritic cells of the inflamed colon [35]. In addition, three genes (DIAPH2, AFF2, POF1B) were involved in functions related to premature ovarian failure [36,37]. In the CHN VS EURO group, the network hubs are gathered at the centre around the IGF1R (insulin-like growth factor 1 receptor) gene (Additional file 4: Figure S1), which is necessary for normal growth. IGF1R-null mice die at birth of respiratory failure and exhibit only 45% of the body weight of their wild-type littermates [38]. European commercial pig breeds grow faster than Chinese indigenous breeds. In addition, the IGF1R gene also showed a strong signature of selection in European domestic pigs [39]. Interestingly, one of the most critical signalling molecules, JAK2, showed strong evidence of positive selection in both the ALLPOP and the CHN VS EURO groups (Figure 4 and Additional file 4: Figure S1). JAK2 is an essential gene in mammals and participates in a variety of biological processes; the loss of JAK2 is lethal [40]. JAK2 is also involved in the immune response [41], which suggests that these pig breeds may differ in their resistance to pathogens.
In the Northern VS Southern group, the central hub of the network was the ESR1 (estrogen receptor 1) gene (Additional file 5: Figure S2). The ESR1 gene was associated with litter size in pigs and was also a candidate gene for boar fertility and sperm quality [42][43][44]. Laiwu pigs have a higher reproductive capacity than the Bama and Wuzhishan pigs. Correspondingly, the ESR1 gene showed strong evidence of positive selection in our study.
The high-Fst outlier method is a powerful tool for the detection of positive selection [45]; however, the high correlation between Fst estimates at loci in strong linkage disequilibrium makes it difficult to determine whether the Fst at a particular SNP is markedly different from the expected value [46]. We also tested the correlation of Fst between pairs of SNPs as a function of marker distance; the correlation tended to drop quickly toward 0 when SNPs were more than 300 kb apart (data not shown). Modern pig breeds have much greater average linkage disequilibrium (LD) than humans and cattle [47]; accordingly, this distance was greater in pigs than in humans and cattle [6,12].
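The procedure used to test the correlation of Fst against marker distance is not described in detail, so the following Python sketch shows only one plausible way to do it: SNP pairs on a single chromosome are grouped into distance bins and the Pearson correlation of their Fst values is computed per bin. The bin width, the maximum distance and all function names are assumptions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns NaN if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / sqrt(sxx * syy)


def fst_correlation_by_distance(snps, bin_size=100_000, max_dist=500_000):
    """snps: list of (position_bp, fst) on one chromosome.

    Returns {bin_start_bp: correlation of Fst between SNP pairs in that bin}.
    All pairs are enumerated, which is quadratic and only suits small inputs.
    """
    bins = {}
    for (pos1, fst1), (pos2, fst2) in combinations(sorted(snps), 2):
        dist = pos2 - pos1
        if dist >= max_dist:
            continue
        xs, ys = bins.setdefault(dist // bin_size, ([], []))
        xs.append(fst1)
        ys.append(fst2)
    return {b * bin_size: pearson(xs, ys)
            for b, (xs, ys) in sorted(bins.items()) if len(xs) >= 2}


# Hypothetical toy chromosome: three tight SNP pairs with matching Fst trends.
toy_snps = [(0, 0.1), (10, 0.2), (1000, 0.5), (1010, 0.6), (2000, 0.9), (2010, 1.0)]
toy_corr = fst_correlation_by_distance(toy_snps, bin_size=100, max_dist=100)
```

Plotting the returned correlations against the bin start would show the distance at which the correlation decays toward 0.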

Conclusions
Overall, a genome-wide scan was performed in Chinese indigenous pigs to help interpret artificial selection and adaptive evolution. We constructed population structures and genome-wide distributions of Fst. A number of genes were identified as displaying signatures of selection, and several critical physiological pathways of these genes were determined to have footprints of positive selection. Some of these genes play important roles in biological processes, which can be used to interpret the differences between these pig breeds.

Methods
Genotyping was carried out using the Illumina Porcine SNP60 BeadChip [14], which contains a total of 62,123 SNPs. Quality control was performed using the PLINK programme [48]. A total of 8,383 unmapped markers (based on Sus scrofa Build 9.0) and 8,391 loci with a minor allele frequency (MAF) < 0.05 were excluded. A total of 2,709 markers that were genotyped in less than 90% of all individuals were discarded from further analysis. The final data set consisted of 44,652 SNPs from nine breeds.
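The two per-marker filters can be sketched in plain Python as follows. This is an illustration of the thresholds only, not the PLINK workflow used in the study; the genotype coding (0, 1 or 2 copies of one allele, with None for a missing call) and the function name are assumptions.

```python
def passes_qc(genotypes, min_call_rate=0.90, min_maf=0.05):
    """Return True if a marker meets the call-rate and MAF thresholds.

    genotypes: one entry per individual, coded 0/1/2 (copies of one allele)
    or None when the genotype call is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    call_rate = len(called) / len(genotypes)
    allele_freq = sum(called) / (2 * len(called))
    maf = min(allele_freq, 1 - allele_freq)  # frequency of the rarer allele
    return call_rate >= min_call_rate and maf >= min_maf


# Hypothetical markers typed in ten or twenty individuals.
common_marker = [0, 1, 2, 1, 0, 1, 2, 1, 0, 1]             # MAF 0.45, fully called
rare_marker = [0] * 19 + [1]                                # MAF 0.025
sparse_marker = [1, 1, None, None, 1, 1, 1, 1, 1, 1]        # call rate 0.80
```

Markers failing either threshold would be dropped before Fst estimation.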

Population structure and Fst estimation
Principal component analysis (PCA) based on all available SNP information was performed using the SVS7 software (Golden Helix Inc., Bozeman, MT, USA). Fst statistics across populations were estimated using the Genepop 4.1 program [49]. Fst is a measure of population differentiation, which is defined as $F_{st} = \frac{\mathrm{MSP} - \mathrm{MSI}}{\mathrm{MSP} + (n_c - 1)\,\mathrm{MSI} + n_c\,\mathrm{MSG}}$, where MSG, MSI and MSP represent the mean sums of squares for gametes, individuals and populations computed by an analysis of variance, respectively, and $n_c = (S_1 - S_2/S_1)/(n - 1)$, where $S_1$ is the total sample size, $S_2$ is the sum of squared group sizes, and $n$ is the number of non-empty groups.
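As a worked illustration of these quantities, the Python sketch below computes n_c from group sizes and then Fst from the three mean squares, assuming the single-locus form Fst = (MSP - MSI) / (MSP + (n_c - 1) MSI + n_c MSG). The mean-square values are hypothetical inputs; in practice they come from the analysis of variance performed by the estimation software.

```python
def n_c(group_sizes):
    """n_c = (S1 - S2/S1) / (n - 1) over the non-empty groups."""
    sizes = [s for s in group_sizes if s > 0]
    n = len(sizes)
    if n < 2:
        raise ValueError("at least two non-empty groups are required")
    s1 = sum(sizes)                  # total sample size
    s2 = sum(s * s for s in sizes)   # sum of squared group sizes
    return (s1 - s2 / s1) / (n - 1)


def fst_from_mean_squares(msp, msi, msg, nc):
    """Fst from mean squares for populations (MSP), individuals (MSI), gametes (MSG)."""
    return (msp - msi) / (msp + (nc - 1) * msi + nc * msg)


# Three groups of equal size: n_c reduces to the common group size.
nc_equal = n_c([10, 10, 10])
# Hypothetical mean squares for a single locus.
fst_example = fst_from_mean_squares(msp=2.0, msi=0.5, msg=0.5, nc=nc_equal)
```

When MSP equals MSI the numerator vanishes and Fst is 0, which corresponds to no differentiation among populations beyond that among individuals.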

Identification of candidate genes under selection
Genome regions containing the high-Fst outliers were identified as follows: a locus was considered to be a high-Fst outlier if its Fst fell in the upper 1% of the empirical genome-wide distribution of Fst. A gene was regarded as being under selection if it contained unexpectedly highly differentiated SNPs among the populations. All of the high-Fst outlier loci were mapped to gene-associated regions based on the pig genome annotation (Sus scrofa Build 9.0). A SNP was considered to be from a particular gene if it mapped to the 5′ upstream, 5′ UTR, coding, intronic, 3′ UTR, or 3′ downstream region of the gene.
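The SNP-to-gene assignment can be sketched as an interval lookup in Python. Because the size of the upstream and downstream regions is not stated, the `flank` parameter below is an assumption, and the gene names and coordinates are placeholders rather than entries from the Build 9.0 annotation.

```python
def genes_for_snp(chrom, pos, genes, flank=5000):
    """Return names of genes whose span, extended by `flank` bp, contains the SNP.

    genes: iterable of (name, chromosome, start_bp, end_bp) with start <= end.
    The transcribed span covers the UTRs, exons and introns; the flank stands
    in for the 5' upstream and 3' downstream regions.
    """
    hits = []
    for name, gene_chrom, start, end in genes:
        if gene_chrom == chrom and start - flank <= pos <= end + flank:
            hits.append(name)
    return hits


# Placeholder annotation with made-up coordinates.
toy_genes = [("GENE_A", "8", 1000, 2000), ("GENE_B", "8", 10000, 12000)]
inside_gene = genes_for_snp("8", 1500, toy_genes, flank=500)
in_flank = genes_for_snp("8", 9600, toy_genes, flank=500)
intergenic = genes_for_snp("8", 5000, toy_genes, flank=500)
```

A gene would then be listed as a candidate if at least one high-Fst outlier SNP maps to it under this rule.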

Network analysis of candidate genes
Network analysis was aimed at identifying direct or indirect interactions between candidate molecules and their related properties. The known interactions were annotated by experts according to the literature. Ingenuity Pathway Analysis (IPA) v7.0 (Ingenuity Systems Inc., USA, http://www.ingenuity.com/) was used to construct networks. We uploaded the genes subjected to selection into this software and organized them into networks of interacting genes to identify pathways containing important functionally related genes. This network analysis approach was similar to that described by Flori et al. [6]. From the eligible candidate genes uploaded, IPA automatically constructed several networks, each limited to 70 molecules (including candidate genes).