Edinburgh Research Explorer Whole Genome Scan Reveals the Genetic Signature of African Ankole Cattle Breed and Potential for Higher Quality Beef Whole Genome Scan Reveals the Genetic Signature of African Ankole Cattle Breed and Potential for Higher Quality Beef

Whole genome scan reveals the genetic signature of African Ankole cattle breed and potential for higher quality beef' BMC Genetics, vol 18, no. 1, pp. 11. use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver General rights Copyright for the publications made accessible via the Edinburgh Research Explorer is retained by the author(s) and / or other copyright owners and it is a condition of accessing these publications that users recognise and abide by the legal requirements associated with these rights. Take down policy The University of Edinburgh has made every reasonable effort to ensure that Edinburgh Research Explorer content complies with UK legislation. If you believe that the public display of this file breaches copyright please contact openaccess@ed.ac.uk providing details, and we will remove access to the work immediately and investigate your claim. Abstract Background: Africa is home to numerous cattle breeds whose diversity has been shaped by subtle combinations of human and natural selection. African Sanga cattle are an intermediate type of cattle resulting from interbreeding between Bos taurus and Bos indicus subspecies. Recently, research has asserted the potential of Sanga breeds for commercial beef production with better meat quality as compared to Bos indicus breeds. Here, we identified meat quality related gene regions that are positively selected in Ankole (Sanga) cattle breeds as compared to indicus (Boran, Ogaden, and Kenana) breeds using cross-population (XP-EHH and XP-CLR) statistical methods.


Background
Africa, with its diverse agro-ecological zones, is a home to diverse cattle breeds adapted to their local environments. African cattle breeds are derived from Bos taurus and Bos indicus subspecies introduced to the continent at different times, and through interbreeding between them [1,2]. Since the introduction, their diversity has been shaped by subtle combinations of human and natural selection. Selection in African cattle is mainly for sociocultural concerns and to survive the heterogeneous environment [3]. African cattle have been evolved to adapt to the poor feed availability, high environmental temperature, and high prevalence of internal and external parasite and disease conditions of the continent. These cattle breeds display better heat tolerance, adaptability, tick resistance, reproductive longevity, and maternal characteristics such as fertility, low inter-calf periods and cow efficiency [1,[4][5][6][7].
African Sanga cattle, sometimes referred to as Bos africanus, are an intermediate type of cattle believed to be the result of interbreeding between Bos taurus and Bos indicus, which dwell in eastern, central and southern Africa [1,8]. Generally, Sanga cattle can be identified by their long and slender horns, small cervicothoracic hump, and small and unfolded dewlap [9]. There are 30 Sanga cattle breeds/strains in Africa subdivided into Sanga of eastern and Sanga of southern Africa based on geographical location [8]. Recently, research outputs are asserting the potential of African Sanga and Sanga derived breeds to produce carcass and meat quality attributes that favorably compare to British and Continental breeds and are often better than those of the Bos indicus breeds [6,[10][11][12][13]. Sanga breeds in south Africa (eg., Bonsmara, Drakensberger and Nguni) were found to produce beef with lower shear force, shorter myofibrillar fragment length, larger rib fat thickness, and larger soluble collagen when compared with indicus (Brahman) cattle [11].
Meat quality is a general term used to describe the attributes of meat which include carcass composition and conformation, the eating quality of meat, health issues associated with meat, and production and environmental issues [14]. Meat sensory characteristics such as tenderness, flavor, juiciness, and color are important meat quality parameters which are affected by biological characteristics and proteolytic activities of muscle [15,16]. The biological characteristics of muscles such as fiber type, collagen, intramuscular adipose tissue and protease activities regulate meat tenderness and flavor and are known to be affected by genetic and rearing factors [15]. The heritability of beef quality traits is low to moderate which varies between breed groups, methods of estimation, number of records, and other factors [17,18]. The genetic variation within and between breeds is because of the positive selection of gene regions caused by beneficial polymorphisms in the genes affecting the traits. Identification of selection signatures in the genome provide information about the evolutionary processes involved in shaping genomes and functional information about genes/ genomic regions [19].
Studies attempting to detect positive selection signatures in African cattle have reported several genes involved in immune system, reproduction, energy metabolism, coat coloration, thermoregulation and tick resistance [20,21]. The detection of immune related genes might be related to the selective pressure that has been exerted by the long-term presence of pathogens in the continent [21], whereas signatures of selection associated with reproduction and thermoregulation is an adaptation to perform under heat stress conditions [20]. However, there have been no previous studies attempting to identify genes affecting meat quality traits in African cattle in general and Sanga cattle in particular.
In this study, we reported genes that are positively selected in Ankole cattle population associated with meat quality traits. This was done by scanning the whole genome of four African cattle breeds (African Sanga cattle: Ankole; and three indicus breeds: Boran, Ogaden, and Kenana) [22,23]. We employed cross-population extended haplotype homozygosity (XP-EHH) and cross population composite likelihood ratio (XP-CLR) statistics in order to detect selection signatures from different data patterns; two approaches were used as each has its own advantages. XP-EHH compares haplotype lengths of populations to detect selective sweeps when the allele has approached or achieved fixation in one population but remains polymorphic in the other population [24]. XP-CLR is a statistic based on allele frequency differentiation across populations. It is not affected by ascertainment biases and has the advantage of being able to detect older signals and selection on standing variation [25].

Results and discussion
Data description DNA samples extracted from whole blood samples of four African cattle breeds (Boran, Ogaden, Kenana and Ankole) were sequenced to~11 × genome coverage each. Using a standard sample preparation and whole genome re-sequencing pipeline, an overall alignment rate of 98.84% covering 98.56% of the taurine reference genome was obtained. After filtering false positive calls using several filtering steps, a total of~37 million SNPs were retained and used for detection of positive selection signature analysis.

Phylogenetic tree
Maximum likelihood (ML) and neighbor-joining (NJ) methods produced consistent features regarding the genetic distance between the breeds considered ( Fig. 1). Ankole cattle are clearly separated from the three indicus breeds (100% bootstrap values/quartet puzzling reliability values). Within indicus cattle, each of the three breeds were also depicted as a monophyletic group with highly significant values.
While African indicus cattle have been present on the continent since 1500 BC [8], Sanga cattle were derived through hybridization between taurine cattle and zebu cattle around 700 AD [1]. More than 30 breeds/strains of Sanga cattle can be found distributed throughout Eritrea, Ethiopia, southern Sudan, the Great Lakes region of East Africa and southern Africa. The Ankole group is one of the three groups of Sanga cattle representing Sanga cattle in Uganda, Rwanda, Burundi, Tanzania and Democratic Republic of Congo [8]; it is a valuable and widely used genetic resource in these regions.
Positive selective signature in Ankole cattle population XP-EHH and XP-CLR tests were performed in order to detect positive selection signatures in Sanga (Ankole) cattle. The genome of Ankole population was compared with the genomes of three indicus cattle breeds grouped together into one population. Based on the analysis, we obtained 238 and 213 putatively advantageous positively selected genes from XP-EHH (Additional file 1: Table S1) and XP-CLR test statistics (Additional file 2: Table S2), respectively; of these, 98 genes were detected in both statistics (Additional file 3: Table S3). Gene Ontology Biological Processes (GO BP) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways within DAVID were used to build on biological modules consisting of clusters of functional terms [26]. All the 354 genes obtained from both XP-EHH and XP-CLR statistics were included, after removing duplicates, for the analysis. Gene ontology analysis resulted in 44 significantly (p < 0.05)  Table S4) and the KEGG-pathway analysis resulted in three significantly enriched pathways (p < 0.05; Table 1). The ClueGO plugin [27] created a functionally organized pathway term networks (Fig. 3).
We also analyzed Tajima's D for the candidate gene regions which revealed a significant departure from neutrality and indicated the selective maintenance of alleles within the Ankole population as compared to its indicus counterparts ( Table 2). The negative Tajima's D values obtained for the candidate gene regions indicate the presence of an excess of rare alleles in the population. It is known that low-frequency alleles contribute less to the number of pair-wise differences in a sample set than alleles of moderate frequency do; a surplus of rare alleles inflates the latter value disproportionately to the former value [28]. Similarly, population differentiation analysis supported the positive selection of candidate genes ( Table 2); candidate gene regions produced higher values of fixation index [29]. F ST has been widely used to identify selective sweep regions in different livestock species [30]. The Tajima's D and F ST plot of candidate gene regions are presented in Fig. 4 and Additional file 5: Figure S1.

Biological process and pathways related to meat quality traits
Meat quality is a multifactorial and complex trait affected by different factors at different levels ranging from molecular to mechanical. Molecularly, genes involved in many cellular mechanisms such as muscle growth, glycolysis, muscle contraction, stress reaction, cell cycle, proteolysis, protein ubiquitination and apoptosis have been reported to be associated with meat quality characteristics [16,31,32]. Previous studies reported that, as compared with indicus breeds, Sanga breeds produce better quality beef [11][12][13] with lower shear force, shorter myofibrillar fragment length, larger rib fat thickness, larger soluble collagen, and higher percent drip loss [11]. Additionally, Sanga cattle have better feed conversion efficiency, reproductive performances, and tick resistance in the tropics [33].
From DAVID gene ontology analysis, 44 significant (p < 0.05) GO BP terms were enriched ( Fig. 2; Additional file 4: Table S4). The BP terms and gene clusters related to meat quality characteristics were chosen based on their biological function and previous literature. Accordingly, among the enriched GO BP terms (Fig. 2), actin cytoskeleton organization (represented by nine genes; FMNL1, FMNL3, DOCK2, LIMA1, ROCK1, MRAS, PRKG1, CAPZB, and ADD1) and actin filament-based process (additionally contains MYO7A) are related to meat tenderness [32,34,35]. Cellular component organization, a cellular level process which results in the assembly and arrangement of constituent parts or disassembly of a cellular component, is important for beef tenderness [32]. It is also significantly differentially expressed in relation to pork IMF and tenderness [36]. Five genes (WWP1, MDM2, CAND1, PARK2, and LNX1) were involved in protein ubiquitination, which is a key step in protein degradation [37]. Ubiquitination pathway affects muscle properties that are relevant for the quality of meat at postmortem [38], and are expressed in relation to tenderness [36]. GO terms of negative regulation of actin filament depolymerization and negative regulation of protein complex disassembly are involved in adipocyte regulation [34].
The KEGG MAPK pathway (p = 0.0215; Table 1), represented by eight genes (MAP4K4, ACVR1B, FGF18, MAP3K5, MAP2K3, MRAS, PDGFRA, PLA2G2A, NR4A1, MAPKAPK2, and NFATC2) is responsible for cell proliferation and plays an important role in hyperplastic growth [39], which has a positive effect on meat tenderness [31]. Gap junction, regulation of actin cytoskeleton and MAPK signaling pathway also are important in residual feed intake [40]. The ClueGO plugin created a functionally organized pathway term network (Fig. 3), that the networks actin filament bundle assembly and positive regulation of proteolysis were among enriched networks in relation to meat quality characteristics [32].

Genes affecting meat quality traits in Ankole cattle
Here, we described genes positively selected in Ankole Sanga cattle that are potentially associated with meat quality and feed conversion efficiency traits based on previous studies and their biological functions (Table 2; Additional file 6: Table S5).

Genes related to meat tenderness
Meat tenderness is an important meat eating quality trait. It is mainly affected by the quantity and solubility of connective tissue, composition and contractile state of muscle fibers, and the extent of proteolysis in rigor muscle [11,31,41]. Tender meat contains higher levels of soluble

3.85
We used 354 genes obtained from both XP-EHH and XP-CLR statistics after removing duplicate genes collagen, more fat, and lower water content. Myofibril fragmentation index also has a positive correlation with beef loin tenderness [42]. Sanga breeds have a lower percentage of white muscle fiber and a higher myofibrillar fragmentation index [6,11], which results in lower shear force and more tender beef compared to indicus cattle [11]. In this study, we have identified genes (CAPZB, COL9A2, PDGFRA, MAP3K5, ZNF410, LIMA1, and PKM2) that may potentially affect muscle structure and development thereby affecting meat tenderness in Ankole cattle.
The CAPZB (XP-CLR = 142.50) gene encodes the beta subunit of the barbed-end actin binding protein, which belongs to the F-actin capping protein family. It is involved in skeletal muscle development and growth [43], and cell signaling and regulation of actin in myofilament contractility [38]. When up-regulated, it increases the ability of muscle accretion in pigs [43]. CAPZB contributes to muscle metabolic and structural properties and proteolytic processes providing a link between these functional networks which are important for maturation of muscle to meat [38]. A previous functional analysis of meat tenderness revealed a positive correlation between CAPZB expression and beef tenderness [32]. In the pig, CAPZB is an essential element for protein kinase signaling to the myofilaments and, as a structural protein, it has been shown to influence muscle biochemistry and its postmortem abundance is related to meat quality [44]. The Tajima's D and F ST plot of the CAPZB gene region (Fig. 4a) shows the presence of an excess of rare alleles in Ankole population and the differentiation of the region between the compared breeds, respectively. LIMA1 (called EPLIN) encodes a cytoskeleton-associated protein that inhibits actin filament depolymerization and cross-links filaments in bundles. It is associated in pigs with functions regarding muscle development and metabolism [45]. ZNF410, also known as APA-1, is an essential component of the stress pathway involved in the meat tenderization process [32]. In previous muscle transcriptome analyses, ZNF410 has been shown to be highly expressed in the longissimus muscle of Basque pigs that are known to produce pork with higher intramuscular fat and tenderness compared to Large White pigs [35]. COL9A2, a fibrillar collagen, constitutes the largest component of extracellular matrix (ECM) to which its amount, type, and solubility present in muscle tissue have a strong effect on meat tenderness [39]. This gene was found to be upregulated in the longissimus dorsi muscle of Jeju native piglets [46], whose meat is known for its preferable taste, tenderness and superior marbling [47]. ROCK1, a gene that regulates actin cytoskeleton and cell polarity, is associated with body weight, carcass weight, shank length, shank circumference and other carcass weight traits in chicken [48].
Genes involved in MAPK signaling (MAP3K5, MAP2K3, MAP4K4, and MAPKAPK2) were also identified. MAPK signaling is one of the major intracellular signaling pathways affecting myogenesis [49] and is relevant to postmortem meat quality [38]. MAP2K3 shows associations with loin muscle area and fat traits in pigs, implying roles in muscle differentiation and growth [49].
E3 ubiquitin ligase genes (WWP1, and PARK2) play an important role in the regulation of a wide variety of cellular functions such as protein degradation, transcription, and RNA splicing. These genes catalyze protein Fig. 2 Functional clustering of GO BP terms annotated from DAVID gene ontology analysis. All the 44 significantly (p < 0.05) enriched BP terms were used for the functional clustering. Clusters related to meat quality characteristics are highlighted in red color ubiquitylation resulting in the targeting of proteins toward various cellular fates, with proteasome-mediated proteolytic degradation [50]. The ubiquitin-proteasome system is one of the proteolytic systems responsible for the majority of the protein degradation in muscle that is relevant for meat quality postmortem [51].
The expression of PKM2 (XP-EHH = 1.9696; p =2.00.E-03), a gene involved in energy metabolism, is positively correlated with WBSF and has been reported as a functional protein marker for meat tenderness in Thai indigenous chicken [52] and beef [32]. PDGFRA also has an effect on shear force and Loin Eye Area in pig [53].
The Tajima's D and F ST plot of PKM2 and PDGFRA gene regions is shown in Fig. 4b and c, respectively. PRKG1 is reported to be important in the conversion of muscle to meat [54]. NFATC2 is a calcineurin substrate expressed in skeletal muscle which is responsible for activating new myotubes [39]. Calcineurin is crucial for myocyte differentiation and determination of the slow oxidative fibre phenotype [55].

Genes related to meat intramuscular fat (IMF)
IMF is a heritable meat quality trait which affects flavor, juiciness, visual characteristics and meat tenderness. It is Fig. 3 ClueGO gene ontology analysis of 354 positively selected genes in Ankole cattle population. ClueGO visualizes the selected terms in a functionally grouped annotation network that reflects the relationships between the terms based on the similarity of their associated genes. Nodes represent gene ontology terms to which their size reflects the statistical significance of the terms. The most prominent gene ontology term for each group is highlighted in colors, and the circled gene ontology terms are related to meat quality characteristics positively correlated with body fat and red muscle fiber [41]. Steak from Ankole cattle has been found to be juicy than those Ankole-Boran crossbreds [13], and Strydom et al. showed higher levels of rib fat thickness in Sanga as compared to indicus cattle [10].
We identified several genes (PLA2G2A, PARK2, ZNF410, PKM2, MAP2K3, PLCD3, PLCD1, ROCK1, and AHSA1) which affect the fat content of meat in Ankole cattle. PLA2G2A (XP-CLR = 117.48) is a member of the phospholipase A2 family (PLA2), which is involved in the hydrolysis of phospholipids into fatty acids and phosphatidylinositol and phospholipid metabolism [56]. Also referred to as Adipose-Specific Phospholipase A2 (AdPLA), it is involved in adipocyte metabolism and catalyzes the efficient release of free fatty acids and lysophospholipid from phosphatidylcholine [57]. It has been reported in the literature that PLA2 has a positive effect on porcine fat deposition (IMF) and potentially regulates lipolysis and increases the MUFA deposition rather than the SFA deposition [58]. It is also associated with intramuscular fat in beef cattle [59]. Pla2g2a has been reported to be a candidate gene in relation to obesity in mice [60].
ROCK1 is involved in pathways relevant to muscle/adipose tissue function in pigs with divergent phenotypes for fatness traits [61]. E3 ubiquitin ligase enzymes have been identified to be involved in the modulation of lipid biology [50,62]. PARK2 is a strong positional candidate for adiposity in chicken and a positive regulator of fat metabolism [62]. PRKG1 is involved in gap junction and is a candidate gene for intramuscular fat in the pigs [36]. MAP2K3 has been shown to be associated with loin muscle and fat traits in pigs [49]. MAP4K4 is involved in adipogenesis, triglyceride storage, fatty acid release, fatty acid oxidation and mitochondrial oxidative phosphorylation [63]. PKM2 is significantly associated with back fat thickness, an economically important trait in pigs [64]. APOL6 is one of the most important known genes involved in lipoprotein metabolism [65]. Phospholipase C family genes (PLCD1 and PLCD3) generate diacylglycerol and are involved in phosphatidylinositol catabolism and phospholipid synthesis [56]. The transcription of AHSA1 (AHA1) is related to Omega-3 fatty acids in skeletal muscle, which influence meat tenderness, juiciness, and flavor, and are beneficial to human health [66]. The positive selection of the AHSA1 gene region is shown in the Tajima's D and F ST plot in Fig. 4d.

Genes related to meat color, drip loss, and feed conversion efficiency (FCR)
Meat color and water holding capacity of meat are among the quality parameters used as an indicator of freshness and wholesomeness [41,67]. These characteristics are related to variations in the glycolysis rate and muscle temperature decline postmortem. Myoglobin (MB; XP-CLR = 129.86; XP-EHH = 1.9640; p = 5.00.E-03), a globular single chain protein located in the sarcoplasm, is the principle protein responsible for the red color of meat. MB serves as a reserve supply of oxygen and facilitates the movement of oxygen within muscles [41,67]. Figure 4e shows the Tajima's D and F ST plot of MB gene region in Sanga and B. indicus populations. The Solute Carrier Family 48 (Heme Transporter), Member 1 (SLC48A1) is responsible for the transport of heme from endosome to the cytosol [68] and may also have a function in meat color. In general, beef from Sanga cattle breeds showed higher chroma than that of indicus cattle breeds [11].
The loss of reddish fluid mainly consisting of water and proteins from meat, called drip loss, is an important meat quality characteristics which is affected by several ante-and post-mortem factors [69]. A small but significant difference in drip loss is reported between Sanga and indicus cattle breeds; meat from Sanga cattle showed higher drip loss [10,11]. Higher expression of PKM2 and MAP4K4 suppresses the glucose content of muscle cells promoting the onset of anaerobic production of lactate post mortem, thereby facilitating the decline in pH resulting in higher drip loss [70,71].
Feed intake and efficiency, measured as residual feed intake (RFI), are economically important traits affecting the cost of beef production [72]. Variation in RFI (animals with lower RFI are more efficient) has a genetic component with moderate heritability [73]. We identified positively selected genes (TIMP2, PKM2, PRKG1, MAP3K5, and ATP8A1) that are reported in the literature to be related to RFI and feed conversion efficiency. TIMP2 has been shown to be upregulated in low RFI animals in gene expression profiling studies on genes expressed differentially in cattle with high and low RFI [73]. PKM2 was associated with average daily gain, and feed to body weight gain ratio, with a significant additive and/or dominance effects on these traits [74]. PRKG1 is involved in gap junction and is also a candidate gene for RFI in cattle [72]. MAP3K5, also known as apoptosis signal-regulating kinase 1 (ASK1), is a candidate gene for residual feed intake in pigs [75]. ATP8A1 is also related to feed intake, feed conversion ratio, residual feed intake and weight gain [76]. Olfactory receptor genes (OR2D2, OR10A4, and OR2D3) have been shown to affect the perception of taste and smell [77,78] and therefore can be related to feed intake and feeding behavior [77]. PIK3CB, and MRAS genes involved in the Akt/PI3K and MAPK signaling pathways, respectively, are important for high feed efficiency in chicken [79]. The positive selection of these genes may provide clues as to why Ankole cattle are able to use and survive on poor quality feed and withstand severe droughts [80].

Implication of the results of this study on Ankole population
The Ankole group is one of the three groups of Sanga cattle representing Sanga cattle in east and central Africa [8]. Ankole breed is a valuable and widely used genetic resource in the region due to its better adaptability. However, there have been no well-designed breed improvement programs for Ankole and other Sanga breeds of eastern Africa [80,81]. Selective breeding efforts in other South African Sanga cattle breeds (e.g., Mashona, Tuli, and Afrikander) have resulted in local cattle showing higher beef productivity [82]. As cattle genetic resources are being depleted [1,3] and given the importance of this vital genetic resource, designing breeding programs that would help improve and conserve Ankole cattle is crucial [81]. With this regard, the results provide a basis for further research on the genomic characteristics of Ankole cattle in relation to meat quality traits.

Limitations of the present study
As is typical in this kind of study, there is a possibility of obtaining false positive results. Therefore, validation with other methods such as GWAS, candidate gene approach and gene expression analysis are suggested. In addition, given the multifactorial nature of meat quality traits, limited published literature is available on genes affecting beef quality characteristics.

Conclusions
Results from the whole genome scan revealed several positively selected genes involved in different biological and cellular functions including those affecting meat quality characteristics. The genes identified in relation to meat quality characteristics are involved in muscle and lipid metabolism that affect tenderness and intramuscular fat content of meat; and help to improve our understanding of the biological mechanisms controlling meat quality traits in beef cattle production. These results provide a basis for further research on the genomic characteristics of Ankole and other Sanga cattle breeds for quality beef production.

Sample preparation and whole genome re-sequencing
The data used for this paper was obtained from a project: "The genome landscape of indigenous African cattle" [Kim et al., accepted]. DNA extracted from whole blood samples (10 ml) taken from four African cattle breeds (ten Ankole, nine Boran, nine Ogaden and ten Kenana) was used for this analysis. G-DEXTMIIb Genomic DNA Extraction Kit (iNtRoN Biotechnology, Seoul, Korea) was used to isolate DNA according to the manufacturer's protocol. To generate inserts of~300 bp, 3 μg of genomic DNA was randomly sheared using Covaris System. Using the TruSeq DNA Sample Prep. Kit (Illumina, San Diego, CA), we constructed the library following the manufacturer's guidelines and whole genome sequencing was performed using the Illumina HiSeq 2000 platform.
To check the quality of the raw sequence data, we used fastQC software [83]. Pair-end sequence reads were mapped to the reference bovine genome (UMD 3.1) using Bowtie2 [84] with default parameters except the "-no-mixed" option. The overall alignment rate of reads to the reference sequence was 98.50% with an average read depth of 10.8×. On average across the whole samples, the reads covered 98.51% of the genome.
We used open source software packages of Picard tools (http://broadinstitute.github.io/picard), SAMtools [85], and Genome Analysis ToolKit 1.4 (GATK) [86] for downstream processing and variant calling. Picard tools was used to filter potential PCR duplicates. SAMtools was used to create index files for reference and bam files. Genome analysis toolkit 1.4 performed local realignment of reads to correct misalignments due to the presence of indels ("RealignerTargetCreator" and "IndelRealigner" arguments). We used the "UnifiedGenotyper" and "SelectVariants" arguments of GATK to call candidate SNPs. To filter variants and avoid possible false positives, the "VariantFiltration" argument of the same software was adopted with the following options: 1) SNPs with a phred-scaled quality score of less than 30 were filtered; 2) SNPs with MQ0 (mapping quality zero; total count across all samples of mapping quality zero reads) > 4 and quality depth (unfiltered depth of non-reference samples; low scores are indicative of false positives and artifacts) < 5 were filtered; and 3) SNPs with FS (Phred-scaled P-value using Fisher's exact test) > 200 were filtered since FS represents variation on either the forward or the reverse strand, which is indicative of false positive calls. BEAGLE [87] was used to infer the haplotype phase and impute missing alleles for the entire set of cattle populations simultaneously. After all the filtering processes, a total of~37 million SNPs were retained and used for further analysis.

Phylogenetic reconstruction
To understand the genetic distance between the breeds considered, we conducted phylogenomic analyses using neighbor-joining (NJ) and maximum likelihood (ML) methods. A total of 26,427,196 autosomal SNPs from the genomes of 38 individuals of four breeds were used for the phylogenic tree construction.
ML analyses [88] were performed using the program TREE-PUZZLE 5.2 [89] with the GTR model. For the quartet puzzling method (1000 puzzling steps), nucleotide frequencies and Ts/Tv ratios (3.18) were estimated from the dataset. Quartet puzzling provided reliability values for maximum likelihood analysis [90].
NJ analysis [91] was performed using the PHYLIP package 3.69 [92] based on Kimura's [93] 2-parameter distance. Ts/Tv ratios (3.18) were estimated from the dataset using TREE-PUZZLE 5.2 [89] and were used as inputs for the SEQBOOT, DNADIST, NEIGHBOUR, and CONSENS programs of the PHYLIP package. A bootstrap test (with 1000 pseudoreplicates) [94] was performed to obtain statistical support for each node of the NJ tree.

Detection of positive selection signals
To detect genome-wide selective sweeps, we used the XP-EHH [24] and XP-CLR [25] statistics. XP-EHH assesses haplotype differences between two populations and is designed to detect alleles that have increased in frequency to the point of fixation or near fixation in one of the two populations being compared [24,95].
We compared the genome of Ankole cattle, used as a test population, with indicus cattle (Boran, Ogaden, and Kenana grouped into one population), used as a reference population. XP-EHH compared the integrated EHH between the two populations for each SNP and the sign of the XP-EHH score determines the direction of selection with extreme values indicating selection in the test population genome. To facilitate comparison of genomic regions across populations, we then split the genome into non-overlapping segments of 50 kb and computed the maximum XP-EHH score in each segment. In order to define the empirical P-value, genomic windows were binned in increments of 500 SNPs (combining all windows ≥ 1000 SNPs into one) according to the method used previously [95]. Regions with P-values less than 0.01 (1%) were considered strong signals in the Ankole population.
We also performed XP-CLR to identify potential regions differentially selected between the two populations [25]. XP-CLR is a likelihood method for detecting selective sweeps that involve jointly modeling the multilocus allele frequency differentiation between two populations. XP-CLR scores were calculated using XP-CLR software package [25]. We used non-overlapping sliding windows of 50 kb, maximum number of SNPs within each window as 600, and correlation level from which the SNPs contribution to XP-CLR result was down weighted to 0.95. The regions with the XP-CLR values in the top 1% of the empirical distribution (XP-CLR > 97.86) were designated as candidate sweeps and the genes that span the window regions were defined as candidate genes [22]. Significant genomic regions identified from XP-EHH and XP-CLR were annotated to the closest genes (UMD 3.1).
In order to confirm the positive selection of detected genes using these two statistics, we calculated Tajima's D and F ST for the candidate gene regions. Detecting the same gene regions using different methods can provide cogent evidence for selective influences in the region [30]. Tajima's D is used to detect selective sweeps going to fixation in the population that makes rare alleles in excess in the population, which results in a negative Tajima's D [28]. Population differentiation (F ST ) is based on the principle that natural selection can change the amount of differentiation between different populations of a species. When populations are differentiated, the amount of genetic differentiation within the region that includes selected locus will increase during when the genetic differentiation in the genomic region is greater than the level expected under neutrality, which can be a consequence of natural selection [29]. VCFtools was used in a window size of 50 kb at an interval of 5 kb steps to calculate the Tajima's D and F ST values of the candidate gene regions [96].

Characterization of candidate genes under selection
We used the Database for Annotation, Visualization, and Integrated Discovery (DAVID; version 6.7) gene ontology and annotation tool for gene enrichment analysis to further understand the biological functions and pathways of selected genes [26]. Significant GO terms provide insight into the functional characteristics of annotated genes. The KEGG database was also cross-referenced within DAVID to identify significant pathways. R software (version 3.2.1) was used for hierarchical clustering of GO terms from DAVID. Additionally, Cytoscape software's (version 3.2.0) ClueGO plugin was used to visualize the integration of Gene Ontology (GO) terms as well as KEGG pathways and create a functionally organized GO/pathway term network [27] with default settings.

Additional files
Additional file 1: Table S1. A summary of genes from Sanga vs. Zebu