  • Methodology article
  • Open access

Associations of SNPs located at candidate genes to bovine growth traits, prioritized with an interaction networks construction approach

Abstract

Background

For most domestic animal species, including bovines, it is difficult to identify causative genetic variants involved in economically relevant traits. The candidate gene approach is efficient because it investigates genes that are expected to be associated with the expression of a trait and defines whether the genetic variation present in a population is associated with phenotypic diversity. A potential limitation of this approach is the identification of candidates. This study used a bioinformatics approach to identify candidate genes via a search guided by a functional interaction network.

Results

A functional interaction network tool, BosNet, was constructed for Bos taurus. Predictions for candidate genes were performed using the guilt-by-association principle in BosNet. Association analyses identified five novel markers within BosNet-prioritized genes that had significant effects on different growth traits in Charolais and Brahman cattle.

Conclusions

BosNet is an excellent tool for the identification of single nucleotide polymorphisms that are potentially associated with complex traits.

Background

In bovines, most economically relevant traits (ERTs) are considered to be genetically complex traits; therefore, different approaches have been utilized to identify genetic variation related to phenotypic differences. However, identifying causative genetic variants involved in ERT phenotypes is a difficult task.

Although the genome-wide association approach has become the most frequently applied strategy to identify genetic variation that explains ERTs, the candidate gene approach has also been widely used to identify genetic variation. The candidate gene strategy is efficient because it investigates genes that are expected to be associated with the expression of a trait and defines whether the genetic variation present in populations is associated with phenotypic diversity [1]. In an association study, two of the critical steps used in the candidate gene approach are selecting a suitable candidate gene and identifying the most useful genetic variants or polymorphisms (if known) for testing.

Traditionally, physiological function, positional cloning and comparative genomic approaches have been used to select candidate genes [2–6]; however, interaction network analysis may also be an excellent alternative for selecting candidate genes for ERTs in bovines. Lim et al. [7] constructed a protein-protein interaction (PPI) network to identify candidate genes for marbling traits in bovines. These authors successfully identified candidate genes associated with intramuscular fat and suggested that the PPI approach can be used to identify biological pathways and regulatory elements involved in marbling-related genes.

The guilt-by-association strategy uses biological information available in databases and statistical methods to identify potential candidate genes in silico. This approach searches for candidate genes based on their interactions with a set of reference genes (genes previously associated with a phenotype) [8]. It relies on the tendency of genes associated with the same biological process to interact within a network and organize themselves into modules or functional groups. Within these modules, new candidate genes can be identified by analyzing their interactions with the reference genes. Genes that interact strongly with the reference set are likely to be involved in the same biological processes, as are the single nucleotide polymorphisms (SNPs) located within them.

Hence, animal science has begun to utilize bioinformatics to model and generate interaction networks that represent the genetic architecture of complex traits in bovines, such as marbling, age at puberty and reproductive characteristics [7, 9, 10]. The objectives of this work were to develop BosNet as a tool for the identification and prioritization of genes associated with complex traits and to assess the efficiency of the BosNet tool in associating SNPs located in BosNet-prioritized genes with bovine growth traits.

Results

Modeled networks for B. taurus

A highly reliable integrated network was constructed for Bos taurus. By identifying orthologous genes, 16,348 new annotations were obtained for bovine genes that were previously lacking annotations, and their combination with known annotations (34,082) resulted in 50,380 annotations for B. taurus genes. The increased number of functional annotations was used to obtain an integrated network referred to as BosNet. This network consists of 1,747,160 associations among 16,065 genes, which is equivalent to 73 % coverage of the bovine genome. BosNet can be freely consulted at http://www.cbg.ipn.mx/investigacion/Paginas/BosNet.aspx. In the current version of BosNet (March 2015), the number of Gene Ontology annotations in the BP (Biological Process) domain has increased by 113 % over the 2012 version of BosNet. The current version consists of 4.19825 million interactions and has 20 % greater B. taurus genome coverage.

By using a text mining approach, 60 genes associated with different parameters related to bovine growth traits were identified. This information permitted an immediate evaluation of the individual contribution of each of the networks for B. taurus to correctly identify genes previously associated with bovine growth. This ability was characterized by receiver operating characteristic (ROC) curves. The area under the curve (AUC) was used as an indicator of the predictive power of each network. The performance of each network modeled from different databases was reduced compared with the performance obtained from the integrated network, indicating that the use of these networks independently reduces both the predictive power and coverage.

Identification and prioritization of candidate genes for growth traits and gene variability in bovine breeds

In the analysis conducted using the BosNet network, the positive predictive value (PPV) calculation established that genes with an associated score ≥ 39.6468 had a 53 % probability of being associated with growth traits. The genes that met this condition included RXRA (retinoid X receptor alpha), IGF1R (insulin-like growth factor 1 receptor), TCF15 (transcription factor 15), INS (insulin), USF1 (upstream transcription factor 1) and EGFR (epidermal growth factor receptor).

These genes were used as targets to determine variations in SNPs, which were used in association studies of bovine growth traits. Three new INS gene polymorphisms were identified (g.50,036,892 G > A, g.50,036,987 C > T and g.50,037,033 A > G). Five USF1 gene SNPs were identified with four transitions and one indel (insertion-deletion polymorphism). The g.8,458,558 A > G, g.8,458,837 G > A, g.8,459,971 A > G, g.8,460,354 C > T and g.8,460,878 C > T SNPs are located in intron 2, intron 3, intron 6, exon 8 and intron 9, respectively. The g.8,459,028 -/C indel is located in intron 3. For the TCF15 gene, the analysis only revealed the presence of one SNP (g.60,997,442 G > A), which corresponds to a transition located within intron 1. The RXRA gene demonstrated the highest SNP variation, with a total of 34 SNPs distributed throughout the gene. Of these SNPs, 25 are located in introns, including six transversions. The remaining eight SNPs are located in coding regions, and the most significant is a transversion located in exon 3.

Novel SNPs and GenBank-reported SNPs in the coding regions of the six genes were used for genotyping in two bovine populations. Of the tested SNPs, 70 % and 50 % were monomorphic in the Charolais and Brahman populations, respectively. The allelic frequencies from the polymorphic SNPs are presented in Table 1.

Table 1 Allele frequencies of SNPs located in BosNet-prioritized genes

Association of novel SNPs with growth traits in Charolais and Brahman cattle

We tested the ability of the BosNet tool to prioritize candidate genes by detecting associations between quantitative trait loci and growth traits in Charolais and Brahman cattle.

In the Brahman population, the association analysis demonstrated that only rs136289117, located in the RXRA gene, had a significant effect (p = 0.0394) on weaning weight (WW). The mean WW of the heterozygous genotype (215.029 kg) was approximately 9 kg higher than that of the homozygous CC genotype (206.152 kg).

For Charolais cattle, the association analysis resulted in four novel SNPs that were significantly associated with growth traits (P ≤ 0.04) (Table 2). The TT genotype of the rs210778604 SNP in the IGF1 receptor gene had a significant effect on birth weight (BW), which was 2.5 kg higher than the BW of the heterozygous (CT) and homozygous (CC) genotypes (Table 2). Interestingly, this same locus was significantly related to frame size (FS). The favorable CC genotype produced slightly taller animals (P = 0.0195). The g.106,0040,449 marker located in the RXRA gene was significantly associated with WW. The WW of animals with the CT genotype was approximately 21 kg higher than that of homozygous TT animals (P = 0.0028). The same marker was associated with yearling weight (YW); animals with the CT genotype were 27 kg heavier than animals with the TT genotype (P = 0.0300).

Table 2 Least square means (LSM) ± standard error (SE) of individual effects of evaluated SNPs on growth traits in Charolais cattle
Table 3 Novel and reported SNPs for association analysis

For rs208140993 located in the IGF1R gene, animals harboring the TT genotype had higher WWs than those with complementary genotypes (P = 0.0243). Finally, the rs385131275 marker in the EGFR gene was significantly associated with WW. Animals with the AA genotype exhibited WWs that were 40 and 30 kg higher than those of the heterozygous (GA) and homozygous (GG) genotypes, respectively.

Discussion

The network generated in this research presented significant differences from the interaction networks previously reported for B. taurus. Differences were observed in the sources of information, the methods applied to construct the networks and their coverage, and the number of established interactions. For example, in 2011, Lim et al. [7] employed a literature mining tool to predict genes specifically associated with marbling in cattle and derived two networks primarily associated with the characteristic of interest based on the orthologous relationship between B. taurus and Homo sapiens (interologous method). The first network demonstrates high reliability and consists of 52 genes. Among these genes, 61 interactions were established. The second network is a widespread network composed of 1090 genes and 1517 interactions. After a topological analysis, 20 genes (with a node degree ≥ 25) were selected as candidate genes related to bovine marbling. Five of these genes were associated with bovine marbling when the expression profile of each gene was evaluated.

Similarly, Hulsegge et al. [10] prioritized candidate genes for reproductive characteristics in cattle based on PPIs reported for existing orthologous genes between B. taurus and H. sapiens in the STRING database. The genes were prioritized using the average of two calculated scores. The first score was based on the expression profiles of each gene. The second score was based on a literature search. An enrichment analysis was performed using the Database for Annotation, Visualization and Integrated Discovery (DAVID), and represented biological processes were observed. In this work, 59, 89, 53, 23 and 71 candidate genes were identified with associations with reproductive traits in the amygdala, dorsal hypothalamus, hippocampus, anterior pituitary and ventral hypothalamus, respectively.

Moreover, the coverage values established in BosNet (16,065 genes and 1,747,160 interactions, equivalent to 73 % coverage) were higher than the values estimated by Lim and Hulsegge (4.9 and 27 %, respectively). Thus, BosNet relies on the concept of functional interaction networks and the integration of a wide variety of heterogeneous biological data (orthology relationships with different organisms, interactions reported in various databases, correlations between expression levels, similarities between nucleotide sequences, and shared functional domains), whereas the above-mentioned networks were based on data extracted from only a few sources of information.

In BosNet, each integrated experiment, whether genetic or computational, added evidence for gene associations; thus, a greater number of genes and biological processes could be represented, which improved the coverage and precision of the network [11]. This improvement is evident in the results plotted in the ROC curves, which assess the predictive power of each of the networks derived for B. taurus. The networks derived from a single source of information exhibit a low level of predictive power, low coverage and a reduced number of interactions relative to the networks generated through the integration of diverse biological data. The coverage (27 %) obtained by Hulsegge et al. [10] is noteworthy because the coverage was greater than that achieved in previously reported networks and exhibited greater predictive power than STRING (AUC 0.51) in this study, which was similar to the performance obtained in the integrated network BosNet (AUC 0.64). These results were expected because the interactions in STRING were generated using an integrative method that is conceptually similar to the methodology applied in the present study [12]. Another important point is that the predictive power (i.e., the ROC curve) of the networks previously reported for B. taurus, which indicates the ability of each network to correctly identify genes involved in a particular characteristic, has not been assessed.

The coverage and number of interactions established in BosNet are similar to the results of functional interaction networks reported for other organisms of major economic and scientific importance, such as Oryza sativa, Arabidopsis thaliana, Saccharomyces cerevisiae, Caenorhabditis elegans, Mus musculus and H. sapiens, whose coverages range from 50 to 95 % of the genes reported for each of the organisms, with the number of established interactions ranging from 100,000 to 1.7 million [11, 13–17].

Currently, the availability of different types of biological data, such as functional annotations for B. taurus genes, is limited compared with the information available for more thoroughly studied organisms, such as H. sapiens [10].

Recently, systems biology approaches have revealed that genes associated with the same or related phenotypes tend to participate in common functional modules (such as protein complexes and metabolic pathways). Moreover, the analysis of protein interaction networks and the neighborhood of a given protein within the network have been used to functionally characterize proteins (guilt-by-association approach).

The guilt-by-association strategy has been widely applied. For example, Lee et al., in 2008 [13], 2010 [18] and 2011 [16], identified genes directly associated with different phenotypes in C. elegans, A. thaliana and O. sativa, respectively, through an analysis of functional interaction networks.

Due to high genetic variation in the genome, SNPs have become the most useful type of marker for gene mapping and association studies. In bovines, different strategies have been used to discover SNPs and assess SNP associations with ERTs. Lee et al. [19] reported a pipeline to analyze non-synonymous SNPs in B. taurus after screening the SNPs, which were reported as coding SNPs (cSNPs). They detected 15,353 candidate cSNPs and established a panel of 41 SNPs to evaluate associations with puberty age, facial eczema resistance and meat yield. Three SNPs were nominally associated with facial eczema resistance (P < 0.01).

Commercial arrays in genome-wide association studies (GWAS) have been widely used to understand the genetic basis of complex traits in B. taurus; however, the genetic variation underpinning these traits cannot be exclusively explained by this approach. High-throughput sequencing technology could serve as an alternative, but sequencing large numbers of individual genomes remains prohibitively expensive.

Here, we used BosNet to prioritize novel and reported genetic variation in six candidate genes based on SNPs and performed an association study for growth traits.

Because IGF1R is established in the bovine somatotropic axis, the IGF1R gene is one of the only BosNet-prioritized candidate genes that was previously associated with bovine growth traits. The IGF1R gene is the primary receptor for insulin-like growth factors (IGFs), which perform the metabolic signal transduction responsible for cell proliferation, bone growth and protein synthesis in the GH-IGF pathway.

The IGF1R/Taq I polymorphism in one of the introns of this gene, which was identified by Moody et al. [20], has been analyzed in several studies but has not been associated with growth traits. Researchers have concluded that this lack of association is caused by the absence of one of its alleles in B. taurus; its low frequency in B. indicus; and its location on chromosome 21, which is one of the least favorable chromosomes for finding loci associated with growth and carcass composition [21–23]. Here, we identified novel polymorphic markers in IGF1R both in Charolais and Brahman cattle. Of these markers, rs210778604 and rs208140993, located in the IGF1R coding regions, were significantly associated with BW/FS and WW, respectively. However, validation of these results with a higher number of animals is required.

The RXRA gene produces a protein that belongs to a family of transcription factors and plays an important role in fat storage and movement. In knockout mice, the absence of this transcription factor conferred resistance to obesity induced by chemicals that can be found in diets. Adipogenesis and lipolysis were also affected [24]. This gene demonstrated high genetic variation in the studied populations. We confirmed at least 20 SNPs. SNP g.106,0040,449 demonstrated a significant association with WW and YW in the Charolais population. BW is correlated with calving ease and survival, and WW is a reliable index of adult weight performance and productive efficiency [25]. Therefore, confirmation of the association is important before this marker is included as a tool for marker-assisted selection based on these traits.

Finally, EGFR, which is located on the cell surface, is a mediator of cellular proliferation and differentiation. The binding of its ligand activates a tyrosine kinase that phosphorylates various substrates, thus activating pathways promoting cell growth and DNA synthesis [26]. Here, we found that animals with the AA genotype for the rs385131275 marker from the EGFR gene exhibited WWs that were 40 and 30 kg higher than those of animals with heterozygous (GA) and homozygous (GG) genotypes, respectively.

Insulin is a polypeptide hormone produced and secreted by the beta cells of the islets of Langerhans in the pancreas. Insulin improves the absorption of glucose in cells. Qui et al. [27] proposed the insulin gene as a candidate gene for the genetic analysis of complex traits, such as growth rate, body composition and fat deposition, in chickens. They analyzed the associations of four polymorphisms located in non-coding regions with 13 different characteristics of growth and body composition. Their findings indicated that one of the polymorphisms and a combination of haplotypes were significantly associated with BW adjusted to 28 days.

Here, we confirm polymorphisms of novel and previously reported SNPs located in the bovine INS gene. However, no association with the analyzed growth traits was observed.

The participation of the remaining candidate genes (i.e., USF1 and TCF15) in bovine growth could be deduced based on the function established for each of the genes (no association results for this trait were identified in this study, and none have been identified in cattle to date). In mice, studies of the TCF15 gene revealed that this transcription factor is an important regulator of a subset of myogenic cells of the dorsolateral dermomyotome associated with the formation of non-migratory hypaxial muscles (abdominal and intercostal) [28]. Moreover, USF1 is a transcription factor that has been suggested to act as a negative regulator of cell proliferation because it competes for DNA binding sites with transcription factors, such as Myc, which is involved in transformation, cellular proliferation and apoptosis [29, 30].

From a panel of 79 SNPs, we determined that markers rs210778604 and rs208140993 (located in the IGF1R coding regions) were associated with BW/FS and WW, respectively (Table 2). In addition, markers rs385131275 and g.106,004,449 (located on the EGFR and RXRA genes, respectively) were significantly associated with WW and YW in Charolais cattle.

The number of nominally significant associations and the strength of these associations with growth traits were compared to the results obtained from studies that applied the GWAS approach to identify markers associated with growth traits [31]. Thus, BosNet can be used as a prioritization tool to direct the search for novel SNPs that are potentially associated with ERTs.

Updating BosNet is a dynamic process that adds new genes and increases the robustness of each represented biological process. Thus, novel interactions appear that may change the prioritization weighting of each interaction net. Because of this effect, BosNet users must consider that after an update, genes prioritized with a previous version of BosNet may no longer receive prioritization, even if they are still part of the interaction. Here, we use data from the 2012 version of BosNet, as it was at that time that we initially prioritized all the candidate genes that were genotyped and associated with growth traits. According to our records, the prioritization weightings for these genes did not change significantly from those obtained using the BosNet version updated in December 2014; however, in the current version of BosNet (March 2015), none of the previously prioritized genes reached the confidence threshold. We are currently working to improve the network topology analysis. Meanwhile, BosNet users must consider the uniformity of the selected candidate genes and favor those genes that increase the number of strong interactions.

Conclusions

By integrating heterogeneous biological data, a functional interaction network, BosNet, was constructed for B. taurus; BosNet provides 73 % coverage of the estimated genes in the bovine genome.

The transfer of functional Gene Ontology BP annotations to B. taurus genes from orthologous genes in more extensively studied organisms increased the coverage and precision of the integrated network compared with the exclusive use of Gene Ontology annotations reported for B. taurus.

INS, TCF15, IGF1R, RXRA, EGFR and USF1 were identified as candidate genes associated with bovine growth traits through a search guided by BosNet. Re-sequencing of the coding regions of the candidate genes INS, USF1, TCF15 and RXRA identified three, five, one and 34 new SNPs, respectively, as candidates associated with phenotypic variation of bovine growth traits. From these novel SNPs, associations with growth traits were identified in Brahman and Charolais cattle.

Methods

Construction of a functional network for B. taurus

As shown in Fig. 1, different databases were analyzed, and information related to B. taurus was extracted for modeling in an undirected graph G = (V, E), where V and E are the sets of vertices and edges in G, respectively. Each vertex represents a protein, and each edge (u, v) represents an association between proteins.

Fig. 1

BosNet construction. Information compiled from the different databases was modeled as undirected graphs (N1, N2, N3, N4). Each node represents a protein, and each edge represents an interaction between a protein pair. The score associated with each interaction comes from a source specific to each database (i.e., expression level, sequence homology, or conserved domains). Because of differences in the measurement scales, standardization was required. New scores were assigned according to the reported functional annotations (Gene Ontology) between interacting proteins. Finally, the different graphs were integrated to create an integrated functional network of interactions between proteins. The final scores were calculated by assigning greater values to interactions that were represented in more than one database

To provide a better confidence weighting for the interactions, a normalization procedure was used. Given a set of interactions E (a network) from data source k, in which both vertices of each edge have at least one functional annotation, E was subdivided into subsets using the following approach:

  • The interactions in E were analyzed to find the maximum and minimum scores, S_{k,max} and S_{k,min}, respectively.

  • The interactions in E were sorted into n subsets b_1, ..., b_n covering equal intervals between S_{k,min} and S_{k,max}.

  • Each subset b_i was treated as a different subtype for which confidence was assessed individually using equation (1).

Given an observation O_{e,k,S}, that is, an interaction e reported by data source k with score S, the subset or subtype was determined as follows:

$$ BinIndex_k(S)=\begin{cases}\min \left(n,\ \left\lfloor \frac{S-S_{k,\min}}{S_{k,\max}-S_{k,\min}}\times n\right\rfloor +1\right) & \text{if } S\ge S_{k,\min}\\ 0 & \text{if } S<S_{k,\min}\end{cases} $$
(1)

  • S ≥ S_{k,min} and S < S_{k,min} are the conditions checked for each evaluated score; the score may be greater than, equal to or less than the minimum score value in the network.

  • If S ≥ S_{k,min}, the confidence of e based on observation O_{e,k,S} is given by the confidence of the subtype defined by BinIndex_k(S).

  • Because S_{k,min} is determined from the test data, based on interactions in which both vertices are recorded, S may be smaller than S_{k,min}. If S < S_{k,min}, the confidence of e based on observation O_{e,k,S} is considered to be 0 because it is not possible to determine its confidence.

  • The floor term identifies which of the n subsets in database k each evaluated score belongs to.
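
The binning rule in equation (1) can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the function name `bin_index` and the handling of a degenerate score range (maximum equal to minimum) are assumptions that do not appear in the text.

```python
import math


def bin_index(score, s_min, s_max, n_bins):
    """Return the 1-based subset index of `score` per equation (1).

    Returns 0 when the score is below the observed minimum, because
    its confidence cannot be determined in that case.
    """
    if score < s_min:
        return 0
    if s_max == s_min:
        # Assumption (not in the text): a single-valued score range
        # collapses to one subset.
        return 1
    fraction = (score - s_min) / (s_max - s_min)
    # The min() keeps the maximum score inside the last subset.
    return min(n_bins, math.floor(fraction * n_bins) + 1)


# Hypothetical scores on a 0..1 scale with n = 10 subsets.
for s in (-0.1, 0.0, 0.5, 1.0):
    print(s, bin_index(s, 0.0, 1.0, 10))
```

The `min(n, ...)` guard matters only for the maximum score, which would otherwise fall into a non-existent subset n + 1.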

The confidence values of all interactions were re-calculated by subset and database using the BP domain of Gene Ontology (http://www.geneontology.org/) (The Gene Ontology Consortium, 2000) as a common criterion. Annotations associated with B. taurus genes (~34,082) in the BP domain were downloaded in November 2012.

The interaction confidence was calculated using equation 2:

$$ p\left(k,f\right)=\frac{\sum_{\left(u,v\right)\in E_{kf}} S_f\left(u,v\right)}{\left|E_{kf}\right|+1} $$
(2)

E_{kf} is the subset of interactions from database k in which each interaction has one or both vertices annotated with function f and both vertices have at least one functional annotation.

S_f(u, v) = 1 if u and v share a function, and 0 otherwise.
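
As a rough illustration of equation (2), the sketch below computes p(k, f) for one subset of one database. It assumes that "share a function" means that both proteins are annotated with the function f under evaluation; the function name, argument layout and protein identifiers are hypothetical.

```python
def interaction_confidence(edges, annotations, function):
    """Sketch of equation (2) for a single database subset.

    edges: iterable of (u, v) protein pairs.
    annotations: dict mapping protein -> set of GO BP terms.
    function: the GO term f being evaluated.
    """
    qualifying = 0
    shared = 0
    for u, v in edges:
        ann_u = annotations.get(u, set())
        ann_v = annotations.get(v, set())
        if not ann_u or not ann_v:
            continue  # both vertices need at least one annotation
        if function in ann_u or function in ann_v:
            qualifying += 1  # edge belongs to E_kf
            # Assumption: S_f(u, v) = 1 only when both carry f.
            if function in ann_u and function in ann_v:
                shared += 1
    # The +1 in the denominator follows equation (2).
    return shared / (qualifying + 1)
```

The `+ 1` in the denominator keeps the estimate below 1 and penalizes subsets with very few qualifying interactions.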

Multiple graphs constructed from the different databases were combined to obtain a unique graph (G') that includes all nodes and their associations. The confidence of each interaction (u,v) in G' was calculated using equation 3:

$$ r_{u,v,f}=1-\prod_{k\in D_{u,v}}\left(1-p\left(k,f\right)\right) $$
(3)

D_{u,v} is the set of databases that contain the interaction (u, v).
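
Equation (3) combines the per-database confidences so that an interaction supported by several sources scores higher than one supported by a single source. A minimal sketch, with a hypothetical function name and input format:

```python
def combined_confidence(per_database_confidences):
    """Sketch of equation (3): 1 minus the product of (1 - p(k, f))
    over the databases that report the interaction (u, v)."""
    remaining = 1.0
    for p in per_database_confidences:
        remaining *= 1.0 - p
    return 1.0 - remaining


# Two independent sources, each with confidence 0.5, give 0.75.
print(combined_confidence([0.5, 0.5]))
```

Under this rule each additional supporting database can only raise the combined score, which matches the statement that interactions represented in more than one database receive greater values.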

Using the algorithm INPARANOID (http://inparanoid.sbc.su.se/) [32], orthologous gene groups were identified between B. taurus and other organisms, such as H. sapiens, M. musculus, C. elegans, A. thaliana, O. sativa and S. cerevisiae. The functional networks for each of these organisms were downloaded from the FunctionalNet server (http://www.functionalnet.org/): HumanNet v.1 [15], MouseNet v.1 [14], WormNet v.2 [13], AraNet v.1 [18], RiceNet v.1 [16] and YeastNet v.2 [33]. From each of these functional networks, a B. taurus network was derived using an interologous approach [34], and the value previously associated with each of these interactions served as the score of the association.
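
The interologous transfer described above can be sketched as follows, assuming the ortholog groups have already been parsed into a dictionary. The function name, the data layout, the gene identifiers and the choice to keep the highest score when several source interactions map to the same bovine pair are illustrative assumptions, not details given in the text.

```python
def transfer_interologs(source_edges, orthologs):
    """Map scored interactions from a source organism onto bovine genes.

    source_edges: dict mapping (gene_a, gene_b) -> score in the source network.
    orthologs: dict mapping source gene -> list of bovine orthologs.
    """
    bovine_edges = {}
    for (a, b), score in source_edges.items():
        for u in orthologs.get(a, ()):
            for v in orthologs.get(b, ()):
                if u == v:
                    continue  # skip self-interactions
                key = tuple(sorted((u, v)))  # undirected edge
                # Assumption: keep the highest score on collisions.
                bovine_edges[key] = max(score, bovine_edges.get(key, score))
    return bovine_edges
```

Only interactions whose two partners both have a bovine ortholog are carried over, so the derived network is at most as large as the source network restricted to orthologous genes.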

Data from four microarray experiments conducted in B. taurus were downloaded from Gene Expression Omnibus (GEO) (http://www.ncbi.nlm.nih.gov/geo/info/faq.html) [35]: GSE25005 [36], GSE23837 [37], GSE19055 [38] and GSE35185 [39]. Using GEO2R (http://www.ncbi.nlm.nih.gov/geo/geo2r), differentially expressed genes were identified with an adjusted p-value ≤ 0.05. We combined the above-mentioned DNA microarray experiments to create a single, consistent expression vector for each differentially expressed gene and then measured the Pearson correlation coefficient between these mRNA expression vectors. Thus, a pair of genes was connected with an edge if the Pearson correlation coefficient was ≥ 0.7. This value was also used as the confidence score associated with each interaction.
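
The co-expression step can be illustrated with a small pure-Python sketch. It assumes the combined expression vectors are available as equal-length lists keyed by gene; the function names and the handling of constant vectors (correlation set to 0) are assumptions.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # assumption: constant vectors yield no edge
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.7):
    """Connect gene pairs whose correlation is >= threshold.

    The correlation itself is stored as the edge score.
    """
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if r >= threshold:
            edges[(g1, g2)] = r
    return edges
```

Because the threshold is applied to the signed coefficient, strongly anti-correlated gene pairs are not connected in this sketch.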

The BioGRID (http://www.thebiogrid.org) [40], STRING (http://string.embl.de/) [12] and IntAct (http://www.ebi.ac.uk/intact/) [41] databases were downloaded in December 2014. These databases list the interactions between proteins derived from different methods; thus, the proteins are already associated in networks. For this reason, only existing interactions between B. taurus proteins were extracted.

Information assigned to the proteome functional domains of B. taurus was downloaded in December 2014 from the Pfam database (http://pfam.sanger.ac.uk) [42]. An association between two proteins was considered to exist if they shared at least one functional domain. The number of shared domains between each protein was used to represent the score associated with each interaction.
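
A minimal sketch of the shared-domain rule, assuming the Pfam assignments have been parsed into a dictionary from protein to a set of domain identifiers (the parsing step and all identifiers are hypothetical):

```python
from itertools import combinations


def shared_domain_edges(domains):
    """Link proteins that share at least one functional domain.

    domains: dict mapping protein -> set of domain identifiers.
    The edge score is the number of shared domains.
    """
    edges = {}
    for p1, p2 in combinations(sorted(domains), 2):
        shared = domains[p1] & domains[p2]
        if shared:
            edges[(p1, p2)] = len(shared)
    return edges
```

The all-pairs loop is quadratic in the number of proteins; for a full proteome, grouping proteins by domain first would avoid comparing pairs that share nothing.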

The sequences reported for proteins in the B. taurus genome (23,657) were downloaded from the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/). Using the BLAST application (http://blast.ncbi.nlm.nih.gov/Blast.cgi), a database was created to perform BLAST searches with the downloaded sequences. Using blastp, each of the reported B. taurus protein sequences was compared with the generated database. To model this information as a network, an association between two proteins was established when their alignment length was ≥ 50 % of the length of the query protein, the percentage of similarity was ≥ 40 %, and the e-score was < 0.0001. The negative logarithm of the e-score was used as the score associated with each interaction.
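
The three filtering criteria can be expressed as a simple predicate over parsed blastp hits. The sketch below assumes each hit is a dictionary with hypothetical keys, that the logarithm is base 10 (the base is not stated in the text), that self-hits are discarded, and that e-values reported as 0 are floored at 1e-300 to keep the logarithm finite.

```python
import math


def blast_edges(hits, query_lengths, min_cov=0.5, min_sim=40.0, max_evalue=1e-4):
    """Turn filtered blastp hits into scored edges.

    hits: iterable of dicts with keys 'query', 'subject', 'align_len',
          'similarity_pct' and 'evalue' (hypothetical field names).
    query_lengths: dict mapping query protein -> sequence length.
    """
    edges = {}
    for hit in hits:
        q, s = hit["query"], hit["subject"]
        if q == s:
            continue  # assumption: ignore self-hits
        coverage = hit["align_len"] / query_lengths[q]
        if (coverage >= min_cov
                and hit["similarity_pct"] >= min_sim
                and hit["evalue"] < max_evalue):
            evalue = max(hit["evalue"], 1e-300)  # avoid log10(0)
            edges[(q, s)] = -math.log10(evalue)
    return edges
```

Smaller e-values therefore translate into larger edge scores, so the most significant alignments carry the most weight before normalization.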

The 15 B. taurus networks derived using the different methods and databases were integrated via the strategy reported by Chua et al. [43], namely, Integrated Weighted Averaging (IWA). The subset size was 10. To recalculate the associated scores, Gene Ontology (http://www.geneontology.org/) [44] annotations associated with B. taurus genes (~32,082) in the BP domain, which were downloaded in November 2012, were used.

Approximately 8243 bovine genes lacked a functional Gene Ontology BP annotation, which directly affected the number of genes that were integrated and the quality of the predictions. To counter this effect, the B. taurus genes without annotations were assigned functional Gene Ontology annotations based on orthology. Thus, orthologous groups of genes present in H. sapiens, M. musculus, C. elegans, and S. cerevisiae were identified, and annotations that were present in each of these organisms were identified and transferred to the genes in question. BosNet was generated by integrating all of the information (Fig. 1).

Identification and prioritization of candidate genes for growth traits

Génie software (http://cbdm.mdc-berlin.de) [45] was used to perform PubMed-based text mining of genes that were previously associated with bovine growth traits (reference genes).

To identify and prioritize candidate genes for each of the integrated networks, the interactions of the reference genes were extracted, and the degree of association with growth (DAG) was calculated for each gene in the resulting subnet as follows:

$$ \mathrm{DAG}_i = \left( \sum_{j \in \text{ref genes}} W_{ij} \right) \cdot \left( \sum_{j \in \text{ref genes}} P_{ij} \right) $$

where Wij is the linkage weight connecting protein i and reference protein j and Pij is the number of links connecting protein i and reference protein j (excluding itself). Thus, the probability that each of these proteins is associated with growth was evaluated based on the protein’s interaction with genes whose biological function had already been associated with this trait.
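
The DAG calculation can be sketched in Python as follows. The input format is an assumption made for illustration: the network is given as a list of weighted links, and each listed link counts once towards Pij, so that repeated pairs raise the link count.

```python
from collections import defaultdict

def degree_of_association(links, reference_genes):
    """Compute DAG = (sum of W_ij) * (sum of P_ij) over reference genes j.

    `links` is an iterable of (gene_a, gene_b, weight); each entry counts as
    one link, so repeated pairs raise P_ij.
    """
    reference_genes = set(reference_genes)
    weight_sum = defaultdict(float)
    link_count = defaultdict(int)
    for a, b, weight in links:
        if a == b:
            continue  # a gene is not scored against itself
        for gene, partner in ((a, b), (b, a)):
            if partner in reference_genes:
                weight_sum[gene] += weight
                link_count[gene] += 1
    return {g: weight_sum[g] * link_count[g] for g in weight_sum}

# Hypothetical toy network with two reference genes.
toy_links = [
    ("cand1", "ref1", 0.5),
    ("cand1", "ref2", 0.25),
    ("cand2", "ref1", 0.75),
    ("cand2", "other", 0.5),
    ("ref1", "ref2", 1.0),
]
dag_scores = degree_of_association(toy_links, {"ref1", "ref2"})
```

In the toy network, cand1 and cand2 have the same summed weight to the reference genes, but cand1 scores higher because it is linked to two reference genes rather than one.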

Using this information, the predictive power of each of the modeled networks for B. taurus was evaluated, and the ability of these networks to correctly identify genes associated with growth was measured. This predictive power was characterized using ROC curves. The AUC was used as an indicator of the predictive power. AUC values ≤ 0.5 represent random predictions; AUC values > 0.5 represent predictions ranging from average to good.
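
For illustration, the AUC can be computed directly from its rank interpretation: the probability that a randomly chosen positive gene receives a higher score than a randomly chosen negative gene. The sketch below is a generic implementation with hypothetical toy data and is not the code used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties = 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical scores; label 1 marks a gene known to be associated with growth.
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
toy_labels = [1, 1, 0, 1, 0, 0]
toy_auc = roc_auc(toy_scores, toy_labels)
```

A network whose scores carry no information about growth association would give a value close to 0.5 under this definition.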

For the selection of candidate genes involved in phenotypic variations in growth traits, the new score was used to calculate PPV, which indicates the likelihood of gene association with the growth trait [46]. The selection criterion for candidate genes to be associated with bovine growth was a PPV greater than 0.5 (genes with a greater than 50 % probability).
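
As a hedged illustration, the PPV at a given score threshold is the fraction of genes scoring at or above that threshold that are known reference genes, TP / (TP + FP). The sketch below shows this one plausible reading; the exact procedure of [46] may differ, and all names and values are hypothetical.

```python
def ppv_at_threshold(scores, reference_genes, threshold):
    """PPV = TP / (TP + FP) among genes scoring at or above `threshold`."""
    predicted = [gene for gene, score in scores.items() if score >= threshold]
    if not predicted:
        return None  # PPV is undefined when nothing is predicted
    true_positives = sum(1 for gene in predicted if gene in reference_genes)
    return true_positives / len(predicted)

# Hypothetical toy scores and reference set.
toy_gene_scores = {"g1": 9.0, "g2": 7.0, "g3": 5.0, "g4": 3.0, "g5": 1.0}
toy_reference = {"g1", "g2", "g4"}

ppv_strict = ppv_at_threshold(toy_gene_scores, toy_reference, 5.0)
ppv_loose = ppv_at_threshold(toy_gene_scores, toy_reference, 1.0)
```

Under this reading, genes whose score corresponds to a PPV above 0.5 would be retained as candidates.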

Discovery and association of SNPs located in prioritized genes with growth traits

The DNA of two populations was used to conduct the experimental evaluations in this work. All sampling procedures were approved by the Institutional Investigation Ethics Committee (Escuela Superior de Medicina, IPN). The SNP discovery population consisted of nine individuals from varying breeds based on their genetic background and productive purpose (three Holstein, three Brahman and three Charolais). The second group of animals included 237 animals (99 Brahman and 138 Charolais samples). All of the animals were registered, and productive data (weight at birth, weaning and one year of age) were available.

All of the samples were genotyped for 79 SNPs (Table 3) located in the previously prioritized candidate genes using the Sequenom MassARRAY® platform (GeneSeek, Inc., Lincoln, NE, USA). The genotypic and allelic frequencies were estimated using Genepop® 4.0.10 software [47, 48].
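
Genotypic and allelic frequencies for a single biallelic SNP can be obtained by direct counting, as in the sketch below. This is a generic illustration with hypothetical genotype calls and does not reproduce the Genepop implementation.

```python
from collections import Counter

def genotype_and_allele_frequencies(genotypes):
    """Return (genotype_freqs, allele_freqs) for one biallelic SNP.

    `genotypes` is a list of two-character strings such as "AG"; missing calls
    are given as None and skipped.
    """
    called = ["".join(sorted(g)) for g in genotypes if g]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes")
    genotype_counts = Counter(called)
    allele_counts = Counter(allele for g in called for allele in g)
    genotype_freqs = {g: c / n for g, c in genotype_counts.items()}
    allele_freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    return genotype_freqs, allele_freqs

# Hypothetical calls for eight animals plus one failed assay.
toy_calls = ["AA", "AG", "GA", "GG", "AA", "AG", "AA", "AA", None]
toy_genotype_freqs, toy_allele_freqs = genotype_and_allele_frequencies(toy_calls)
```

Heterozygotes written as "AG" or "GA" are treated as the same genotype, and each animal contributes two alleles to the allelic counts.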

Growth trait data from the 237-animal population of Brahman (n = 99) and Charolais (n = 138) cattle were used to assess the effects of the SNPs, both new and previously identified, located in the genes prioritized by BosNet. Brahman data were fitted using a general linear model procedure that included fixed effects (herd, birth season and sex), random effects (sire and birth year), and the individual effect of the genotype at each studied SNP. The adjusted growth traits included BW, WW and YW. Charolais data were fitted with the fixed effects of sex, season and birth year only. For Charolais data, growth was also described by analyzing the Frame Size (FS). The least squares means of the genotypes were estimated for SNPs that demonstrated a significant effect, and the means were compared using the PDIFF option, which tests pairwise differences between least squares means. All of the procedures were performed using SAS 9.0 software (SAS Institute Inc., Cary, NC, USA).
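
Written schematically, and using notation introduced here only for illustration, the single-SNP model fitted to the Brahman data would take the form

$$ y = \mu + \text{herd} + \text{season} + \text{sex} + \text{genotype} + \text{sire} + \text{year} + e $$

where y is the growth trait (BW, WW or YW), μ is the overall mean, herd, birth season, sex and SNP genotype are fixed effects, sire and birth year are random effects, and e is the residual. Under the same assumptions, the Charolais model would contain sex, season, birth year and genotype as fixed effects and no random effects other than the residual.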

Abbreviations

SNPs: Single nucleotide polymorphisms

ERTs: Economically relevant traits

BP: Biological Process

PPV: Positive predictive value

ROC: Receiver operating characteristic

AUC: Area under the curve

RXRA: Retinoid X receptor alpha

IGF1R: Insulin-like growth factor 1 receptor

TCF15: Transcription factor 15

INS: Insulin

USF1: Upstream transcription factor 1

EGFR: Epidermal growth factor receptor

GWAS: Genome-wide association studies

IWA: Integrated Weighted Averaging

DAG: Degree of association with growth

BW: Weight at birth

WW: Weaning weight

YW: Weight at one year of age

FS: Frame Size

PDIFF: Pairwise differences between least squares means (SAS option)

References

  1. Zhu M, Zhao S. Candidate gene identification approach: progress and challenges. Int J Biol Sci. 2007;3:420–7.

  2. Li C, Basarab J, Snelling W, Benkel B, Murdoch B, Hansen C, et al. Assessment of positional candidate genes myf5 and igf1 for growth on bovine chromosome 5 in commercial lines of Bos taurus. J Anim Sci. 2004;82:1–7.

  3. Lindholm-Perry AK, Kuehn LA, Smith TP, Ferrell CL, Jenkins TG, Freetly HC, et al. A region on BTA14 that includes the Positional candidate genes LYPLA1, XKR4 and TMEM68 is associated with feed intake and growth phenotypes in cattle. Anim Genet. 2012;43:216–9.

  4. Morsci NS, Schnabel RD, Taylor JF. Association analysis of adiponectin and somatostatin polymorphisms on BTA1 with growth and carcass traits in Angus cattle. Anim Genet. 2006;37:554–62.

  5. Schwerin M, Czernek-Schafer D, Goldammer T, Kata SR, Womack JE, Pareek R, et al. Application of disease-associated differentially expressed genes--mining for functional candidate genes for mastitis resistance in cattle. Genet Sel Evol. 2003;35:S19–S34.

  6. Womack JE. Advances in livestock genomics: opening the barn door. Genome Res. 2005;15:1699–705.

  7. Lim D, Kim NK, Park HS, Lee SH, Cho YM, Oh SJ, et al. Identification of candidates genes related to bovine marbling using protein-protein interaction networks. Int J Biol Sci. 2011;7:992–1002.

  8. Marcotte EM, Pellegrini M, Thompson MJ, Yeates TO, Eisenberg DA. A combined algorithm for genome-wide prediction of protein function. Nature. 1999;402:83–6.

  9. Fortes MR, Reverter A, Nagaraj SH, Zhang Y, Jonsson NN, Barris W, et al. A single nucleotide polymorphism-derived regulatory gene network underlying puberty in 2 tropical breeds of beef cattle. J Anim Sci. 2011;89:1669–83.

  10. Hulsegge I, Woelders H, Smits M, Schokker D, Jiang L, Sorensen P. Prioritization of candidate genes for cattle reproductive traits, based on protein-protein interactions, gene expression, and text-mining. Physiol Genomics. 2013;45:400–6.

  11. Lee I, Date SV, Adai AT, Marcotte EM. A probabilistic functional network of yeast genes. Science. 2004;306:1555–8.

  12. Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41:D808–15.

  13. Lee I, Lehner B, Crombie C, Wong W, Fraser AG, Marcotte EM. A single gene network accurately predicts phenotypic effects of gene perturbation in Caenorhabditis elegans. Nat Genet. 2008;40:181–8.

  14. Kim WK, Krumpelman C, Marcotte EM. Inferring mouse gene functions from genomic-scale data using a combined functional network/classification strategy. Genome Biol. 2008;9 Suppl 1:S5.

  15. Lee I, Blom M, Wang PI, Shim JE, Marcotte EM. Prioritizing candidate disease genes by network-based boosting of genome-wide association data. Genome Res. 2011;21:1109–21.

  16. Lee I, Seo YS, Coltrane D, Hwang S, Oha T, Marcotte EM, et al. Genetic dissection of the biotic stress response using a genome-scale gene network for rice. Proc Natl Acad Sci U S A. 2011;108:18548–53.

  17. Hwang S, Rhee SY, Marcotte EM, Lee I. Systematic prediction of gene function in Arabidopsis thaliana using a probabilistic functional gene network. Nat Protoc. 2011;6:1429–42.

  18. Lee I, Lehner B, Vavouri T, Shin J, Fraser AG, Marcotte EM. Predicting genetic modifier loci using functional gene networks. Genome Res. 2010;20:1143–53.

  19. Lee MA, Keane OM, Glass BC, Manley TR, Cullen NG, Dodds KG, et al. Establishment of a pipeline to analyse non-synonymous SNPs in Bos taurus. BMC Genomics. 2006;7:298.

  20. Moody DE, Pomp D, Barendse W. Linkage mapping of the bovine insulin-like growth factor-1 receptor gene. Mamm Genome. 1996;7:168–9.

  21. Curi RA, De Oliveira HN, Silveira AC, Lopes CR. Association between IGF-I, IGF-IR and GHRH gene polymorphisms and growth and carcass traits in beef cattle. Livest Prod Sci. 2005;94:159–67.

  22. Akis I, Oztabak K, Gonulalp I, Mengi A, Un C. IGF-1 and IGF-1R gene polymorphisms in East Anatolian Red and South Anatolian Red cattle breeds. Russ J Genet. 2010;46:439–42.

  23. Zhang R, Li X. Association between IGF-IR, m-calpain and UCP-3 gene polymorphisms and growth traits in Nanyang cattle. Mol Biol Rep. 2011;38:2179–84.

  24. Imai T, Jiang M, Chambon P, Metzger D. Impaired adipogenesis and lipolysis in the mouse upon selective ablation of the retinoid X receptor α mediated by a tamoxifen-inducible chimeric Cre recombinase (Cre-ERT2) in adipocytes. Proc Natl Acad Sci U S A. 2001;98:224–8.

  25. Utsunomiya YT, do Carmo AS, Carvalheiro R, Neves HH, Matos MC, Zavarez LB, et al. Genome-wide association study for birth weight in Nellore cattle points to previously described orthologous genes affecting human and bovine height. BMC Genet. 2013;14:52.

  26. Voldborg BR, Damstrup L, Spang-Thomsen M, Poulsen HS. Epidermal growth factor receptor (EGFR) and EGFR mutations, function and possible role in clinical trials. Ann Oncol. 1997;8:1197–206.

  27. Qiu FF, Nie QH, Luo CL, Zhang DX, Lin SM, Zhang XQ. Association of single nucleotide polymorphisms of the insulin gene with chicken early growth and fat deposition. Poult Sci. 2006;85:980–5.

  28. Wilson-Rawls J, Hurt CR, Parsons SM, Rawls A. Differential regulation of epaxial and hypaxial muscle development by Paraxis. Development. 1999;126:5217–29.

  29. Luo X, Sawadogo M. Antiproliferative properties of the USF family of helix-loop-helix transcription factors. Proc Natl Acad Sci U S A. 1996;93:1308–13.

  30. Yin D, Clarke SD, Etherton TD. Transcriptional regulation of fatty acid synthase gene by somatotropin in 3T3-F442A adipocytes. J Anim Sci. 2001;79:2336–45.

  31. Buzanskas ME, Gross DA, Ventura RV, Schenkel FS, Sargolzaei M, Meirelles SLC, et al. Genome-wide association for growth traits in canchim beef cattle. PLoS One. 2014;9:e94802.

  32. Östlund G, Schmitt T, Forslund K, Köstler T, Messina TN, Roopra S, et al. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis. Nucleic Acids Res. 2010;38:D196–203.

  33. Lee I, Li Z, Marcotte EM. An improved, bias-reduced probabilistic functional gene network of baker’s yeast, Saccharomyces cerevisiae. PLoS One. 2007;2:e988.

  34. Segura-Cabrera A, García-Pérez CA, Rodríguez-Pérez MA, Guo X, Rivera G, Bocanegra-García V. Analysis of protein interaction networks to prioritize drug targets of neglected-diseases pathogens. Med Chem Drug Des. 2012. Prof. Deniz Ekinci (Ed.), ISBN: 978-953-51-0513-8, InTech. Available from: http://www.intechopen.com/books/medicinalchemistry-and-drugdesign/analysis-of-protein-interaction-networks-to-prioritize-drug-targets-of-neglecteddiseases-pathogens.

  35. Barrett T, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, et al. NCBI GEO: archive for functional genomics data sets--update. Nucleic Acids Res. 2013;41:D991–5.

  36. De Jager N, Hudson NJ, Reverter A, Wang YH. Chronic exposure to anabolic steroids induces the muscle expression of oxytocin and a more than fifty-fold increase in circulating oxytocin in cattle. Physiol Genomics. 2011;43:467–78.

  37. Garbe JR, Elsik CG, Antoniou E, Reecy JM, Clark KJ, Venkatraman A, et al. Development and application of bovine and porcine oligonucleotide arrays with protein-based annotation. J Biomed Biotechnol. 2010;2010:453638.

  38. Bionaz M, Periasamy K, Rodriguez-Zas SL, Everts RE. Old and new stories: revelations from functional analysis of the bovine mammary transcriptome during the lactation cycle. PLoS One. 2012;7:e33268.

  39. Machugh DE, Taraktsoglou M, Killick KE, Nalpas NC. Pan-genomic analysis of bovine monocyte-derived macrophage gene expression in response to in vitro infection with Mycobacterium avium subspecies paratuberculosis. Vet Res. 2012;43:25.

  40. Chatr-Aryamontri A, Breitkreutz BJ, Heinicke S, Boucher L, Winter A, Stark C, et al. The BioGRID interaction database: 2013 update. Nucleic Acids Res. 2013;41:D816–23.

  41. Kerrien S, Aranda B, Breuza L, Bridge A, Broackes-Carter F, Chen C, et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012;40:D841–6.

  42. Punta M, Coggill PC, Eberhardt RY, Mistry J, Tate J, Boursnell C, et al. The Pfam protein families database. Nucleic Acids Res. 2012;40:D290–301.

  43. Chua HN, Sung WK, Wong L. An efficient strategy for extensive integration of diverse biological data for protein function prediction. Bioinformatics. 2007;23:3364–73.

  44. The Gene Ontology Consortium, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, et al. Gene ontology: tool for the unification of biology. Nat Genet. 2000;25:25–9.

  45. Fontaine JF, Priller F, Barbosa-Silva A, Andrade-Navarro MA. Génie: literature-based gene prioritization at multi genomic scale. Nucleic Acids Res. 2011;39:W455–61.

  46. Aragues R, Sander C, Oliva B. Predicting cancer involvement of genes from heterogeneous data. BMC Bioinformatics. 2008;9:172.

  47. Raymond M, Rousset F. GENEPOP (version 1.2): population genetics software for exact tests and ecumenicism. J Hered. 1995;86:248–9.

  48. Rousset F. Genepop'007: a complete reimplementation of the Genepop software for Windows and Linux. Mol Ecol Resour. 2008;8:103–6.

Acknowledgments

The authors thank the different herd owners for supporting the cattle sampling. We acknowledge financial support from research grants FOMIX-TAMAULIPAS 177460 and SIP-IPN 20141262 and 20150648.

Author information

Corresponding author

Correspondence to Ana María Sifuentes-Rincón.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

AMSR and ASC conceived the study and participated in its design and coordination. FAPS and PAM carried out the molecular studies, including sequencing and genotyping. FAPS, ASC and CAGP constructed BosNet. GMPB performed the statistical analysis. FAPS and AMSR drafted the manuscript. All authors read and approved the final manuscript.

Rights and permissions

Open Access  This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.

The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.

The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Paredes-Sánchez, F.A., Sifuentes-Rincón, A.M., Segura Cabrera, A. et al. Associations of SNPs located at candidate genes to bovine growth traits, prioritized with an interaction networks construction approach. BMC Genet 16, 91 (2015). https://doi.org/10.1186/s12863-015-0247-3

  • DOI: https://doi.org/10.1186/s12863-015-0247-3

Keywords