Genomic predictions for economically important traits in Brazilian Braford and Hereford beef cattle using true and imputed genotypes

Abstract

Background

Genomic selection (GS) has played an important role in cattle breeding programs. However, genotyping prices are still a challenge for implementation of GS in beef cattle and there is still a lack of information about the use of low-density Single Nucleotide Polymorphism (SNP) chip panels for genomic predictions in breeds such as Brazilian Braford and Hereford. Therefore, this study investigated the effect of using imputed genotypes on the accuracy of genomic predictions for twenty economically important traits in Brazilian Braford and Hereford beef cattle. Various scenarios composed of different percentages of animals with imputed genotypes and different sizes of the training population were compared. De-regressed EBVs (estimated breeding values) were used as pseudo-phenotypes in a Genomic Best Linear Unbiased Prediction (GBLUP) model using two different mimicked panels derived from the 50 K (8 K and 15 K SNP panels), which were subsequently imputed to the 50 K panel. In addition, genomic prediction accuracies generated from a 777 K SNP panel (imputed from the 50 K SNP panel) were presented as an alternative scenario.

Results

The accuracy of genomic breeding values averaged over the twenty traits ranged from 0.38 to 0.40 across the different scenarios. The average losses in expected genomic estimated breeding value (GEBV) accuracy (accuracy obtained from the inverse of the mixed model equations) relative to the true 50 K genotypes ranged from −0.0007 to −0.0012 and from −0.0002 to −0.0005 when using the 50 K imputed from the 8 K or 15 K, respectively. When using the imputed 777 K panel, the average loss in expected GEBV accuracy was −0.0021. The average gain in expected EBV accuracy from including genomic information, compared to simple BLUP, was between 0.02 and 0.03 across scenarios and traits.

Conclusions

The percentage of animals with imputed genotypes in the training population did not significantly influence the validation accuracy. However, the size of the training population played a major role in the accuracies of genomic predictions in this population. The losses in the expected accuracies of GEBV due to imputation of genotypes were lower when using the 50 K SNP chip panel imputed from the 15 K compared to the one imputed from the 8 K SNP chip panel.

Background

Brazil is a key player in the global beef market, exporting throughout the world. Currently, Brazil has a herd of more than 212 million cattle [1], in which Zebu breeds predominate. However, there are other breeds with high economic impact in the Brazilian and international beef industry as well, such as Hereford and Braford (a composite breed that had a genetic contribution from Zebu breeds in its development). Hereford and Braford breeds, together with Angus and Brangus, accounted for 50% of the approximately eight million doses of beef cattle semen commercialized in Brazil in 2013 [2]. Much of this semen, as well as most live bulls sold, are mated to Zebu females with the primary objective of improving carcass quality [3].

Genetic progress in Hereford and Braford breeding programs has been achieved through traditional genetic evaluations. However, incorporation of genomic information in livestock breeding programs (e.g. genomic selection, GS [4]) can result in higher and faster genetic progress [4–6], by decreasing generation interval, increasing accuracy of selection and facilitating incorporation of novel traits of economic importance in the current breeding programs [4, 7, 8]. GS has considerably changed dairy cattle breeding systems, especially young bull testing, where some countries have partially or completely eliminated traditional progeny testing [9], with a subsequent reduction in breeding program costs [10].

The success of genomic predictions can be evaluated by the accuracy of direct genomic breeding values (DGVs), which depends on many factors such as the level of linkage disequilibrium between markers and the quantitative trait loci (QTL), the number of animals in the training population, the heritability of the trait and the distribution of QTL effects over the genome [11]. Thus, the success of GS in dairy cattle, mainly in the Holstein breed, is associated with a large number of genotyped animals, a small effective population size (Ne), widespread use of key sires world-wide and collaborations among countries for genotype sharing [11]. However, in the beef industry there are more challenges for the implementation of GS due to a larger effective population size compared to dairy cattle breeds, a higher number of important beef cattle breeds world-wide, a smaller number of key sires used across countries and minimal collaboration among countries for genotype sharing [9]. Some reported estimates of Ne for Holstein Friesian are 39 [12], 49 [13], 64 [14] and 90 [14]. For beef cattle some estimates of Ne are 234, 128, 185 and 303 for Angus, Devon, Hereford and Shorthorn, respectively [15], 207 and 285 for Angus and Charolais, respectively [16], and 445 for American Red Angus [17].

There is a need to increase the size of the training population for successful genomic predictions in beef cattle. However, the cost of genotyping in commercial herds is still a major constraint for implementation of GS. An alternative to reduce costs is to genotype individuals from commercial breeding programs with low-density single nucleotide polymorphism (SNP) chip panels, which are more affordable for the producers. These low-density panels can then be imputed to a medium or high density SNP chip panel [18, 19] and used to predict genomic breeding values of the animals [20, 21]. Although the impact of using imputed genotypes for genomic prediction of breeding values has been investigated in different breeds [22, 23], it has not been reported in beef cattle breeds such as Brazilian Braford (Nellore x Hereford) and Hereford. Thus, the aim of this study was to investigate the accuracy of genomic predictions using true 50 K genotypes, as well as including alternative percentages of animals with imputed genotypes in the training population and different sizes of the training dataset in Brazilian Braford and Hereford beef cattle.

Methods

Genotypic and phenotypic data

Genotypic, phenotypic and pedigree datasets were obtained from the Conexão Delta G’s Genetic Improvement Program (Conexão Delta G, Dom Pedrito, Rio Grande do Sul, Brazil). The dataset contained approximately 520,000 animals from 97 farms located in the South, Southeast, Midwest and Northeast regions of Brazil. Out of these animals there were 683 Hereford and 2,997 Braford animals genotyped (born between 2008 and 2011) plus 130 sires (born between 1982 and 2010).

From the total of genotyped animals, there were 624 Hereford and 2,926 Braford animals genotyped with the Illumina BovineSNP50 panel (Illumina Inc., San Diego, USA) and 59 Hereford and 71 Braford sires genotyped with the Illumina BovineHD panel (Illumina Inc., San Diego, USA). In addition, there were 88 Nellore bulls (Zebu breed used to develop the Braford composite breed) from the Paint Breeding Program (Lagoa da Serra, Sertãozinho, São Paulo, Brazil) genotyped with the Illumina BovineHD panel.

Genotype data editing

Single nucleotide polymorphisms that were not present in both the 50 K and 777 K SNP chip panels were removed for imputation to the Illumina BovineSNP50 BeadChip (Illumina Inc., San Diego, CA). Missing genotypes (0.46%) in the 50 K SNP chip panel were previously imputed using FImpute software v.2.2 [24]. Only SNPs located on autosomes with GenCall score (≥0.15), call rate (≥0.90) and p-value for the Hardy-Weinberg Equilibrium test (\(>10^{-6}\)) were retained for further analyses. Quality control of individuals was based on GenCall score (≥0.15), call rate (≥0.90), heterozygosity deviation (limit of ± 3 SD), repeated sampling and paternity errors. The quality control for imputation to the 777 K SNP panel was the same as the one described above for imputation to the 50 K SNP panel. The 8 K and 15 K SNP chip panels were used for imputation to the 50 K SNP chip panel, and the 50 K SNP panel was used for imputation to the 777 K SNP panel [19] using FImpute v.2.2 [24]. For the genomic prediction of breeding values, the same quality control was applied with an additional Minor Allele Frequency filter (MAF ≥ 0.05). Table 1 presents the number of individuals after the data quality control.
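The SNP-level filters described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: the function name `snp_quality_control`, the 0/1/2 genotype coding with NaN for missing calls, and the chi-square form of the Hardy-Weinberg test are all assumptions. The GenCall score filter is omitted because it needs per-genotype scores that a plain genotype matrix does not carry.

```python
import math
import numpy as np


def snp_quality_control(genotypes, min_call_rate=0.90, hwe_p_min=1e-6, min_maf=None):
    """Return a boolean mask of SNPs (columns) that pass the filters.

    genotypes: 2-D array (animals x SNPs), coded 0/1/2 as the count of one
    allele, with np.nan for missing calls (assumed coding).
    """
    g = np.asarray(genotypes, dtype=float)
    keep = np.zeros(g.shape[1], dtype=bool)
    for j in range(g.shape[1]):
        col = g[:, j]
        called = col[~np.isnan(col)]
        n = called.size
        # Call rate: share of animals with a non-missing genotype.
        if n == 0 or n / col.size < min_call_rate:
            continue
        p = called.sum() / (2.0 * n)  # frequency of the counted allele
        maf = min(p, 1.0 - p)
        # Optional MAF filter (applied only before genomic prediction).
        if min_maf is not None and maf < min_maf:
            continue
        # Hardy-Weinberg chi-square test with 1 df; skipped for monomorphic SNPs.
        if maf > 0.0:
            q = 1.0 - p
            observed = np.array([(called == k).sum() for k in (0, 1, 2)], dtype=float)
            expected = n * np.array([q * q, 2.0 * p * q, p * p])
            chi2 = float(((observed - expected) ** 2 / expected).sum())
            # Survival function of a chi-square with 1 df.
            p_value = math.erfc(math.sqrt(chi2 / 2.0))
            if p_value <= hwe_p_min:
                continue
        keep[j] = True
    return keep
```

Calling the function with `min_maf=0.05` mirrors the stricter filter applied before genomic prediction, while the default call corresponds to the filters applied before imputation.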

Table 1 Number of phenotypes, EBVs and genotypes for each economic trait in the training and validation population after data editing

Traits

Conexão Delta G’s Genetic Improvement Program - Hereford and Braford (Nellore x Hereford) started around 1970. During its early stages animals were selected based on a selection index that included weight gain, scrotal circumference and conformation score traits [25]. In 1975, other traits such as precocity, muscularity and body size scores [26] were incorporated into the selection index. In the 1990s, body size score was excluded from the selection index. Furthermore, there are other traits not included in the selection index, which are used for culling of animals as well. Therefore, the traits included in this study can be divided into two groups: 1) traits that make up the selection index used by Conexão Delta G’s Genetic Improvement Program; and 2) traits that are not included in the selection index, but are used for independent culling. The current selection index is based on 25% for weight gain from birth to weaning (WGBW), 25% for weight gain from weaning to yearling (WGWY), 4% for conformation score at weaning (CW), 4% for conformation score at yearling (CY), 8% for precocity score at weaning (PW), 8% for precocity score at yearling (PY), 8% for muscularity score at weaning (MW), 8% for muscularity score at yearling (MY), 5% for scrotal circumference adjusted for age at yearling (SCa) and 5% for scrotal circumference adjusted for age and weight at yearling (SCaw). On the other hand, the traits that are not included in the selection index are: birth weight (kg, BW), birth assistance score (scores 1–5, BA), size score at weaning (scores 1–5, SW), size score at yearling (scores 1–5, SY), prepuce (navel) score at weaning (scores 1–5, NW), prepuce (navel) score at yearling (scores 1–5, NY), hair length score at weaning (scores 1–3, HW), hair length score at yearling (scores 1–3, HY), tick resistance (tick units, TR) and ocular pigmentation score (scores 1–3, OP).
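As a small arithmetic illustration, the index weights listed above sum to 100% and can be applied as a weighted sum. The text does not state the scale on which the traits enter the index, so this sketch assumes EBVs that have already been standardized to a common scale; the names `INDEX_WEIGHTS` and `selection_index` are hypothetical.

```python
# Index weights as listed in the text (fractions of 1).
INDEX_WEIGHTS = {
    "WGBW": 0.25, "WGWY": 0.25,
    "CW": 0.04, "CY": 0.04,
    "PW": 0.08, "PY": 0.08,
    "MW": 0.08, "MY": 0.08,
    "SCa": 0.05, "SCaw": 0.05,
}


def selection_index(standardized_ebvs):
    """Weighted sum of one animal's EBVs (assumed standardized, keyed by trait)."""
    missing = set(INDEX_WEIGHTS) - set(standardized_ebvs)
    if missing:
        raise KeyError(f"missing traits: {sorted(missing)}")
    return sum(w * standardized_ebvs[trait] for trait, w in INDEX_WEIGHTS.items())
```

An animal that is one standard deviation above the mean for every index trait would score approximately 1.0 under this assumed scaling.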

The independent culling level was carried out systematically since the beginning of the Conexão Delta G’s Genetic Improvement Program for BW, BA and OP traits, particularly in Hereford, and NW, NY, HW and HY in Braford. The SW and SY traits were part of the selection index between the 1970s and 1990s while the selection of TR has been performed with greater emphasis on young bulls in the last decade.

Traditional genetic evaluation

The package used to obtain the estimated breeding values (EBVs) for each trait was written in Fortran and developed by GenSys (GenSys Consultores Associados, Porto Alegre, Brazil). Contemporary group was defined based on farm, year-season, sex, and management group. The traits WGBW, CW, CY, PW, PY, MW, MY and SW were pre-adjusted for dam age, birth date, breed, dominance and epistasis effects and environmental interactions (latitude). WGWY was pre-adjusted for calf age, while CW, CY, PW, PY, MW, MY and SY were pre-adjusted for calf and dam age. SCa was adjusted for age at yearling and SCaw was adjusted for age and weight at yearling. TR was pre-adjusted for additive effects of breed. A connectedness analysis was performed prior to each genetic evaluation. The degree of connectedness among contemporary groups was measured based on genetic connections of animals and their common relatives. Genetic connections were weighted by the degree of additive relationship between animals [27, 28]. To be considered connected, contemporary groups were required to have a minimum of 10 direct genetic connections. All individuals not assigned to a contemporary group were excluded from the genetic evaluations.

The general model used for the genetic evaluations was: \( y_{ijkl} = \mu + cg_{i} + a_{j} + m_{k} + pe_{k} + e_{ijkl} \), where \( y_{ijkl} \) is the phenotype for the animal l, pre-adjusted for the known environmental effects (individual age, dam age and birth date) and genetic fixed effects (breed, dominance, epistatic, complementary and interactions with latitude); \( \mu \) is the general mean for the trait; \( cg_{i} \) is the effect of contemporary group i (fixed effect); \( a_{j} \) is the direct genetic effect of animal j (random effect); \( m_{k} \) is the maternal genetic effect of cow k (random effect); \( pe_{k} \) is the permanent environment effect due to the cow k (random effect); and \( e_{ijkl} \) is the residual effect associated with the observation ijkl. The required variance components were estimated using Restricted Maximum Likelihood (REML). A robust estimation procedure with respect to the heterogeneity of residual variance within contemporary groups [29] was used. The robust estimation procedure allows observations from contemporary groups with large residual variance to have reduced influence, while not giving too much weight to observations from contemporary groups with low residual variance [29].

Two EBV sets were generated: the first one was estimated using all available information to date while the second set was estimated using information from all animals born before 2010. These two sets of EBVs were then used as pseudo-phenotypes in the genomic prediction models for validation and training populations, respectively.

De-regressed EBVs

The second set of EBVs (for the training population) was de-regressed and used as pseudo-phenotypes to estimate genomic marker effects. The approach described by VanRaden and Wiggans [30] was used to calculate de-regressed EBVs using EBVs and reliabilities of genotyped animals and their sires and dams. De-regressed EBVs were calculated for animals of the training population with EBV reliability greater than the overall mean (\( r^{2} = 0.09 \)) and that satisfied the following condition: \( \operatorname{abs}\left(\frac{EBV - dEBV}{sdEBV}\right) \le 10\, sdEBV \), where abs represents the absolute value, EBV is the estimated breeding value, dEBV is the de-regressed EBV and sdEBV is the standard deviation of the EBVs.

Prediction of DGV and GEBV

Direct genomic values (DGVs) were estimated using the GBLUP method as described in VanRaden [7] for all twenty traits (Table 1), using either 50 K or 777 K SNP chip panels and de-regressed EBVs. The GEBV software was used for the analysis [31]. The following linear model was implemented: \( \mathbf{y} = \mathbf{1}_{n}\mu + \mathbf{Zg} + \mathbf{e} \), where \( \mathbf{y} \) is the vector of de-regressed EBVs for the trait, \( \mu \) is the overall mean, \( \mathbf{1}_{n} \) is a vector of ones, \( \mathbf{Z} \) is the design matrix that relates de-regressed EBVs to animals, \( \mathbf{g} \) is the vector of DGVs to be predicted, and \( \mathbf{e} \) is the vector of residual effects. It was assumed that \( \mathbf{g} \sim N(\mathbf{0}, \mathbf{G}^{*}\sigma_{g}^{2}) \), where \( \sigma_{g}^{2} \) is the additive genetic variance and \( \mathbf{G}^{*} \) is a combined relationship matrix (80% genomic relationship and 20% pedigree-based relationship), and \( \mathbf{e} \sim N(\mathbf{0}, \mathbf{R}\sigma_{e}^{2}) \), where \( \sigma_{e}^{2} \) is the residual variance and \( \mathbf{R} \) is a diagonal matrix whose elements account for the differences in reliabilities of the de-regressed EBVs. A combined relationship matrix was used because previous studies, also performed in Brazil, reported gains in accuracies when adding 20% pedigree-based relationship (e.g. [32]).
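The model above can be sketched numerically as follows. This is an illustrative reconstruction under stated assumptions, not the GEBV software used in the study [31]: the genomic relationship matrix is built from observed allele frequencies on 0/1/2 genotypes, the inverse of \( \mathbf{R} \) is passed as a vector of record weights, the variance ratio `lam` (residual over additive genetic variance) is supplied by the caller, and the function names are hypothetical. Animals without records (e.g. validation animals) are kept in \( \mathbf{g} \) so that their DGVs are predicted through the relationship matrix.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from a 0/1/2 genotype matrix (animals x SNPs)."""
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0           # observed allele frequencies (assumption)
    z = m - 2.0 * p                    # centre each SNP column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def blend_relationships(g, a, genomic_weight=0.8):
    """Combined matrix G* = w * G + (1 - w) * A (80% genomic, 20% pedigree by default)."""
    return genomic_weight * np.asarray(g) + (1.0 - genomic_weight) * np.asarray(a)


def gblup(y, train_idx, g_star, weights, lam):
    """Solve the mixed model equations for y = 1*mu + Z*g + e.

    y         : de-regressed EBVs of the training animals
    train_idx : position of each training animal in g_star
    g_star    : combined relationship matrix for all animals
    weights   : diagonal of R^-1 (one weight per record)
    lam       : sigma_e^2 / sigma_g^2
    Returns (mu_hat, g_hat) with g_hat covering all animals in g_star.
    """
    y = np.asarray(y, dtype=float)
    n_all = g_star.shape[0]
    n_train = y.size
    z = np.zeros((n_train, n_all))
    z[np.arange(n_train), train_idx] = 1.0   # link each record to its animal
    x = np.ones((n_train, 1))
    r_inv = np.diag(np.asarray(weights, dtype=float))
    lhs = np.block([
        [x.T @ r_inv @ x, x.T @ r_inv @ z],
        [z.T @ r_inv @ x, z.T @ r_inv @ z + lam * np.linalg.inv(g_star)],
    ])
    rhs = np.concatenate([x.T @ r_inv @ y, z.T @ r_inv @ y])
    solution = np.linalg.solve(lhs, rhs)
    return solution[0], solution[1:]
```

One practical side effect worth noting in this sketch: a genomic matrix centred with observed allele frequencies is singular, and blending it with a positive-definite pedigree matrix makes the combined matrix invertible. For large populations the dense inverse used here would be replaced by a more efficient solver.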

The genomic estimated breeding values (GEBV) were estimated using the blending procedure outlined by Hayes et al. [11] and described as: \( GEBV = \frac{r_{DGV}^{2} \cdot DGV + r_{EBV}^{2} \cdot EBV}{r_{DGV}^{2} + r_{EBV}^{2}} \), where \( r_{DGV}^{2} \) and \( r_{EBV}^{2} \) are the reliabilities of DGV and EBV, respectively.
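The blending formula is a reliability-weighted average, which a short sketch makes concrete. The function name `blend_gebv` and the example values are illustrative assumptions, not figures from the study.

```python
import numpy as np


def blend_gebv(dgv, ebv, rel_dgv, rel_ebv):
    """Reliability-weighted average of DGV and EBV (scalars or arrays)."""
    dgv, ebv = np.asarray(dgv, dtype=float), np.asarray(ebv, dtype=float)
    rel_dgv, rel_ebv = np.asarray(rel_dgv, dtype=float), np.asarray(rel_ebv, dtype=float)
    total = rel_dgv + rel_ebv
    if np.any(total <= 0):
        raise ValueError("at least one reliability must be positive for each animal")
    return (rel_dgv * dgv + rel_ebv * ebv) / total
```

For a hypothetical animal with DGV = 1.0 (reliability 0.16) and EBV = 3.0 (reliability 0.48), the blended value is (0.16 × 1.0 + 0.48 × 3.0) / 0.64 = 2.5, which sits closer to the more reliable EBV.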

Training and validation populations

For the genomic predictions described in the previous section, the dataset was split into two groups: training and validation populations. To simulate what would happen in practice (genotypes and phenotypes from older animals used to predict breeding values of younger animals), the training population included all animals born before 2011, while the validation group included all the animals born in 2011 (youngest animals). Training and validation groups varied in size for each trait (Table 1). The training group had 100% of true genotypes or, alternatively, between 10% and 60% (10%, 20%, …, 60%) of imputed genotypes in the first two groups of scenarios. In the third group of scenarios, the training group had 91% of imputed genotypes because only 212 animals were genotyped with the 777 K chip panel. In the validation groups, only true genotypes were included.

Genomic prediction scenarios

Three groups of genomic prediction scenarios were designed to mimic situations where different proportions of animals with imputed genotypes (derived from alternative low-/medium-density SNP chip panels) were included in the training set. The first two groups of scenarios were created based on animals genotyped (mimicked from the 50 K SNP chip panel) with 8 K and 15 K SNP chip panels and imputed to the 50 K SNP chip panel [19]. For this study, the two best scenarios based on concordance rate and allelic R2 were used: the 8 K scenario (concordance rate: 0.952 and allelic R2: 0.927) and the 15 K scenario (concordance rate: 0.973 and allelic R2: 0.962). The 20 K panel was slightly superior to the 8 K panel. However, various markers from the 20 K are not included in the 50 K and 777 K, and therefore, when matching panels for imputation, the 8 K and 20 K become very similar [19].
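The two imputation quality measures mentioned above are defined in [19] rather than in this text. As a hedged sketch, the code below uses common working definitions: concordance rate as the share of imputed genotypes that match the true genotype, and allelic R2 as the squared correlation between true and imputed allele counts, computed per SNP and averaged. Both the definitions and the function names are assumptions for illustration.

```python
import numpy as np


def concordance_rate(true_geno, imputed_geno):
    """Share of genotype calls (animals x SNPs, coded 0/1/2) imputed correctly."""
    t = np.asarray(true_geno)
    i = np.asarray(imputed_geno)
    return float(np.mean(t == i))


def allelic_r2(true_geno, imputed_geno):
    """Mean over SNPs of the squared correlation between true and imputed allele counts."""
    t = np.asarray(true_geno, dtype=float)
    i = np.asarray(imputed_geno, dtype=float)
    r2_values = []
    for j in range(t.shape[1]):
        # The correlation is undefined when either column has no variation.
        if t[:, j].std() == 0.0 or i[:, j].std() == 0.0:
            continue
        r = np.corrcoef(t[:, j], i[:, j])[0, 1]
        r2_values.append(r * r)
    return float(np.mean(r2_values)) if r2_values else float("nan")
```

In a mimicked-panel design such as the one described, these measures would be computed only on the SNPs that were masked from the 50 K panel and then imputed back.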

The first group of scenarios (SCE1) was created with different percentages of animals with imputed genotypes and unequal training population sizes (separated based on birth year, as described in the previous section). The second group of scenarios (SCE2) was also created with different percentages of animals with imputed genotypes, but with training populations of the same size. The third group of scenarios (SCE3) was based on animals genotyped using the 50 K SNP chip panel and imputed to the 777 K SNP chip panel. More details for all scenarios are shown in Table 2.

Table 2 Number of animals with true and imputed genotypes in each scenario in the training set for the weight gain from birth to weaning (WGBW) trait

Comparisons between scenarios

The prediction accuracies of GEBV were used to compare the scenarios evaluated. The prediction accuracies were calculated as Pearson’s correlation between DGVs and EBVs (validation accuracy) from the validation population. Accuracy obtained from the mixed model equations (expected accuracy) in the validation population was used to quantify losses in GEBV accuracy due to the use of imputed genotypes compared to the true 50 K SNP chip panel. Expected accuracy was also used to quantify the gain in breeding value accuracies when using molecular marker information in the EBV estimation. The factors affecting validation accuracies and losses in expected GEBV accuracies were tested by carrying out an analysis of variance in the ANOVA procedure of SAS version 9.2 (SAS Inst. Inc., Cary, NC).
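The two accuracy measures used for the comparisons can be sketched as follows. The validation accuracy is the Pearson correlation stated in the text. For the expected accuracy, the text only says that it was obtained from the mixed model equations; the sketch assumes the common form based on the prediction error variance (PEV), \( r = \sqrt{1 - PEV/\sigma_{g}^{2}} \), and the function names are hypothetical.

```python
import numpy as np


def validation_accuracy(dgv, ebv):
    """Pearson correlation between DGVs and EBVs of the validation animals."""
    dgv = np.asarray(dgv, dtype=float)
    ebv = np.asarray(ebv, dtype=float)
    return float(np.corrcoef(dgv, ebv)[0, 1])


def expected_accuracy(pev, sigma2_g):
    """Accuracy from prediction error variance, assuming r = sqrt(1 - PEV / sigma_g^2)."""
    reliability = 1.0 - np.asarray(pev, dtype=float) / sigma2_g
    # Clip to [0, 1] so numerical noise cannot produce an invalid square root.
    return np.sqrt(np.clip(reliability, 0.0, 1.0))
```

Under this assumed form, the loss due to imputation for a given animal is the expected accuracy with imputed genotypes minus the expected accuracy with true 50 K genotypes, which is negative when imputation reduces accuracy.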

Results

Phenotypic and genotypic data

As shown in Table 1, the average heritability estimates (± SD) for the traits included in this study was 0.30 ± 0.09 and it ranged from 0.10 (BA) to 0.46 (NW). The average number of individuals (± SD) in the training population was 1,603.4 ± 594.7 and ranged from 654 (TR) to 2,492 (BW). The average number of individuals in the validation population (± SD) was 913.7 ± 122.1 and ranged from 414 (BA) to 980 (WGBW, CW, PW and MW).

Adding alternative imputed genotypes to increase the size of training population (SCE1)

In the SCE1, we investigated the prediction accuracies of genomic breeding values when increasing the size of the training population by inclusion of imputed genotypes (imputed from 8 K or 15 K to 50 K). Tables 1 and 2 show the number of animals in the training and validation population within scenario evaluated and total number of genotypes (true and imputed). The average validation accuracies for the traits included in the selection index ranged from 0.29 to 0.31 (Table 3), while for the traits not included in the selection index, it ranged from 0.47 to 0.49 (Table 4). However, for the traits related to fitness (NW, NY, HW, HY, TR and OP), the average validation accuracy ranged from 0.63 to 0.65 (Table 4). As shown in Table 5, higher accuracies were observed for the majority of the traits when including a higher proportion of imputed genotypes in the training population.

Table 3 DGV validation accuracies for the SCE1 scenarios and traits included in the selection index
Table 4 DGV validation accuracies for the SCE1 scenarios and traits not included in the selection index
Table 5 Results of analysis of variance of the DGV validation accuracies for the SCE1 scenarios

The differences between the 8 K and 15 K SNP panels were not significant (P > 0.05) for the majority of the traits (55%) and in general, when there was a significant difference, 15 K performed better than 8 K (Table 5). When comparing 8 K and 15 K to the true 50 K SNP, significant differences (P < 0.05) were observed in 45% and 40% of the traits, respectively. In general, the accuracies were higher when using the true 50 K SNP panel. However, for BW higher accuracies were observed when using imputed genotypes from 8 K and 15 K.

When evaluating the size and percentage of animals with imputed genotypes in the training population in SCE1 scenarios (Table 5), 86% of the comparisons were statistically significant (P < 0.05). In general, larger training populations (i.e. including a larger proportion of imputed genotypes) provided greater prediction accuracies. Regarding the panel used, 60% of the comparisons between the 8 K and 15 K SNP chip panels and the true 50 K panel were statistically significant (P < 0.05) and in most cases the true 50 K SNP panel provided greater accuracies (Table 5). For the 8 K and 15 K SNP panels the average losses in expected GEBV accuracy were between −0.004 and −0.0011. More details about the losses in GEBV accuracies are presented in Additional file 1.

Comparing different proportion of imputed genotypes keeping constant the size of the training population (SCE2)

Table 6 shows the accuracies of genomic predictions in the SCE2 scenarios, where the training population size was held constant but the percentage of imputed animals varied from 0% to 60%. The average accuracies ranged between 0.29 and 0.30 for the traits included in the selection index and for the traits not included in the selection index they were all the same (0.49). There were no significant differences among the alternate percentage of imputed animals (P > 0.05) for all traits (Table 7). The comparison of the 8 K and 15 K SNP chip panels to the true 50 K SNP panel showed significant differences (P < 0.05) in 45% and 60% of the cases, respectively (Table 7). When there were significant differences, in general, the true 50 K SNP panel performed better than the imputed genotypes. The differences between the 8 K and 15 K SNP panels were not significant (P > 0.05) for the majority of traits (Table 7).

Table 6 DGV validation accuracy in the validation population for the SCE2 and SCE3 scenarios
Table 7 Results of analysis of variance of the DGV validation accuracies for the SCE2 scenarios

Losses in expected GEBV accuracy were measured within each level of the scenario in relation to the same level of the scenario using only the actual genotypes. All losses were statistically different from actual (not imputed) 50 K SNP panel (P < 0.05) and were higher when using the 8 K SNP panel in relation to the 15 K SNP panel. For the 8 K and 15 K SNP panel the average losses in expected GEBV accuracy were between −0.0002 and −0.0011 across scenarios (Additional file 2). In general, a higher proportion of imputed genotypes in the training population in SCE2 was associated with larger reductions in accuracies.

There were no statistically significant differences (P > 0.05) in validation accuracies when including different percentages of imputed animals (Table 7). When comparing the SNP chip panels, there were no significant differences for 60% of the traits (P > 0.05) between the 8 K and 15 K SNP chip panels and the true 50 K SNP chip panel (Table 7). The losses in GEBV accuracies are presented in Additional file 2. In brief, losses in expected GEBV accuracy were statistically different from true 50 K genotypes (P < 0.05) for traits not included in the selection index. When genotypes were imputed from the 8 K and 15 K panels to the 50 K SNP chip panel, the average losses in expected GEBV accuracy ranged from −0.0004 to −0.0013 and from −0.0003 to −0.0013, respectively.

Comparing prediction accuracy of genomic breeding values using 50 K or 777 K SNP chips (SCE3)

We also investigated the use of a 777 K SNP chip panel imputed from 50 K (SCE3). The average DGV validation accuracies were 0.31 and 0.50 for traits included or not in the selection index, respectively (Table 6). The average loss in expected GEBV accuracy was −0.0021 (Additional file 2).

Including genomic information

The average expected EBV accuracy in the training population was 0.64 and ranged from 0.52 to 0.74. For the validation population, the average expected EBV accuracy was 0.63 and ranged between 0.51 and 0.73. Average expected GEBV accuracy was 0.66 for the scenario with all animals and 60% imputed genotypes (SCE1-60% and SCE2-60%) and 0.65 for the SCE3 scenario. The increase in average expected GEBV accuracy in the validation population by adding the information of the markers was 0.03. The average expected DGV accuracy across traits was 0.40 (Table 8). The increase in expected GEBV accuracy by adding marker information was about 0.02 in all scenarios.

Table 8 Expected EBV accuracy in the training and validation population and expected GEBV and DGV accuracy in the validation population in the scenario with the largest training population

Discussion

Wide application of GS in beef cattle breeding programs depends, among other factors, on the price of genotyping. The current medium or high density SNP chips are still expensive for widespread use in the beef industry, considering the number of individuals needed for reasonably accurate genomic predictions of breeding values. Genotype imputation has been used as an alternative to reduce costs [21, 23, 33, 34]. In this study, we investigated different scenarios as alternatives to use imputed genotypes in commercial beef cattle breeding programs. The correlation between DGV and EBV (validation accuracy) has been used to represent the accuracy of DGV [6, 35, 36]. The validation accuracies for traits in the selection index were lower than values reported in the literature for other breeds such as Angus, Limousin and Simmental [35, 36]. Neves et al. [32], working with Brazilian Nellore and the same set of traits (included in the selection index), also reported greater validation accuracies, except for WGBW and CW. The lower values of validation accuracy in this study could be explained by the lower expected EBV accuracies in the training population (r = 0.64). The greater validation accuracies observed for traits with higher heritability estimates (e.g., post weaning traits) have also been reported in the literature (e.g. Brito et al. [6] working with simulated data of beef cattle, Akanno et al. [37] studying pigs, and Khatkar et al. [33] working with Australian dairy cattle). In general, high heritability traits are associated with larger accuracy estimates. The reason why scrotal circumference (i.e. a high heritability trait) presented the lowest validation accuracy could be the smaller number of animals in the training population (n = 708) compared to the other traits, as the size of the training population is another very important component for the accuracy of genomic predictions.

The results from SCE1 and SCE2 showed that the size of the training population was more important than the percentage of animals with imputed genotypes. Other studies in dairy cattle have also reported small reductions in accuracy when using imputed genotypes to predict the effect of the markers [20, 21, 23, 33, 38]. These findings indicate that in order to improve the accuracies of genomic predictions in Brazilian Braford and Hereford, it is important to increase the size of the training population. This could be done by genotyping more animals with the 50 K SNP chip panel, or with the 8 K or 15 K panels followed by imputation to 50 K. On the basis of the relatively small reduction in accuracy of genomic prediction when using imputed genotypes, we would recommend the use of the 15 K panel for large-scale genotyping as long as its costs are acceptable to Brazilian Braford and Hereford breeders.

As discussed in Piccoli et al. [19], including pedigree information did not increase concordance rate or allelic R2. This could be expected due to the weak structure of the pedigree within the set of genotyped animals and in the whole pedigree file. Similar results were found by Carvalheiro et al. [22] when working with Nellore in Brazil with a similar pedigree structure. It is also important to highlight that the dataset used for this investigation is from a commercial breeding program and not from research herds. Therefore, in practice there will always be an interest in predicting young animals based on the information from previous generations (with phenotypes and genotypes).

In theory, increasing the number of SNPs in a panel will increase the level of linkage disequilibrium (LD) between a SNP and a QTL and consequently there should be an improvement in accuracies of genomic predictions of breeding values. This assumption has been confirmed by other studies in the literature that reported increased prediction accuracies when using imputed HD genotypes compared to medium density genotypes (e.g. 50 K). For instance, Boison et al. [39], working with Guzerá (Bos indicus) cattle, reported an increase of 8% (averaged across all traits) when using imputed HD genotypes compared to true 50 K genotypes. Brito et al. [6], in a study with simulated data of beef cattle, reported an increase of 0.09 in the DGV accuracy by using a 777 K SNP panel compared to a 50 K SNP panel. Weigel et al. [22] and Vazquez et al. [40] also reported higher accuracies of prediction using denser SNP markers. Other studies have reported gains in accuracy, albeit smaller, when using imputed 777 K compared to 50 K [34, 41, 42]. In the present study, we also investigated genotype imputation from 50 K to 777 K in Brazilian Braford and Hereford. However, no major advantages of the 777 K over the 50 K were observed. The losses in accuracies obtained for GEBV and DGV when using 777 K (SCE3) are probably associated with a higher percentage of imputation errors, as in this scenario only 212 animals had true HD genotypes [19]. Therefore, our results do not support the use of an imputed 777 K SNP chip panel to increase genomic prediction accuracies in Brazilian Braford and Hereford breeds. Similar to our study, Su et al. [42] reported no gain in prediction accuracy when using imputed 777 K genotypes vs. the 50 K in Nordic Holstein and Red Dairy cattle.

Although the losses in expected GEBV accuracy when using the 8 K and 15 K SNP panels were statistically significant compared with the accuracies attained using the true 50 K SNP chip panel, they were smaller with the 15 K than with the 8 K SNP chip panel. These results can be explained by the higher concordance rate of the 50 K SNP panel imputed from the 15 K SNP panel [19], lower genotyping errors and denser genome coverage. A similar trend was reported by Segelke et al. [38] when they analyzed the losses in reliability from imputed panels of two different SNP densities in German dairy cattle. Sargolzaei et al. [43], working with Canadian dairy cattle and a 3 K SNP chip panel, also reported losses in reliability of around −0.02. Boison et al. [39] also reported a small loss in prediction accuracy using 50 K genotypes imputed from 3 K, while prediction accuracy remained the same for 50 K genotypes imputed from 7 K. Chen et al. [44], working with Canadian dairy cattle, reported that the 6 K SNP panel (imputed to 50 K) performed better than the 3 K panel (imputed to 50 K) and resulted in the smallest reduction in genomic prediction accuracy among all the low-density panels evaluated in their study. The authors also reported that including genotypes imputed from the 6 K panel achieved almost the same accuracy of genomic prediction as using the 50 K panel, even when 66% of the training population was genotyped on the 6 K panel. Our results and reports from the literature suggest that genomic predictions of breeding values derived from genotypes imputed from higher-density SNP chip panels are more accurate and, in some cases, as accurate as those obtained using the true 50 K genotypes. A similar trend was pointed out in a review by Calus et al. [45], where the authors reported that, in within-breed genomic predictions in dairy cattle, the use of imputed 50 K genotypes typically yields 85% to almost 100% of the reliability obtained with a 50 K panel, provided that the low-density panel contains at least 3 K SNPs. Other studies in dairy cattle have shown that further imputation to 777 K SNPs yielded at most a limited further increase in the reliability of genomic breeding values for within-breed genomic prediction.
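
Because several of the studies cited above report reliability whereas the present study reports accuracy, it may help to recall the standard relationship between the two. The symbols below are introduced here for illustration only and are not taken from the original text. With reliability defined as the squared accuracy, a fraction $k$ of retained reliability corresponds to a fraction $\sqrt{k}$ of retained accuracy:

$$
r^2_{\text{imputed}} = k\, r^2_{\text{true}} \;\Longrightarrow\; r_{\text{imputed}} = \sqrt{k}\, r_{\text{true}}, \qquad k = 0.85 \;\Rightarrow\; \sqrt{0.85} \approx 0.92 .
$$

Under this relationship, the lower bound of 85% of the reliability quoted above would correspond to roughly 92% of the accuracy obtained with the true 50 K panel.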

The traits related to fitness (NW, NY, HW, HY, TR and OP) had higher validation accuracies than the other traits, including those in the selection index. These higher values are probably associated with greater genetic variability due to weaker selection. The results of Akanno et al. [37], working with simulated data in swine, strengthen this theory, because they found much higher accuracy for the indigenous population (low selection pressure) than for the exotic population (high selection pressure). However, Neves et al. [32], studying Brazilian Nellore, reported validation accuracies lower than those attained for the NW and NY traits in the current study. This could be due to stronger selection pressure in the Nellore breed compared to Braford and Hereford. Another explanation could be that fitness traits in general have lower heritability estimates, potentially due to the influence of non-additive genetic factors [46, 47]. The validation accuracies for BW and BA in this study were lower than those for the other traits studied. This could be related to the strong selection carried out in the Hereford breed for these traits. Saatchi et al. [35, 48], working with Angus, Limousin, Simmental and Hereford breeds, found higher validation accuracies for these two traits than those reported in this study.

Validation accuracies for BW in the SCE2 scenario were not influenced by either the panel or the percentage of imputed animals in the training population. Different results were observed in SCE1, where both the panel and the number of animals in the training population influenced the validation accuracy of BW. Hayes et al. [11] showed that genomic prediction accuracies are influenced by the size of the training population, and Brito et al. [6], working with simulated beef cattle data, also showed that the size of the training population has a major effect on the accuracies. Similar results were observed for the fitness traits (NW, NY, HW, HY, TR and OP). In general, the expected DGV accuracies for all traits across levels of each scenario were lower than the parent average accuracies reported by Brito et al. [6], which were between 0.44 and 0.58 for traits with heritability estimates from 0.10 to 0.40.

The losses in expected GEBV accuracy in each scenario were always analyzed relative to the scenario where only true genotypes were used. Across the different scenarios, the losses in expected GEBV accuracy were, on average, higher for traits not included in the selection index than for the group of traits included in the selection index used in the Brazilian Braford and Hereford breeding program. However, the losses in expected GEBV accuracy for the majority of traits were greater when using 50 K genotypes imputed from the 8 K SNP panel than from the 15 K SNP panel. A greater percentage of animals with imputed genotypes was associated with higher losses in expected GEBV accuracy, regardless of the scenario investigated. In other words, losses in expected GEBV accuracy were higher due to greater error rates in the genotype imputation process [19].
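
The comparison described above can be illustrated with a minimal sketch, which is not the software or data used in the study. It assumes a single-trait GBLUP model with an overall mean, a VanRaden-style genomic relationship matrix, a blending constant to keep that matrix invertible, simulated genotypes, and a toy imputation-error model; all function names, the heritability, and the error rate are hypothetical choices made for illustration. Expected accuracy is taken from the diagonal of the inverse of the mixed model equations, and the loss is the difference from the scenario with true genotypes.

```python
import numpy as np


def vanraden_g(genotypes):
    """Genomic relationship matrix from 0/1/2 allele counts (VanRaden-style)."""
    p = genotypes.mean(axis=0) / 2.0           # observed allele frequencies
    centered = genotypes - 2.0 * p             # centre each SNP column
    return centered @ centered.T / (2.0 * np.sum(p * (1.0 - p)))


def expected_accuracy(G, has_record, h2, blend=0.05):
    """Expected accuracy per animal from the inverse of the GBLUP MME."""
    n = G.shape[0]
    Gb = (1.0 - blend) * G + blend * np.eye(n)  # blending keeps G invertible (assumption)
    lam = (1.0 - h2) / h2                       # sigma_e^2 / sigma_g^2
    Z = np.eye(n)[has_record]                   # links records to animals
    X = np.ones((Z.shape[0], 1))                # overall mean only
    lhs = np.block([[X.T @ X, X.T @ Z],
                    [Z.T @ X, Z.T @ Z + lam * np.linalg.inv(Gb)]])
    c_gg = np.diag(np.linalg.inv(lhs))[1:]      # diagonal of the animal block
    reliability = 1.0 - lam * c_gg / np.diag(Gb)  # 1 - PEV / (G_ii * sigma_g^2)
    return np.sqrt(np.clip(reliability, 0.0, 1.0))


def add_imputation_errors(genotypes, error_rate, rng):
    """Replace a random share of calls with random genotypes (toy error model)."""
    noisy = genotypes.copy()
    mask = rng.random(genotypes.shape) < error_rate
    # Some replacements equal the original call, so the realised
    # error rate is somewhat below `error_rate`.
    noisy[mask] = rng.integers(0, 3, size=int(mask.sum()))
    return noisy


# Simulated data; none of these numbers come from the study.
rng = np.random.default_rng(42)
n_animals, n_snps = 60, 300
freqs = rng.uniform(0.1, 0.9, size=n_snps)
true_geno = rng.binomial(2, freqs, size=(n_animals, n_snps))
imputed_geno = add_imputation_errors(true_geno, error_rate=0.05, rng=rng)

has_record = np.arange(n_animals) < 45          # last 15 animals act as validation
acc_true = expected_accuracy(vanraden_g(true_geno), has_record, h2=0.3)
acc_imputed = expected_accuracy(vanraden_g(imputed_geno), has_record, h2=0.3)

loss = acc_imputed.mean() - acc_true.mean()
print(f"mean expected accuracy, true genotypes:    {acc_true.mean():.4f}")
print(f"mean expected accuracy, imputed genotypes: {acc_imputed.mean():.4f}")
print(f"loss relative to true genotypes:           {loss:+.4f}")
```

In this sketch the loss is reported with the same sign convention as in the article, so a negative value indicates lower expected accuracy with imputed genotypes. The size and even the sign of the difference depend on the simulated data and the assumed error model.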

Conclusions

The percentage of animals with imputed genotypes in the training population did not significantly influence the validation accuracy (Pearson’s correlation), but the size of the training population did. The losses in expected GEBV accuracy due to genotype imputation were lower when the 50 K SNP panel was imputed from the 15 K SNP panel than when it was imputed from the 8 K SNP panel. Therefore, the use of lower-density panels, preferably the 15 K or 50 K SNP chip panels, may allow Brazilian Braford and Hereford cattle breeders to genotype more animals and consequently enlarge the training population, which might in turn increase the accuracy of the DGV.
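
As a minimal sketch of the validation accuracy referred to above, the snippet below computes Pearson’s correlation between predicted values and pseudo-phenotypes for a set of validation animals. Pairing DGV with de-regressed EBVs follows the description in the Abstract; the function name and the five example values are hypothetical and are not data from the study.

```python
import numpy as np


def validation_accuracy(dgv, debv):
    """Pearson correlation between predictions and pseudo-phenotypes."""
    dgv = np.asarray(dgv, dtype=float)
    debv = np.asarray(debv, dtype=float)
    return float(np.corrcoef(dgv, debv)[0, 1])


# Hypothetical values for five validation animals (not data from the study).
dgv = [0.12, -0.30, 0.45, 0.05, -0.10]
debv = [0.20, -0.25, 0.60, -0.05, 0.02]
print(f"validation accuracy: {validation_accuracy(dgv, debv):.2f}")
```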

Abbreviations

BA: Birth assistance score

BW: Birth weight

CW: Conformation score at weaning

CY: Conformation score at yearling

DGV: Direct genomic breeding value

EBV: Estimated breeding value

GBLUP: Genomic Best Linear Unbiased Prediction

GEBV: Genomic estimated breeding value

GS: Genomic selection

HW: Hair length score at weaning

HY: Hair length score at yearling

MW: Muscularity score at weaning

MY: Muscularity score at yearling

NW: Prepuce (navel) score at weaning

NY: Prepuce (navel) score at yearling

OP: Ocular pigmentation score

PW: Precocity score at weaning

PY: Precocity score at yearling

QTL: Quantitative trait loci

SCa: Scrotal circumference adjusted for age at yearling

SCaw: Scrotal circumference adjusted for age and weight at yearling

SNP: Single Nucleotide Polymorphisms

SW: Size score at weaning

SY: Size score at yearling

TR: Tick resistance

WGBW: Weight gain from birth to weaning

WGWY: Weight gain from weaning to yearling

References

  1. FAO - Food and Agriculture Organization of the United Nations. 2014. http://www.fao.org/faostat/en/#data/QA . Accessed 11 Nov 2016.

  2. Index ASBIA - importação, exportação e comercialização de sêmen (in Portuguese). 2011. http://www.asbia.org.br/novo/upload/mercado/relatorio2011.pdf. Accessed 11 Nov 2016.

  3. Fries L. Cruzamentos em gado de corte. In: 4° Simpósio sobre pecuária de corte 1996, 4(1996):109–128.

  4. Meuwissen TH, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide dense marker maps. Genetics. 2001;157(4):1819–29.

  5. Aguilar I, Misztal I, Johnson D, Legarra A, Tsuruta S, Lawlor T. Hot topic: A unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J Dairy Sci. 2010;93(2):743–52.

  6. Brito FV, Neto JB, Sargolzaei M, Cobuci JA, Schenkel FS. Accuracy of genomic selection in simulated populations mimicking the extent of linkage disequilibrium in beef cattle. BMC Genetics. 2011;12(1):1.

  7. VanRaden P. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.

  8. Goddard M, Meuwissen THE, Hayes BJ. Genomic selection in farm animal species–lessons learnt and future perspectives. In: Proceedings of the 9th World Congress on Genetics Applied to Livestock Production. Leipzig, Germany; 2010.

  9. Miller S. Genetic improvement of beef cattle through opportunities in genomics. Rev Bras Zootec. 2010;39:247–55.

  10. Schaeffer L. Strategy for applying genome-wide selection in dairy cattle. J Anim Breeding Genet. 2006;123(4):218–23.

  11. Hayes B, Bowman P, Chamberlain A, Goddard M. Invited review: Genomic selection in dairy cattle: Progress and challenges. J Dairy Sci. 2009;92(2):433–43.

  12. Weigel K. Controlling inbreeding in modern breeding programs. J Dairy Sci. 2001;84:E177–84.

  13. Sørensen AC, Sørensen MK, Berg P. Inbreeding in Danish dairy cattle breeds. J Dairy Sci. 2005;88(5):1865–72.

  14. De Roos A, Hayes BJ, Spelman R, Goddard ME. Linkage disequilibrium and persistence of phase in Holstein–Friesian, Jersey and Angus cattle. Genetics. 2008;179(3):1503–12.

  15. Piccoli M, Braccini Neto J, Brito F, Campos L, Bértoli C, Campos G, Cobuci J, McManus C, Barcellos J, Gama L. Origins and genetic diversity of British cattle breeds in Brazil assessed by pedigree analyses. J Anim Sci. 2014;92(5):1920–30.

  16. Lu D, Sargolzaei M, Kelly M, Li C, Vander Voort G, Wang Z, Plastow G, Moore S, Miller S. Linkage disequilibrium in Angus, Charolais, and Crossbred beef cattle. Front Genet. 2012;3:152.

  17. Marquez G, Speidel S, Enns R, Garrick D. Genetic diversity and population structure of American Red Angus cattle. J Anim Sci. 2010;88(1):59–68.

  18. Ventura R, Lu D, Schenkel F, Wang Z, Li C, Miller S. Impact of reference population on accuracy of imputation from 6K to 50K single nucleotide polymorphism chips in purebred and crossbreed beef cattle. J Anim Sci. 2014;92(4):1433–44.

  19. Piccoli ML, Braccini J, Cardoso FF, Sargolzaei M, Larmer SG, Schenkel FS. Accuracy of genome-wide imputation in Braford and Hereford beef cattle. BMC Genet. 2014;15(1):1.

  20. Berry DP, Kearney J. Imputation of genotypes from low-to high-density genotyping platforms and implications for genomic selection. Animal. 2011;5(08):1162–9.

  21. Mulder H, Calus M, Druet T, Schrooten C. Imputation of genotypes with low-density chips and its effect on reliability of direct genomic values in Dutch Holstein cattle. J Dairy Sci. 2012;95(2):876–89.

  22. Weigel K, de los Campos G, Vazquez A, Rosa G, Gianola D, Van Tassell C. Accuracy of direct genomic values derived from imputed single nucleotide polymorphism genotypes in Jersey cattle. J Dairy Sci. 2010;93(11):5423–35.

  23. Dassonneville R, Brøndum RF, Druet T, Fritz S, Guillaume F, Guldbrandtsen B, Lund MS, Ducrocq V, Su G. Effect of imputing markers from a low-density chip on the reliability of genomic breeding values in Holstein populations. J Dairy Sci. 2011;94(7):3679–86.

  24. Sargolzaei M, Chesnais JP, Schenkel FS. A new approach for efficient genotype imputation using information from relatives. BMC Genomics. 2014;15(1):1.

  25. Recommendation BIF. Guidelines for uniform beef improvement programs. US Government Printing Office; 1972

  26. Long RA. El sistema de evaluación de Ankony y su aplicación en la mejora del ganado. Uruguay: Revista de la Asociación Rural del Uruguay, Montevideo; 1974.

  27. Roso V, Schenkel F, Miller S. Degree of connectedness among groups of centrally tested beef bulls. Can J Anim Sci. 2004;84(1):37–47.

  28. Roso V, Schenkel F, Miller S, Schaeffer L. Estimation of genetic effects in the presence of multicollinearity in multibreed beef cattle evaluation. J Anim Sci. 2005;83(8):1788–800.

  29. Carvalheiro R, Fries LA, Schenkel FS, Albuquerque LG. Effects of heterogeneity of residual variance among contemporary groups on genetic evaluation of beef cattle. Rev Bras Zootec. 2002;31(4):1680–8.

  30. VanRaden P, Wiggans G. Derivation, calculation, and use of national animal model information. J Dairy Sci. 1991;74(8):2737–46.

  31. Sargolzaei M, Schenkel FS, VanRaden PM. gebv: Genomic breeding value estimator for livestock. In: Technical report to the Dairy Cattle Breeding and Genetics Committee. University of Guelph; 2009

  32. Neves HH, Carvalheiro R, O’Brien AMP, do Carmo AS, Utsunomiya YT, Schenkel FS, Sölkner J, McEwan JC, Van Tassell CP, Cole JB. Accuracy of genomic predictions in Bos indicus (Nellore) cattle. Genet Sel Evol. 2014;46(1):1.

  33. Khatkar MS, Moser G, Hayes BJ, Raadsma HW. Strategies and utility of imputed SNP genotypes for genomic analysis in dairy cattle. BMC Genomics. 2012;13(1):1.

  34. Erbe M, Hayes B, Matukumalli L, Goswami S, Bowman P, Reich C, Mason B, Goddard M. Improving accuracy of genomic predictions within and between dairy cattle breeds with imputed high-density single nucleotide polymorphism panels. J Dairy Sci. 2012;95(7):4114–29.

  35. Saatchi M, McClure MC, McKay SD, Rolf MM, Kim J, Decker JE, Taxis TM, Chapple RH, Ramey HR, Northcutt SL. Accuracies of genomic breeding values in American Angus beef cattle using K-means clustering for cross-validation. Genet Sel Evol. 2011;43(1):1.

  36. Boddhireddy P, Kelly M, Northcutt S, Prayaga K, Rumph J, DeNise S. Genomic predictions in Angus cattle: Comparisons of sample size, response variables, and clustering methods for cross-validation. J Anim Sci. 2014;92(2):485–97.

  37. Akanno E, Schenkel F, Sargolzaei M, Friendship R, Robinson J. Persistency of accuracy of genomic breeding values for different simulated pig breeding programs in developing countries. J Anim Breeding Genet. 2014;131(5):367–78.

  38. Segelke D, Chen J, Liu Z, Reinhardt F, Thaller G, Reents R. Reliability of genomic prediction for German Holsteins using imputed genotypes from low-density chips. J Dairy Sci. 2012;95(9):5403–11.

  39. Boison SA, Santos DA, Garcia J, Sölkner J, Peixoto M, da Silva M. Genomic Evaluation Using 50K and Imputed HD Genotypes in Guzera (Bos indicus) Breed. In: Proceedings of the World Congress in genetics Applied to Livestock Production. Vancouver: WCGALP; 2014. p. 3908–11.

  40. Vazquez A, Rosa G, Weigel K, De los Campos G, Gianola D, Allison D. Predictive ability of subsets of single nucleotide polymorphisms with and without parent average in US Holsteins. J Dairy Sci. 2010;93(12):5942–9.

  41. Ertl J, Edel C, Emmerling R, Pausch H, Fries R, Götz K-U. On the limited increase in validation reliability using high-density genotypes in genomic best linear unbiased prediction: observations from Fleckvieh cattle. J Dairy Sci. 2014;97(1):487–96.

  42. Su G, Brøndum RF, Ma P, Guldbrandtsen B, Aamand GP, Lund MS. Comparison of genomic predictions using medium-density (54,000) and high-density (777,000) single nucleotide polymorphism marker panels in Nordic Holstein and Red Dairy Cattle populations. J Dairy Sci. 2012;95(8):4657–65.

  43. Sargolzaei M, Schenkel FS, Chesnais J. Comparison between the use of true and imputed genotypes for predicting the GPA of young bulls. Dairy Cattle Breed Genet Comm Meet. 2010;1:8.

  44. Chen L, Li C, Sargolzaei M, Schenkel F. Impact of genotype imputation on the performance of GBLUP and Bayesian methods for genomic prediction. PLoS One. 2014;9(7):e101544.

  45. Calus M, Bouwman A, Hickey J, Veerkamp R, Mulder H. Evaluation of measures of correctness of genotype imputation in the context of genomic prediction: a review of livestock applications. Animal. 2014;8(11):1743–53.

  46. Misztal I, Lawlor T, Gengler N. Relationships among estimates of inbreeding depression, dominance and additive variance for linear traits in Holsteins. Genet Sel Evol. 1997;29(3):1.

  47. Gaddis KLP, Tiezzi F, Cole JB, Clay JS, Maltecca C. Genomic prediction of disease occurrence using producer-recorded health data: a comparison of methods. Genet Sel Evol. 2015;47(1):1.

  48. Saatchi M, Schnabel RD, Rolf MM, Taylor JF, Garrick DJ. Accuracy of direct genomic breeding values for nationally evaluated traits in US Limousin and Simmental beef cattle. Genet Sel Evol. 2012;44(1):1.

  49. Madsen P, Sørensen P, Su G, Damgaard LH, Thomsen H, Labouriau R. DMU-a package for analyzing multivariate mixed models. In: Conference Proceedings of the 8th World Congress on Genetics Applied to Livestock Production (WCGALP). Brazil: Belo Horizonte; 2006.

Acknowledgments

The authors thank the following organizations for providing data and collaborating within the project: Conexão Delta G’s Genetic Improvement Program; Paint Genetic Improvement Program; and Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES) that provided graduate fellowship for the first author.

Funding

This research was partially supported by CNPq - National Council for Scientific and Technological Development grant 478992/2012-2 and Embrapa - Brazilian Agricultural Research Corporation grants 02.09.07.004 and 01.11.07.002.

Availability of data and materials

All relevant information supporting the results not already presented in the article are given in additional files. The raw data cannot be made available, as it is property of the Braford and Hereford producers in Brazil and this information is commercially sensitive.

Authors’ contributions

MLP carried out the investigation, interpreted the results and prepared the manuscript, LFB helped in the synthesis of the results and the preparation of the manuscript, JB was involved in the interpretation and discussion of the results and the preparation of the manuscript, FFC was involved in the interpretation and discussion of the results and the preparation of the manuscript, MS provided advice throughout the data analysis process, and FSS helped to design the study and provided advice on all steps. All authors read and approved the final manuscript.

Competing interests

The authors declare that there are no competing interests.

Consent for publication

Not applicable.

Ethics approval

Animal welfare and use committee approval was not needed for this study as datasets were obtained from pre-existing databases.

Author information

Corresponding author

Correspondence to Mario L. Piccoli.

Additional files

Additional file 1: Table S1.

Losses in expected GEBV accuracy using the 8 K and 15 K SNP panel imputed to the 50 K SNP panel compared to the real 50 K SNP panel in the SCE1 scenarios. (DOC 61 kb)

Additional file 2: Table S2.

Losses in expected GEBV accuracy using the 8 K and 15 K SNP panel imputed to the 50 K SNP panel in the SCE2 scenarios and the 777 K SNP panel imputed from the 50 K SNP panel in the SCE3 scenario compared to the true 50 K SNP panel. (DOC 63 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.

About this article

Cite this article

Piccoli, M.L., Brito, L.F., Braccini, J. et al. Genomic predictions for economically important traits in Brazilian Braford and Hereford beef cattle using true and imputed genotypes. BMC Genet 18, 2 (2017). https://doi.org/10.1186/s12863-017-0475-9

  • DOI: https://doi.org/10.1186/s12863-017-0475-9

Keywords