 Research article
 Open Access
 Published:
Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds
BMC Genetics volume 16, Article number: 88 (2015)
Abstract
Background
Levels of inbreeding in cattle populations have increased in the past due to the use of a limited number of bulls for artificial insemination. High levels of inbreeding lead to reduced genetic diversity and inbreeding depression. Various estimators based on different sources, e.g., pedigree or genomic data, have been used to estimate inbreeding coefficients in cattle populations. However, the comparative advantage of using full sequence data to assess inbreeding is unknown. We used pedigree and genomic data at different densities from 50k to full sequence variants to compare how different methods performed for the estimation of inbreeding levels in three different cattle breeds.
Results
Five different estimates for inbreeding were calculated and compared in this study: pedigree based inbreeding coefficient (F_{PED}); run of homozygosity (ROH)based inbreeding coefficients (F_{ROH}); genomic relationship matrix (GRM)based inbreeding coefficients (F_{GRM}); inbreeding coefficients based on excess of homozygosity (F_{HOM}) and correlation of uniting gametes (F_{UNI}). Estimates using ROH provided the direct estimated levels of autozygosity in the current populations and are free effects of allele frequencies and incomplete pedigrees which may increase in inaccuracy in estimation of inbreeding. The highest correlations were observed between F_{ROH} estimated from the full sequence variants and the F_{ROH} estimated from 50k SNP (single nucleotide polymorphism) genotypes. The estimator based on the correlation between uniting gametes (F_{UNI}) using full genome sequences was also strongly correlated with F_{ROH} detected from sequence data.
Conclusions
Estimates based on ROH directly reflected levels of homozygosity and were not influenced by allele frequencies, unlike the three other estimates evaluated (F_{GRM}, F_{HOM} and F_{UNI}), which depended on estimated allele frequencies. F_{PED} suffered from limited pedigree depth. Marker density affects ROH estimation. Detecting ROH based on 50k chip data was observed to give estimates similar to ROH from sequence data. In the absence of full sequence data ROH based on 50k can be used to access homozygosity levels in individuals. However, genotypes denser than 50k are required to accurately detect short ROH that are most likely identical by descent (IBD).
Background
The definition of inbreeding coefficient (F) is the probability that two alleles in an individual are identical by descent (IBD) relative to a base population where all alleles are assumed unrelated [1]. Rates of inbreeding have increased as intensive selection was applied to the populations [2–7]. Increased levels of inbreeding result in increased probability that animals are homozygous for deleterious alleles [2, 8, 9]. Thus, inbred animals suffer from inbreeding depression with reduced fitness, and highly inbred animals may have considerably reduced lifespans [2, 6, 10–13]. Information on inbreeding is critical in the design of breeding program to control the increase in inbreeding levels and thereby controlling inbreeding depression in the progeny. Pedigree information has been used to calculate the estimated inbreeding coefficient as the expected probability that two alleles at a locus are IBD [14–16]. For example, Meuwissen and Luo proposed a method to estimate inbreeding coefficients based on pedigree data of large populations [17]. However, incomplete pedigrees result in erroneous estimates and an underestimation of levels of inbreeding [18]. VanRaden proposed a method to take into account unknown ancestors when estimating inbreeding coefficients, increasing the accuracy of inbreeding level estimates in incomplete pedigrees [19].
With the availability of Single Nucleotide Polymorphism (SNP) array genotyping technologies, long stretches of homozygous genotypes, known as runs of homozygosity (ROH) can be identified. ROH are believed to reflect an estimate of autozygosity on genomic level and generally identify genomic regions which are IBD [20, 21]. Theoretically, it is expected that ROH can be accurately estimated from the full sequence data, because these estimates do not suffer from sampling such as may be expected when subsets of loci, for instance 50k SNPs, are used [22–24]. The inbreeding coefficient can be calculated as the proportion of genome covered by ROH and has been shown to be more informative than the inbreeding coefficient estimated from pedigree data or other estimators because ROH strongly correlate with homozygous mutation load [25]. ROH have commonly been used to infer population history and to examine the effect of deleterious homozygotes caused by inbreeding in human populations [20, 26–29]. Long ROH reflect recent inbreeding, whereas short ROH reflect ancient inbreeding [26]. However, only a few studies have evaluated ROH in cattle populations. Ferenčaković et al. examined the effect of SNP density and genotyping errors when estimating autozygosity from highthroughput genomic data [24]. Estimates based on ROH also vary with different densities of genomic data. The minimum length of ROH that can be detected depends on SNP density [24, 30]. Recently, Purfield et al. detected ROH in a cattle population from SNP chip data to infer population history [31]. However, to estimate the “true” state of ROH, wholegenome sequences should be used rather than SNP chip data, but, to date, there are only few studies doing this in cattle [32]. With the advent of nextgeneration sequencing technology, wholegenome sequences have become available to examine the finescale genetic architecture of the cattle genome. It is now possible to investigate and compare how well different commonly used estimators of inbreeding level correlate with ROH estimated using nextgeneration sequence (NGS) data.
In recent years, widespread availability of genotype data enabled computation of inbreeding from the diagonals of genomic relationship matrices, i.e., the “GRM” method (F_{GRM}), as a byproduct of genomic selection. Similarly, using the genotypes, the inbreeding coefficient can be computed based on excess of homozygosity following Wright (1948) (F_{HOM}) [33] and based on correlation between uniting gametes following Wright (1922) (F_{UNI}) [1]. The objective of the present study was to compare different estimators for inbreeding coefficients calculated from pedigree, 50k SNP chip genotypes and full sequence data with estimates based on ROH, for three different dairy cattle breeds.
Methods
SNP genotyping and sequencing
A total of 89 bulls with a high genetic contribution to current Danish dairy cattle populations were selected for wholegenome resequencing. These included 32 Holstein (HOL), 27 Jersey (JER), and 30 Danish Red Cattle (RDC) bulls. RDC cattle are a composite breed with contributions from different red breeds, including Swedish Red, Finnish Ayrshire, and Brown Swiss [34]. Only biallelic variants SNPs with a phredscaled quality score [35] higher than 100 were kept for analysis to ensure the quality of variants. Genotypes were extracted from wholegenome sequence (WGS) data using GATK [36] and a perl script. The sequence variants with read depth lower than 7 or higher than 30 were filtered out. In addition, 85 of the sequenced animals were genotyped with the Illumina 50k SNP assay (BovineSNP50 BeadChip version 1 or 2, Illumina, San Diego, CA). SNP genotyping and quality control were as described by Höglund et al. [37]. Among the whole genome sequenced animals, 4 animals were not genotyped with the 50k SNP chip. Their genotypes for the SNPs on the 50k chip were extracted from their wholegenome sequences. The quality of genotype calls from SNP chips is expected to be higher than that of wholegenome sequences; therefore, only sequence variants with a high quality score (phred score > 100) were included. The corresponding corrections for reverse strand calls in the sequence data were converted to Illumina calls by correcting locus calling from reverse strands in Illumina calls to maintain consistency of allele encoding between Illumina calls and sequence data. The concordance between the SNP chip and sequence data was ~97 %.
Estimation of inbreeding
Using pedigree records (F_{PED})
Inbreeding coefficients for the 89 bulls were estimated using pedigree records (F_{PED}). The average pedigree depth was ~8 generations ranging from 3 to 13. Average pedigree depth was 7, 8 and 9 for HOL, JER, and RDC, respectively. The method proposed by VanRaden [19] was used to compute inbreeding coefficients, which replaces unknown inbreeding coefficients by average inbreeding coefficients in the same generations. Inbreeding coefficients were calculated using the following formula [38]:
where \( {A}_{ii} \) is the i ^{th} diagonal element of the A matrix (pedigree relationship matrix), which is equal to the inbreeding coefficient of the i ^{th} animal plus 1. L is a lower triangular matrix containing the fraction of the genes that animals derive from their ancestors, and D is a diagonal matrix containing the within family additive genetic variances of animals [17]. The computation for matrix elements L _{ ij } and D _{ jj } follows the rule of computation of the A matrix [17]. The detailed decomposition for computing \( {A}_{ii} \) is explained by Meuwissen and Luo [17]. The analysis was conducted using Relax2 software [39].
Using genotypes (F_{ROH}, F_{GRM}, F_{HOM}, F_{UNI})
Sequence data
ROH were detected from sequence data using all biallelic variants according to the method of Bosse et al. [23]. This method was used to compute ROH for sequence data instead of PLINK because not all short ROH can be detected using PLINK for sequence data (the sliding window size in PLINK is fixed; therefore, ROH shorter than a certain length cannot be detected). The measure of homozygosity based on ROH (F_{ROH}) from genomic data is defined as the total length of genome covered by ROH divided by the overall length of genome covered by SNPs or sequences as follows [20]:
where L_{ROH} is the sum of ROH lengths and L_{AUTO} is the total length of autosomes covered by reads. The inbreeding coefficient was calculated by extracting ROH from sequence data. Three ROH estimates based on lengths were calculated from sequence data. The ROH was calculated separately by summing the ROH in different length classes: 1) based on all ROH; 2) ROH >1 Mbp; 3) ROH >3 Mbp.
In addition, three other estimates of inbreeding coefficients were calculated using sequence data (F_{GRM}, F_{HOM}, F_{UNI}). The F_{GRM} estimate was calculated following VanRaden (2008) [40] based on the variance of the additive genotypes. F_{GRM} was derived from
where p _{ i } is the observed fraction of the first allele at locus i, h _{ i } = 2p _{ i }(1 − p _{ i }) and x _{ i } is the number of copies of the reference allele (i.e., the allele whose homozygous genotype was coded as “0”) for the i^{th} SNP [41]. This was equivalent to estimating an individual’s relationship to itself (diagonal of the SNPderived GRM). The F_{HOM} estimate was calculated based on the excess of homozygosity following Wright (1948) [33]:
where O (# hom) and E (# hom) are the observed and expected numbers of homozygous genotypes in the sample, respectively [41]. The F_{UNI} estimate was calculated based on the correlation between uniting gametes following Wright (1922) [1]:
where h _{ i } and x _{ i } are the same as for F_{GRM} [41]. The calculations for these three estimates F_{GRM}, F_{HOM} and F_{UNI} were computed using the option –ibc from GCTA software [41].
50k SNP chip
ROH were detected from 50k SNP chip data using the software PLINK with adjusted parameters (–homozygdensity 1000, –homozygwindowhet 1, –homozygkb 10, –homozygwindowsnp 20) [23, 42]. These settings for PLINK to detect ROH in SNP data were chosen to make the detected ROH in SNP chip data and sequence data as similar as possible to enable comparisons of results when using different types of data. Genomic estimates of the inbreeding coefficient based on all ROH (F_{ROH}) were calculated using the same formula as was used for the sequence data. The other three types of estimates (F_{GRM}, F_{HOM}, F_{UNI}) were also calculated for genotypes extracted from 50k SNP chip data using the same methods as for sequence data.
Pearson’s correlation coefficients were calculated between estimates of inbreeding coefficients from each of pedigree records, 50k SNP genotypes, and wholegenome sequence variants. All correlations between different inbreeding coefficient estimators were tested within breed to determine whether they were significantly different from 0 using the R (http://www.rproject.org/) cor and cor.test functions.
Impact of allele frequencies on estimators of inbreeding
As some estimators explicitly use allele frequencies to compute inbreeding coefficients, it is important to investigate how varying allele frequencies affect estimated inbreeding coefficients. Here, we investigated how the three different estimators change across the whole range of allele frequencies. For each genotype x _{ i } (homozygous for the reference allele; heterozygous for the reference and nonreference allele; homozygous for the nonreference allele), the values can be written as a function of allele frequency p _{ i }, as shown in Table 1.
Results
We used five different approaches (F_{PED}, F_{GRM}, F_{HOM}, F_{UNI}, F_{ROH}) to estimate inbreeding coefficients using information from three different sources: pedigree, whole genome sequence and 50k SNP chip genotype data. There were total 11 estimates of inbreeding coefficients for each animal (Table 2). The average inbreeding coefficients estimated using different approaches and different data sets are presented in Table 2. The F_{PED} and F_{ROH} estimated from 50k data for HOL and JER are significantly higher than for RDC (p < 0.05). For inbreeding coefficients estimated from sequence data, F_{ROH}, F_{ROH>1Mb}, F_{ROH>3Mb}, F_{HOM} and F_{UNI} differed significantly among breeds, being highest in JER and lowest in RDC. The mean F_{ROH} for 50k SNP chip data (0.066), and sequence data (0.19) are significantly higher than F_{PED} (0.016) (p < 0.01).
F_{ROH} estimated from sequence data is a direct and accurate estimate of the levels of homozygosity. It mostly reflects regions which were IBD on the genome; therefore, we limited our comparisons to comparing between F_{ROH} from sequence data with other estimates of F. High correlations were observed between F_{ROH} estimated from the 50k and sequence data with F_{ROH>1Mb} and F_{ROH>3Mb} from the sequence data for all three breeds (Tables 3, 4 and 5). The correlation between F_{ROH} estimated from 50k data and F_{ROH>3Mb} was higher than F_{ROH} estimated from 50k data and F_{ROH>1Mb} in JER and RDC (Tables 4 and 5). F_{ROH} was consistently positively correlated with F_{HOM} and F_{UNI}, when both were computed from either 50k or sequence data in all three breeds (Tables 3, 4 and 5). A high correlation was found between F_{ROH} and F_{UNI}, when both were computed from either 50k or sequence data in all three breeds (Tables 3, 4 and 5). However, for different breeds, F_{HOM} and F_{UNI} were correlated differently across different densities of genomic data. For HOL and RDC, the higher the density of genomic data used for F_{UNI}, the higher the correlation was between F_{UNI} and F_{ROH} from sequence data (Tables 3 and 5). For HOL, the correlation between F_{UNI} and F_{ROH} from sequence data (0.95) was still higher than the correlation between F_{ROH} estimated from 50k SNP chip data and sequence data (0.87) (Table 3). In contrast to JER, F_{HOM} and F_{UNI} were most highly correlated with F_{ROH} estimated from sequence data (Table 5).
F_{PED} was mostly intermediately correlated with F_{HOM} and F_{ROH} estimated from 50k and sequence data. The highest correlation between F_{PED} and F_{ROH} estimated from 50k and sequence data was found in HOL (Table 3). The strongest correlation among estimators of F_{ROH} (F_{ROH} from 50k or sequence data or F_{ROH>3Mb} or F_{ROH>1Mb} from sequence data) and F_{PED} was observed between F_{PED} and F_{ROH>3Mb} from sequence data in HOL (Table 3). A moderate correlation was found between F_{PED} and F_{ROH} estimated from 50k and sequence data for JER and RDC (Tables 4 and 5).
The estimate F_{GRM} from both 50k and sequence data and F_{PED} had a correlation close to zero in all three breeds and the values were often negative (Tables 3, 4 and 5). At the same time, F_{GRM} estimated from 50k and sequence data generally showed a low correlation with other estimates except between two estimates F_{GRM} estimated from 50k and sequence data in HOL and JER, and between F_{GRM} and F_{UNI} estimated from 50k data (Tables 3 and 4).
Discussion
Pedigree has been used to estimate inbreeding coefficients in animal breeding for over 50 years [1, 17]. Recently, researchers have utilized runs of homozygosity (ROH) estimated from medium density genotype data such as 50k SNP chip data to estimate inbreeding coefficients in livestock populations [22–24, 30]. ROH were initially used to explore regions of inbreeding in the genome and further investigate the fitness effect of these regions on different traits [2, 9, 11, 43]. Population subdivision and either inbreeding or inbreeding avoidance affects the whole genome composition, whereas selection and assortative mating will affect only those loci associated with particular phenotypes. However, we observed that inbreeding coefficient F_{ROH} estimated from sequence data were relatively higher for chromosome 1 and 10 for all four breeds (Fig. 1). This is most likely because the local recombination rate is relatively lower than average, which results in high levels of homozygosity on average [23, 44].
Our study is the first to calculate inbreeding coefficient based on ROH from full sequence data in cattle. The objective of this study was to compare estimates of inbreeding calculated from different methods and different data sources (pedigree, 50k SNP chip genotypes and full sequence data).
The pedigreebased inbreeding coefficient, F_{PED}, was moderately correlated with F_{HOM} and F_{ROH} in all breeds. These moderate correlations (~0.47 to 0.56) may be partly explained by the relatively shallow depth of the pedigree records (~8–9) for these bulls. Another difference between F_{ROH} and F_{PED} is that short ROH capture ancient inbreeding while long ROH capture recent inbreeding whereas pedigree captures only relatively recent inbreeding. Pedigree accounts only for inbreeding that occurred since pedigree recording began. Therefore, after excluding ROH smaller than 1 or 3 Mbp, the correlation between F_{PED} and F_{ROH} from sequence data increased slightly for all breeds. We should also point out that a very long stretch of homozygosity using marker data might not actually be completely homozygous and therefore, higher density data was suggested to be used to detect selective sweeps through runs of homozygosity [45]. Sørensen et al. [7] has estimated inbreeding in Danish Dairy Cattle Breeds and our estimates F_{PED} are lower than theirs. This is because our sampled animals for sequencing are founder and older animals compare to the other study where they used all animals [7].
Estimates of inbreeding coefficients differed with methods. Inbreeding coefficients estimates from methods using allele frequencies, i.e., F_{GRM}, F_{HOM} and F_{UNI}, showed considerable variation across data type and breeds. These estimators were sensitive to allele frequencies compared to ROH estimators, especially for populations with divergent allele frequencies (e.g., Fig. 2; RDC population). The estimates of genomic inbreeding coefficients are dependent on the allele frequencies in the base population [40].
In order to explore the reasons about the various correlations between inbreeding coefficients estimates using allele frequencies, F_{GRM}, F_{HOM} and F_{UNI} were plotted against the allele frequency changing from 0 to 1 when the number of copies of reference alleles for i^{th} SNP is 0, 1 or 2 (Figs. 3, 4 and 5). When a locus is homozygous for either the reference alleles or the nonreference alleles with the allele frequency ranging from 0 to 1, F_{GRM} ranged from 1 to infinity, F_{HOM} has a constant value of 1 and F_{UNI} ranged from 0 to infinity (Figs. 3 and 5). F_{HOM} gave constant estimates for homozygous genotypes, regardless of the allele frequency (Figs. 3 and 5). When the allele frequency of the nonreference alleles is smaller than 0.2 or larger than 0.8, F_{GRM} was less than 0 (Figs. 3 and 5). When the allele frequency of the nonreference allele was between 0.2 and 0.5 or when the allele frequency of the reference allele was between 0.5 and 0.8, F_{GRM} become positive and ranges from 0 to 1 (Figs. 3 and 5).
For a heterozygous locus with an allele frequency ranging from 0 to 1, F_{GRM} and F_{HOM} ranged from minus infinity to plus infinity, and F_{UNI} has a constant value of 0 (Fig. 4). If the allele frequency was smaller than 0.2 or larger than 0.8 F_{GRM} become very large positive whereas F_{HOM} become a large negative. F_{HOM} was always negative, and F_{GRM} was always positive (Fig. 4). Thus, when a population has a high level of heterozygosity and some rare alleles with small frequency, F_{GRM} would yield large positive inbreeding coefficients, which can be misleading. This result explains why F_{GRM} was positive in the RDC breed (Table 2): this population had a higher level of heterozygosity than HOL and JER. F_{UNI} gave a stable value of 0 when the locus was heterozygous and therefore was robust to allele frequency (Fig. 4).
The correlation between the three estimators F_{GRM}, F_{HOM} and F_{UNI} was computed for each of the three genotypes (i.e., homozygotes for allele 1, homozygotes for allele 2 and heterozygotes) for comparison between F_{GRM}, F_{HOM} and F_{UNI} when the allele frequency was varied between 0 and 1 (Fig. 6). Correlations reached the maximal value (i.e., 1) when the allele frequencies were 0.5. When the allele frequencies were extremely high or low, correlations between estimators became low, especially the correlation between F_{GRM} and F_{HOM} (0.27). The correlation plot (Fig. 6) reflected a similar result as those in Figs. 3, 4 and 5. Therefore, when computing inbreeding coefficients using allele frequencies, populations with different allele frequencies might have very different inbreeding coefficients and the correlations between those inbreeding coefficients might be very low, with different allele frequencies.
The comparison between F_{GRM} and other estimators showed a very low correlation and F_{GRM} was mostly negatively correlated with other estimators. F_{HOM} based on excess of homozygosity was positively correlated with other estimators and was relatively highly correlated with F_{ROH} detected from 50k and sequence data. F_{UNI} based on correlations between uniting gametes estimated from 50k data generally was negatively correlated with other estimators. However, with increasing marker density, the correlation between F_{UNI} and other estimators became positive for the HOL and RDC populations. Surprisingly, when using sequence data, F_{UNI} was highly correlated with other estimators, especially F_{ROH}, detected from sequence data (~0.95) for HOL. This correlation may have resulted from the nature of the estimators: F_{ROH} uses only runs of homozygosity, whereas the other estimators (to some extent) capture all of the homozygosity. This high correlation for F_{UNI} and F_{ROH} compared with low correlation between F_{GRM} and F_{ROH} might also be explained by the algorithms: F_{GRM} = (1 + F)1 and F is the correlation between uniting gametes. This estimator has only sampling on the Fterm, whereas in the F_{GRM} estimator there is also sampling variance on the “1”, which creates additional sampling variance.
It is known that RDC is an admixed breed with introgressed haplotypes from Old Danish Red, Holstein and Brown Swiss breeds. HOL and JER are relatively pure breeds and more inbred than RDC (Zhang Q, Guldbrandtsen B, Bosse M, Lund MS, Sahana G. Runs of homozygosity and distribution of functional variants in the cattle genome. BMC Genomics (in press)). Therefore, minor allele frequencies tend to be lower in HOL and JER breeds than in RDC. F_{GRM} is negatively correlated with other estimators for all three breeds. F_{HOM} becomes negative for RDC, which is likely due to the admixture present in RDC. Therefore, it appears that F_{GRM} tends to be less accurate for populations with a low minor allele frequency and that F_{HOM} tends to be less accurate for populations with a higher level of heterozygosity. This argument is supported by our results that the three inbreeding estimators F_{GRM}, F_{HOM} and F_{UNI} were most closely correlated with each other when the allele frequency is approximately 0.5 (Figs. 3, 4 and 5). Therefore, the three estimators F_{GRM}, F_{HOM} and F_{UNI} depend strongly on the estimation of allele frequencies in the population, unlike F_{ROH}. However, here we only took one locus as an example to study the impact of allele frequencies on three estimators F_{GRM}, F_{HOM} and F_{UNI}.
Conclusion
In this study, we compared different estimators of inbreeding coefficient with different types of data (pedigree, 50k SNP chip genotypes and full sequence data). Methods based on GRM, excess of homozygosity and the correlation between uniting gametes were observed to be sensitive to allele frequencies in the base population. The estimator based on pedigree data was moderately correlated with estimators based on ROH when a pedigree is relatively complete. Estimators based on ROH from SNP chip genotypes and full sequence directly reflect homozygosity on the genome, and have the advantage of not being affected by estimates of allele frequency or incompleteness of the pedigree. Inbreeding estimated from ROH was shown to be affected by the marker density used. Using sequence data, we obtained a full picture of the distribution of ROH on the genome, including short and medium length ROH that reflect ancient inbreeding regions which are possibly IBD. Detecting ROH based on highdensity or 50k chip data was shown to give estimates most closely related to ROH from sequence data. However, more than 50k genotypes are required to accurately detect short ROH that are most likely identical by descent (IBD).
Availability of supporting data
Data used in this study are from the 1000 Bull Genome Project (Daetwyler et al. 2014 Nature Genet. 46:858–865). Whole genome sequence data of individual bulls of the 1000 Bull Genomes Project are already available at NCBI using SRA no. SRP039339 (http://www.ncbi.nlm.nih.gov/bioproject/PRJNA238491).
Abbreviations
 F:

Inbreeding coefficient
 F_{PED} :

Pedigree based inbreeding coefficient
 ROH:

Run of homozygosity
 F_{ROH} :

Runs of homozygositybased inbreeding coefficients
 GRM:

Genomic relationship matrix
 F_{GRM} :

Genomic relationship matrixbased inbreeding coefficients
 F_{HOM} :

Inbreeding coefficients based on excess of homozygosity
 F_{UNI} :

Inbreeding coefficients based on correlation of uniting gametes
 IBD:

Identity by descent
 SNP:

Single nucleotide polymorphism
 NGS:

Nextgeneration sequence
 HOL:

Holstein
 JER:

Jersey
 RDC:

Danish Red Cattle
 WGS:

Wholegenome sequence
References
 1.
Wright S. Coefficients of inbreeding and relationship. Am Nat. 1922;56:330–8.
 2.
GonzalezRecio O, de Maturana EL, Gutierrez JP. Inbreeding depression on female fertility and calving ease in Spanish dairy cattle. J Dairy Sci. 2007;90(12):5744–52.
 3.
Margolin S, Bartlett JW. The influence of inbreeding upon the weight and size of dairy cattle. J Anim Sci. 1945;4(1):3–12.
 4.
Miglior F, Szkotnicki B, Burnside EB. Analysis of levels of inbreeding and inbreeding depression in Jersey cattle. J Dairy Sci. 1992;75(4):1112–8.
 5.
Nomura T, Honda T, Mukai F. Inbreeding and effective population size of Japanese black cattle. J Anim Sci. 2001;79(2):366–70.
 6.
Smith LA, Cassell BG, Pearson RE. The effects of inbreeding on the lifetime performance of dairy cattle. J Dairy Sci. 1998;81(10):2729–37.
 7.
Sorensen AC, Sorensen MK, Berg P. Inbreeding in Danish dairy cattle breeds. J Dairy Sci. 2005;88(5):1865–72.
 8.
Szpiech ZA, Xu JS, Pemberton TJ, Peng WP, Zollner S, Rosenberg NA, et al. Long runs of homozygosity are enriched for deleterious variation. Am J Hum Genet. 2013;93(1):90–102.
 9.
Bjelland DW, Weigel KA, Vukasinovic N, Nkrumah JD. Evaluation of inbreeding depression in Holstein cattle using wholegenome SNP markers and alternative measures of genomic inbreeding. J Dairy Sci. 2013;96(7):4697–706.
 10.
Leroy G. Inbreeding depression in livestock species: review and metaanalysis. Anim Genet. 2014;45(5):618–28.
 11.
Charlesworth D, Charlesworth B. Inbreeding depression and its evolutionary consequences. Annu Rev Ecol Syst. 1987;18:237–68.
 12.
Wright S. Systems of mating. II. The effects of inbreeding on the genetic composition of a population. Genetics. 1921;6(2):124–43.
 13.
Pusey A, Wolf M. Inbreeding avoidance in animals. Trends Ecol Evol. 1996;11(5):201–6.
 14.
Weigel K. Controlling inbreeding in modern dairy breeding programs. Adv Dairy Technol. 2006;18:263–74.
 15.
Mc Parland S, Kearney JF, Rath M, Berry DP. Inbreeding trends and pedigree analysis of Irish dairy and beef cattle populations. J Anim Sci. 2007;85(2):322–31.
 16.
Blackwell BF, Doerr PD, Reed JM, Walter JR. Inbreeding rate and effective populationsize  a comparison of estimates from pedigree analysis and a demographicmodel (Vol 71, Pg 299, 1995). Biol Conserv. 1995;72(3):407.
 17.
Meuwissen THE, Luo Z. Computing inbreeding coefficients in large populations. Genet Sel Evol. 1992;24(4):305–13.
 18.
Cassell BG, Adamec V, Pearson RE. Effect of incomplete pedigrees on estimates of inbreeding and inbreeding depression for days to first service and summit milk yield in Holsteins and Jerseys. J Dairy Sci. 2003;86(9):2967–76.
 19.
Vanraden PM. Accounting for inbreeding and crossbreeding in genetic evaluation of large populations. J Dairy Sci. 1992;75(11):3136–44.
 20.
McQuillan R, Leutenegger AL, AbdelRahman R, Franklin CS, Pericic M, BaracLauc L, et al. Runs of homozygosity in European populations. Am J Hum Genet. 2008;83(3):359–72.
 21.
Broman KW, Weber JL. Long homozygous chromosomal segments in reference families from the Centre d’Etude du polymorphisme humain. Am J Hum Genet. 1999;65(6):1493–500.
 22.
Marras G, Gaspa G, Sorbolini S, Dimauro C, AjmoneMarsan P, Valentini A, et al: Analysis of runs of homozygosity and their relationship with inbreeding in five cattle breeds farmed in Italy. Anim Genet. 2015;46(2):110121.
 23.
Bosse M, Megens HJ, Madsen O, Paudel Y, Frantz LA, Schook LB, et al. Regions of homozygosity in the porcine genome: consequence of demography and the recombination landscape. PLoS Genet. 2012;8(11):e1003100.
 24.
Ferencakovic M, Solkner J, Curik I. Estimating autozygosity from highthroughput information: effects of SNP density and genotyping errors. Genet Sel Evol. 2013;45:42.
 25.
Keller MC, Visscher PM, Goddard ME. Quantification of inbreeding due to distant ancestors and its detection using dense single nucleotide polymorphism data (vol 189, pg 237, 2011). Genetics. 2012;190(1):283.
 26.
Kirin M, McQuillan R, Franklin CS, Campbell H, McKeigue PM, Wilson JF. Genomic runs of homozygosity record population history and consanguinity. Plos One. 2010;5(11):e13996.
 27.
Ku CS, Naidoo N, Teo SM, Pawitan Y. Regions of homozygosity and their impact on complex diseases and traits. Hum Genet. 2011;129(1):1–15.
 28.
Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, et al. Worldwide human relationships inferred from genomewide patterns of variation. Science. 2008;319(5866):1100–4.
 29.
Charlesworth B, Morgan MT, Charlesworth D. The effect of deleterious mutations on neutral molecular variation. Genetics. 1993;134(4):1289–303.
 30.
Ferencakovic M, Hamzic E, Gredler B, Solberg TR, Klemetsdal G, Curik I, et al. Estimates of autozygosity derived from runs of homozygosity: empirical evidence from selected cattle populations. J Anim Breed Genet. 2013;130(4):286–93.
 31.
Purfield DC, Berry DP, McParland S, Bradley DG. Runs of homozygosity and population history in cattle. BMC Genet. 2012;13:70.
 32.
MacLeod IM, Larkin DM, Lewin HA, Hayes BJ, Goddard ME. Inferring demography from runs of homozygosity in wholegenome sequence, with correction for sequence errors. Mol Biol Evol. 2013;30(9):2209–23.
 33.
Wright S. Genetics of populations. Encyclopaedia Britannica. 1948;10:111AD112.
 34.
Andersen B, Jensen B, Nielsen A, Christensen LG, Liboriussen T. Rød Dansk Malkeraceavlsmæssigt of kulturhistorisk belyst. Denmark: Danmarks HordbrugsForskning; 2003.
 35.
Ewing B, Hillier L, Wendl MC, Green P. Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8(3):175–85.
 36.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, et al. The genome analysis toolkit: a mapreduce framework for analyzing nextgeneration DNA sequencing data. Genome Res. 2010;20:1297–303.
 37.
Hoglund JK, Sahana G, Guldbrandtsen B, Lund MS. Validation of associations for female fertility traits in Nordic Holstein, Nordic Red and Jersey dairy cattle. BMC Genet. 2014;15:8.
 38.
Quaas RL. Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics. 1976;32:949–953.
 39.
Strandén I, Vuori K: Relax2: pedigree analyses program. Proceedings of the 8th WCGALP. Belo Horizonte, MG, Brazil: Instituto Prociência; 2006.
 40.
VanRaden PM. Efficient methods to compute genomic predictions. J Dairy Sci. 2008;91(11):4414–23.
 41.
Yang JA, Lee SH, Goddard ME, Visscher PM. GCTA: a tool for genomewide complex trait analysis. Am J Hum Genet. 2011;88(1):76–82.
 42.
Purcell S, Neale B, ToddBrown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for wholegenome association and populationbased linkage analyses. Am J Hum Genet. 2007;81(3):559–75.
 43.
Pryce JE, HaileMariam M, Goddard ME, Hayes BJ. Identification of genomic regions associated with inbreeding depression in Holstein and Jersey dairy cattle. Genet Sel Evol. 2014;46(1):71.
 44.
Arias JA, Keehan M, Fisher P, Coppieters W, Spelman R. A high density linkage map of the bovine genome. BMC Genet. 2009;10:18.
 45.
Ramey HR, Decker JE, McKay SD, Rolf MM, Schnabel RD, Taylor JF. Detection of selective sweeps in cattle using genomewide SNP data. BMC Genomics. 2013;14:382.
Acknowledgement
Q. Zhang benefited from a joint grant from the European Commission within the framework of the ErasmusMundus joint doctorate “EGSABG”. This research was supported by the Center for Genomic Selection in Animals and Plants (GenSAP) funded by The Danish Council for Strategic Research.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
QZ developed and planned the study design, coordinated the study, recruited the participants, performed data analyses and drafted the manuscript. MC participated in the study design, analyses of data, and drafting the manuscript. BG participated in the study design, analyses of data, and drafting the manuscript. MSL participated in study design and drafting the manuscript. GS participated in the study design, analyses of data, and drafting the manuscript. All authors read and approved the final manuscript.
Rights and permissions
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Zhang, Q., Calus, M.P., Guldbrandtsen, B. et al. Estimation of inbreeding using pedigree, 50k SNP chip genotypes and full sequence data in three cattle breeds. BMC Genet 16, 88 (2015) doi:10.1186/s1286301502277
Received
Accepted
Published
DOI
Keywords
 Inbreeding
 Cattle
 Wholegenome sequence
 Runs of homozygosity