Development of a SNP panel dedicated to parentage assignment in French sheep populations

Background: The efficiency of breeding programs partly relies on the accuracy of the estimated breeding values, which decreases when pedigrees are incomplete. Two reproduction techniques are mainly used by sheep breeders to identify the sires of lambs: animal insemination and natural matings with a single ram per group of ewes. Both methods have major drawbacks, notably time-consuming tasks for breeders, and are thus used at varying levels in breeding programs. As a consequence, the percentage of known sires can be very low in some breeds, which results in less accurate estimated breeding values.
Results: In order to address this issue and offer an alternative strategy for obtaining parentage information, we designed a set of 249 SNPs for parentage assignment in French sheep breeds and tested its efficiency in one breed. The set was derived from the 54 K SNP chip that was used to genotype the thirty main French sheep populations. Only SNPs in Hardy-Weinberg equilibrium, displaying the highest Minor Allele Frequency (MAF) across all the thirty populations and not associated with Mendelian errors in verified family trios were selected. The panel of 249 SNPs was successfully used in an on-farm test in the BMC breed and resulted in more than 95% of lambs being assigned to a unique sire.
Conclusion: In this study we developed a SNP panel for assignment that achieved good results in the on-farm testing. We also identified two conditions for optimal use of this panel: at least 180 SNPs should be used and the list of candidate sires should be prepared meticulously. Our panel also displays high levels of MAF in the SheepHapMap breeds, particularly in the South West European breeds.
Electronic supplementary material: The online version of this article (doi:10.1186/s12863-017-0518-2) contains supplementary material, which is available to authorized users.


Background
Pedigree information is essential for accurate genetic evaluation. However, in French sheep populations the rate of known sires can vary widely, from a few percent in hardy breeds reared in high mountain areas up to 100% in specialized meat and dairy breeds. The lack of complete pedigrees and misidentification of sires affect the accuracy of genetic evaluation and consequently the efficiency of breeding programs [2,9,16,29]. Increasing the percentage of known sires increases the genetic gain of a breeding scheme [24]. To identify the sire of a lamb, matings have to be controlled using Animal Insemination (AI) or natural matings with a single ram per group of ewes. The development of AI in sheep is unequal among breeds, particularly because (i) if fresh semen is to be used as recommended [21] the geographical area in which it can be applied is limited, (ii) the cost can be prohibitive to breeders compared with the economic value of a ram, and (iii) fertility is often lower than in natural mating conditions. When AI is not used, paternity is assessed through single-sire natural matings by managing several groups of ewes. This is very time-consuming for breeders, especially for large flocks, and is almost impossible to set up if the sheep are not confined and graze large pasture areas. For all these reasons, the number of ewes belonging to the nucleus of a breeding program remains limited (mainly when paternity records are required by the breeding society) and the level of known paternities cannot be increased solely by improving the management of reproduction.
With the development of genotyping technologies, single nucleotide polymorphisms (SNPs) can be used to directly assign newborn lambs to their true sire. In cattle, such parentage assignment has already been developed [10,12,33]. In sheep, SNPs dedicated to parentage testing have been selected in a set of international breeds [13]. At the international level, SNP parentage panels have already been set up in Australia [4], New Zealand [6] and North America [13].
In the study of Heaton et al. [13], based on the SheepHapMap design [19], only two French populations of the same breed (meat and dairy Lacaune) were included. However, in France there are twenty-two breeding programs for twenty-one meat sheep breeds and six breeding programs for five dairy sheep breeds. Because of this large diversity of sheep breeds [20,25] and because only two of them were included in the SheepHapMap design, we developed a specific SNP panel for parentage assignment that can be used in most French breeds. In this paper, we discuss our strategy for SNP selection, the results of the first use of the panel for parentage assignment and insights into its potential applicability for other populations across the world included in the SheepHapMap project.

Samples and genotyping
Thirty French sheep populations were sampled. These populations were selected among the 56 French breeds (http://www.racesdefrance.fr) because they register pedigrees as part of their own breeding program and are therefore most likely to be the main users of an assignment tool. Twenty-seven of these thirty selected populations were genotyped with the Illumina Ovine Infinium® HD SNP BeadChip (603,350 callable SNPs, Illumina©). The three remaining populations had already been genotyped with the Illumina OvineSNP50 BeadChip 54 K SNP chip (54,241 SNPs, Illumina©) (Table 1). For each population, approximately thirty males were selected to be as unrelated as possible and as representative as possible of the current genetic variability of their breed. They were all selected in central testing stations, all born after 2000 in different flocks, most of them without common ancestors in the previous 3 generations and with on average 40 daughters (from 2 to 450 in meat breeds and from 10 to 1200 in dairy breeds) with production records (dairy yield or prolificacy).
Additional individuals originating from experimental flocks with reliable pedigrees were used to identify markers with high rates of Mendelian transmission errors: a Romane x Martinik Black Belly backcross [27], a Romane pedigree [11] and a Lacaune pedigree [26]. From these flocks, we used the genotypes of 413 "lamb-dam-sire" trios (413 lambs born from 245 dams and 32 rams).

Parentage SNP panel selection
Two different genotyping chips were used and only the 42,230 SNPs present on both chips were initially preselected for evaluation. Then, four selection steps were applied successively, based on genotype call rate, MAF across the thirty populations, Hardy-Weinberg equilibrium and Mendelian transmission errors in the verified trios. Finally, to obtain a final subset of about 250 non-redundant SNPs, we selected SNPs with a MAF greater than 0.30 in the largest number of populations per 10 Mb window. Linkage disequilibrium (r²) between pairs of SNPs of the final set was calculated with PLINK [22].
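The final thinning step (one SNP per 10 Mb window, favouring SNPs with a MAF above 0.30 in the largest number of populations) can be sketched as follows. This is only an illustrative sketch, not the authors' script: the record layout (`name`, `chrom`, `pos`, `mafs`) is hypothetical, and breaking ties on the mean MAF is an assumption that the text does not state.

```python
WINDOW_BP = 10_000_000   # 10 Mb windows, as in the text
MAF_THRESHOLD = 0.30     # MAF threshold, as in the text


def pick_one_snp_per_window(snps):
    """Keep one SNP per (chromosome, 10 Mb window).

    `snps` is an iterable of dicts with hypothetical keys:
    'name', 'chrom', 'pos' (bp) and 'mafs' (one MAF per population).
    """
    best = {}
    for snp in snps:
        window = (snp["chrom"], snp["pos"] // WINDOW_BP)
        # Number of populations in which this SNP exceeds the MAF threshold.
        n_pops = sum(maf > MAF_THRESHOLD for maf in snp["mafs"])
        mean_maf = sum(snp["mafs"]) / len(snp["mafs"])
        score = (n_pops, mean_maf)  # tie-break on mean MAF: an assumption
        if window not in best or score > best[window][0]:
            best[window] = (score, snp["name"])
    # Return the retained SNP names in genomic order.
    return [name for _, (_, name) in sorted(best.items())]
```

Chromosome identifiers need to be of a single comparable type (all integers or all strings) for the final ordering to work.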

Parentage SNP panel efficiency
To assess the assignment efficiency of the SNP panel, we considered the following criteria. The exclusion probability (PE), i.e. the probability of excluding one (PE1) or two (PE2) randomly sampled parent(s) from the parentage of an individual that is truly unrelated to them, was calculated for the final panel for each of the 30 populations using the usual formulae [28]. The probability of identity (PI), i.e. the probability that two randomly selected individuals in a population have identical genotypes, was calculated as $PI = \prod_{i=1}^{n} \left( \mathrm{freqAA}_i^2 + \mathrm{freqAB}_i^2 + \mathrm{freqBB}_i^2 \right)$, with freqAA, freqAB and freqBB being the relative genotype frequencies of AA, AB and BB individuals respectively, for a biallelic SNP with alleles A and B, and the product being taken over the n SNPs of the panel.
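As an illustration of these criteria, the sketch below computes the probability of identity from genotype frequencies, together with a simple single-parent exclusion probability. Several points are assumptions rather than the paper's procedure: SNPs are treated as independent, genotype frequencies are derived from the MAF under Hardy-Weinberg equilibrium, and exclusion is counted only from opposing homozygotes between offspring and candidate, which is not necessarily the formula of reference [28]. All function names are hypothetical.

```python
import math


def pi_single_snp(freq_aa, freq_ab, freq_bb):
    """Probability that two random individuals share a genotype at one SNP."""
    return freq_aa ** 2 + freq_ab ** 2 + freq_bb ** 2


def hwe_genotype_freqs(maf):
    """Genotype frequencies (AA, AB, BB) expected under Hardy-Weinberg."""
    p, q = 1.0 - maf, maf
    return p * p, 2.0 * p * q, q * q


def log10_pi_panel(mafs):
    """log10 of the panel-wide PI (product over SNPs, assumed independent)."""
    return sum(math.log10(pi_single_snp(*hwe_genotype_freqs(m))) for m in mafs)


def single_parent_exclusion(mafs):
    """Probability that an unrelated candidate is excluded by at least one SNP.

    At one SNP, an unrelated candidate and the offspring are opposing
    homozygotes (AA vs BB) with probability 2 * p^2 * q^2.
    """
    not_excluded = 1.0
    for maf in mafs:
        p, q = 1.0 - maf, maf
        not_excluded *= 1.0 - 2.0 * p * p * q * q
    return 1.0 - not_excluded
```

Working on the log10 scale avoids numerical underflow when PI is as small as the values reported for a panel of this size.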
These criteria highly depend on the number of SNPs. In order to compare on an equal level the efficiency of our panel to other existing SNP panels, we randomly sampled 96, 150 and 200 SNPs of the final selected subset. One thousand samplings were performed per density and for each sampling we calculated PE1, PE2, PI and the mean MAF for each of the 30 populations studied.
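The resampling scheme described above can be sketched as follows for PI. This is a minimal illustration under stated assumptions: `per_snp_pi` is a hypothetical list holding the single-SNP identity probability of each panel SNP in one population, and the seed and the use of the standard library are choices made for this example only.

```python
import math
import random
import statistics


def log10_pi(per_snp_pi):
    """log10 of the multi-SNP PI, assuming independent SNPs."""
    return sum(math.log10(x) for x in per_snp_pi)


def subsample_log10_pi(per_snp_pi, n_snps, n_draws=1000, seed=0):
    """Mean and standard deviation of log10(PI) over random sub-panels.

    Each draw samples `n_snps` SNPs without replacement from the full panel.
    """
    rng = random.Random(seed)
    draws = [log10_pi(rng.sample(per_snp_pi, n_snps)) for _ in range(n_draws)]
    return statistics.mean(draws), statistics.stdev(draws)
```

For example, calling `subsample_log10_pi(per_snp_pi, 96)`, then the same with 150 and 200, gives one summary per density for a given population.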

Testing the performance of the SNP panel for assignment (animals, genotyping technology and assignment methodology)
The SNP set was tested using an on-farm design for BMC (Blanche du Massif Central) sheep. Blood samples were collected from 509 individuals: 281 lambs, 105 sires and 123 dams. The lambs were produced by monospermic AI (the semen of a single ram was used) so their sire was assumed to be known based on the breeder's records. Genotyping was performed using Sequenom technology [8]. Parentage assignment of a lamb was based on its parents' likelihood contributions for each marker, which were obtained using an in-house script based on the method developed by Boichard et al. [5]. A likelihood ratio was calculated between the likelihood estimated for a given parent and the likelihood obtained with a virtual parent with the allelic frequencies estimated in the population. The dam was confirmed when the likelihood ratio was positive and when there were fewer than 10 Mendelian incompatibilities with the lamb. Lambs with unconfirmed dams were removed from the following analyses. In order to select the most likely sire, the posterior parentage probability of each candidate sire was computed from the respective likelihoods, assuming all candidate sires were a priori equally likely to be the true sire. A candidate sire was finally assigned as the true sire if (i) the likelihood ratio was positive, (ii) the posterior probability was greater than 0.99 and (iii) the number of mismatches with the offspring was lower than 10. These two steps were followed for all 281 lambs, and the results of paternal assignment testing were used to calculate the performance of the assignment procedure (i.e. SNP genotyping and paternal assignment method).
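The three assignment criteria can be illustrated with the short sketch below. It is not the in-house script mentioned in the text. It assumes that the likelihood ratio of each candidate sire is available on a log10 scale (so that "positive" means more likely than the virtual parent), and that the posterior probability is normalized over the candidate sires only; with equal priors, the posterior of a sire is then its likelihood ratio divided by the sum of the likelihood ratios. The function and argument names are hypothetical.

```python
def assign_sire(log10_lr, mismatches, min_posterior=0.99, max_mismatches=10):
    """Return the assigned sire, or None if no candidate meets all criteria.

    log10_lr:   dict sire -> log10 likelihood ratio (candidate vs virtual parent)
    mismatches: dict sire -> number of Mendelian incompatibilities with the lamb
    """
    if not log10_lr:
        return None
    # Subtract the maximum before exponentiating to avoid overflow.
    top = max(log10_lr.values())
    weights = {sire: 10.0 ** (lr - top) for sire, lr in log10_lr.items()}
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    posterior = weights[best] / total  # equal prior for every candidate sire
    if (
        log10_lr[best] > 0                    # (i) positive likelihood ratio
        and posterior > min_posterior         # (ii) posterior above 0.99
        and mismatches[best] < max_mismatches # (iii) fewer than 10 mismatches
    ):
        return best
    return None
```

With two candidates of nearly equal likelihood, for instance a sire and one of its close relatives, the posterior of the best candidate falls well below 0.99 and no sire is returned, which is the behaviour that criterion (ii) is meant to enforce.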
The robustness of our panel was finally tested: four true sires were removed from the list of candidate sires and assignment results of their lambs were analyzed. The four removed sires were chosen because other related sires (their sire and/or half-sibs and/or offspring) were included in the list of candidate sires.

Comparison of the performances of the French SNP panel and other international panels in various French and international breeds
The SNPs included in the French panel were selected based on the genetic diversity of French sheep breeds. The MAFs of the panel's SNPs were estimated in the different international populations involved in the SheepHapMap project [19].
We also compared the MAF, PI, PE1 and PE2 obtained in the thirty French populations with two additional SNP parentage panels developed for New Zealand [6] and North American breeds [13]. These two panels include 163 (North America) and 98 (New Zealand) SNPs, with 55 SNPs in common between the two sets.

Parentage SNP selection
In the following analyses, only individuals with a genotyping callrate greater than 0.95 were retained. This reduced the final number of useful genotyped individuals to 771 (Table 1), with 15 to 30 individuals per breed.
To ensure the applicability of the final SNP panel was optimal, we considered only the 42,230 SNPs in common between the high and medium density SNP chips as candidates for inclusion in the panel. In order to reduce as much as possible the genotyping cost related to the use of the final parentage panel, while maintaining its efficiency for assignment, we decided to include 200-300 SNPs.
On average the genotype calling frequency was 0.985, and we initially selected the 32,692 SNPs (~80%) that had a frequency greater than 0.99.
For each of the 30 populations, the MAF distribution of these 32,692 SNPs was calculated (Fig. 1). In each population at least 12,300 SNPs had a MAF > 0.30. However, MAF-based SNP selection had to be performed simultaneously for the 30 populations studied. Because only 2 SNPs had a MAF > 0.30 in all the populations (Fig. 2), additional selection criteria were applied as previously described. A total of 9269 SNPs had a MAF > 0.30 in at least 20 populations (Fig. 2), but only 1929 SNPs met all the criteria.
Among these 1929 selected SNPs, 453 departed from the Hardy-Weinberg equilibrium (p < 0.01) in at least one of the thirty populations. This reduced the number of candidate SNPs to 1476. Among the 453 discarded SNPs, two were not at equilibrium in three populations, 55 were not at equilibrium in two populations and 396 were not at equilibrium in just one population. After these four selection steps, 1432 SNPs were therefore identified as good candidates for the parentage assignment panel across the thirty considered populations. The average MAF per SNP was 0.37 (ranging from 0.33 to 0.42); the lowest average MAF was observed in the BCF (Berrichon du Cher) population (0.34) and the highest in the PAS (PréAlpes du Sud) population (0.39). These 1432 SNPs were unequally distributed over the genome (Fig. 3), with 1 to 15 SNPs per 10 Mb window. A final set of 249 SNPs was obtained by selecting one SNP per 10 Mb window. On average, there was no correlation (0.0015 ± 0.0435) between the genotypes of two SNPs sampled from the panel of 249 SNPs (Fig. 4), and linkage disequilibrium was estimated at 0.0018 ± 0.0025.
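A Hardy-Weinberg filter of the kind applied above can be sketched with a one-degree-of-freedom chi-square test. The text does not specify which test was used, so this choice is an assumption; with about 15 to 30 individuals per population an exact test may be preferable. The function name is hypothetical and only the Python standard library is used.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """P-value of a chi-square (1 df) test of Hardy-Weinberg proportions.

    n_aa, n_ab, n_bb are the observed genotype counts at one biallelic SNP
    in one population (at least one genotyped individual is required).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or p == 1.0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))
```

Under this sketch, a SNP would be discarded as soon as the p-value falls below 0.01 in at least one of the populations.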
The 249 selected SNPs had an average MAF of 0.39, ranging from 0.33 to 0.42 (Additional file 1: Table S1). At the population level, the lowest average MAF was obtained in the BCF population (0.360) and the highest in the MOUR (Mourerous) population (0.408) (Table 2). The BCF population had the least favorable MAF distribution, with 192 SNPs with a MAF higher than 0.30, 42 SNPs with a MAF between 0.20 and 0.30 and 15 SNPs with a MAF between 0.10 and 0.20 (Fig. 5). The probability (PI) that two randomly selected individuals have identical genotypes with this panel of 249 SNPs was very low: it reached its lowest value in the BMC population (6.3 × 10^−100) and its highest value in the GRI (Grivette) population (3.91 × 10^−92). The exclusion probabilities of either one or the two randomly selected parent(s) (PE1 and PE2 respectively) were close to 1 in all 30 populations (Table 2).
When we randomly selected different SNP densities (96, 150 or 200 SNPs) among the 249 selected ones, we observed no difference in the average MAFs. However, PE1 and PE2 increased with the number of SNPs whereas PI decreased at higher panel densities (Additional file 2: Table S2).

Testing parentage assignment with the selected SNPs
The parentage assignment procedure was tested using Sequenom technology for genotyping. For economic reasons only four plexes were developed. For technological reasons, these four plexes included 192 SNPs among the 249 previously selected SNPs and were used to genotype the BMC animals sampled on-farm. Before performing sire assignment, we assessed the genetic link between the lambs and their declared dams (Table 3). Based on the likelihood and on the number of incompatibilities, 12 lambs were removed from subsequent analyses. The other 269 lambs were included in the sire assignment test. For the 174 lambs with a genotyped and validated dam, the sire assignment rate reached 97%. For the remaining 95 lambs with no genotyped dam, the sire assignment rate was 93.7%. When all lambs were considered together, whatever the genotyping status of their dam, 258 of the 269 lambs were sire-assigned (96%). For 233 of these lambs, the assigned sire matched the declared sire (90% agreement); for the other 25 lambs, it did not. For the 258 assigned lambs, the average likelihood ratio was 14.4 ± 3.6, ranging from 5.8 to 26.04, and there were on average 3.1 Mendelian incompatibilities (with a maximum of 8) between the sire and the lamb.

We then tested the robustness of our SNP panel. We attempted to assign 82 lambs to a list of 101 sires from which their four true sires (assigned in the previous step) had been removed. As a consequence, we expected that no paternal relationship would be found. Among these 82 lambs, 64 had a confirmed dam and the dam was unknown for the 18 remaining lambs. The list of candidate sires still included relatives of the 4 discarded sires (i.e. their sire and/or half-sibs and/or progeny). None of the 64 lambs with a confirmed dam was assigned. However, six of the 18 lambs with an unknown dam were assigned to a sire. These falsely assigned sires were paternal half-sibs of the six lambs.

MAFs for French parentage SNPs in worldwide breeds
The 249 SNPs selected for French breeds were analyzed in the set of worldwide breeds described in [19]. The populations from South West Europe showed the highest mean MAF (0.38), particularly the Spanish breeds, which displayed a MAF around 0.4. In contrast, African populations showed the lowest mean MAF (0.28) (Fig. 6).

MAFs for international parentage SNP panels in French breeds
Two parentage SNP panels (from New Zealand and North America) selected on different international breeds were analyzed in the 30 French populations of this study. Among the 1432 SNPs that fulfilled all our criteria (before genomic distribution selection), 6 and 25 SNPs were also included in the New Zealand and North American panels respectively. Finally, two and eight SNPs from the New Zealand and North American panels respectively were also included in the French panel. The mean MAFs estimated for these two international panels in the thirty French populations were 0.33 for the New Zealand panel and 0.35 for the North American panel. These values are to be compared with the average MAF of 0.39 for the French panel. As shown in Table 2, the mean MAF observed per population is highest for the French panel, and mean MAFs are higher for the North American panel than for the New Zealand panel. Regarding PI, PE1 and PE2, the North American panel displays a higher PE and a lower PI than the New Zealand panel, but neither of them reaches the results obtained with the French panel (Table 2). When we randomly sampled 96 or 150 SNPs to use densities close to those of the New Zealand and North American panels respectively, the results with the randomly sampled French sub-panels were better for all 30 populations than with the two international panels. Indeed, with 96 SNPs, PI was lower and both exclusion probabilities were higher than with the New Zealand panel. By randomly selecting 150 SNPs, we obtained a lower PI and higher exclusion probabilities for all 30 populations than with the panel from North America, which contains 163 SNPs (Additional file 2: Table S2).

Design of the SNP parentage panel
In this study, we report the development of a SNP panel dedicated to parentage assignment which is suitable for most of the French sheep breeds. Until now, parentage verification methods proposed to French sheep breeders have relied on the use of microsatellite markers. Microsatellite panels used for parentage verification have been tested for parentage assignment but were shown to lack power for such analysis [23]. Whereas parentage verification can only validate or reject a declared sire, parentage assignment enables breeders to identify the true sire amongst a list of candidate sires. Due to recent advances in molecular technology, SNPs are now of particular interest because their analysis can be entirely automated and they are gradually becoming the most used markers for parentage analysis [1]. The development of a dedicated panel for parentage assignment would provide breeders with the opportunity of making mating management easier while improving the known paternity rate. We opted for SNPs as molecular markers in order to develop a panel that can be genotyped automatically and at a more reasonable cost than microsatellites. The number of SNPs needed to assign individuals to their parents has been recently estimated in cattle and sheep: using the exclusion probability, Strucken et al. [31] recommended using at [...] [6,13,15], but allowed us to develop an assignment panel suitable for all French sheep breeds with established breeding programs. In cattle, it has been shown that the full ISAG (200 SNPs) parentage panel is efficient in a wide variety of breeds, but when the number of SNPs must be decreased (for technical and/or economic reasons), population-specific panels are more efficient [30]. Based on simulations, Boichard et al. advised using at least 175 SNPs if the targeted populations display "unfavorable" conditions such as non-genotyped dams, a partly genotyped set of sires and/or highly-related candidate parents [5].
This is slightly higher than the range of 100-150 SNPs proposed by Hill et al. if the list of potential parents includes highly-related individuals (such as full sibs, sires-offspring) [14]. In French sheep populations we are close to such "unfavorable conditions" because many sets of candidate sires include related males (for example sons and sires, half-sibs and full sibs). This was for example the case when testing the BMC population, for which the list of candidate sires included some parent-offspring pairs. Due to these particular population structures, we decided to select 200-300 SNPs in order to meet the recommendations provided by Boichard et al. [5]. After filtering, we identified 1432 candidate SNPs covering the whole genome. We finally selected 249 evenly-spaced SNPs (one SNP per 10 Mb window). There was no redundancy among these 249 SNPs, as indicated by the low level of linkage disequilibrium. It should be noted that the 249 assignment SNPs were selected without preconceived ideas on which genotyping technology will finally be routinely used. We decided to retain all 249 SNPs knowing that, depending on the genotyping tool, the number of SNPs actually used could decrease. As mentioned before, at least 175 SNPs had to be genotyped in order to have high rates of sire assignment. Our on-farm validation of the panel indicates that the panel should include at least 180 SNPs given the number of false-positive results when the dam is not genotyped and the true sire is not among the candidate sires.
Concerning the maximum number of SNPs, the main criterion was the cost of genotyping. From an economic point of view we could not afford to select more than 300 SNPs because the final panel will be used by breeders, so the cost of its use must be as low as possible. Raoul et al. estimated that using parentage assignment to increase the pedigree information could be profitable for a cost per assignment close to 6-7 € [24].
To select assignment SNPs, we mainly focused on MAF analysis, i.e. the standard procedure for developing such parentage panels [6,13], given that the number of SNPs needed for assignment decreases when the MAF increases [3]. In our study, a first step based on this criterion led to [...] [32]. In our study, we also calculated the exclusion and identity probabilities obtained for each French population for the panel of 249 SNPs. These probabilities are highly dependent on the number of SNPs [7,17]. In order to compare our panel to the panels from North America and New Zealand, we randomly sampled 96, 150 and 200 SNPs from the 249 selected SNPs. With fewer SNPs than the panel dedicated to North American breeds, we obtained better results for the French breeds with regard to PE and PI. Our results confirm that a greater number of SNPs results in a decrease of PI and an increase of PE, but also reveal that when the number of SNPs must be decreased (for technical or economic reasons), PE and PI levels can be maintained by specifically selecting SNPs adapted to the populations for which the panel is designed.
To assign lambs to their sire, we used the likelihood methodology, which accounts for genotyping errors and allows missing genotypes. Other methodologies exist which usually only rely on exclusion [18]. However, likelihood approaches achieve better results than exclusion approaches, as illustrated by Boichard et al., particularly when there are genotyping errors [5]. To perform assignment, we removed all lambs with incompatible dams based on genotype information because we could not rule out the possibility that the blood sample had been mislabeled. We used 3 criteria to assign a lamb to a sire: the likelihood ratio, the posterior probability and the number of Mendelian incompatibilities between the lamb and the sire. The likelihood ratio was the first criterion, but it is not sufficient alone as a lamb could have a positive likelihood ratio with two or three sires. We added the posterior probability as a second criterion to retain the most likely sire, and it could return only one sire per lamb because of the threshold (0.99) we applied. For technical reasons, we allowed a fairly high maximum of Mendelian errors (10), with on average three Mendelian incompatibilities between a lamb and its assigned sire. With improved genotyping quality, a more stringent threshold could be applied (e.g. 5 incompatibilities). [...] (Table 2). Similarly, better exclusion and identity probabilities are achieved with the SNPs of the French panel, even when the number of SNPs is decreased to reach a density close to that of the two international panels.
The French parentage panel was tested on-farm with individuals from the BMC breed. We obtained very encouraging results, with 94% of individuals being assigned when the dam was not genotyped and up to 97% when the dams were also genotyped. However, with this on-farm design, we cannot be absolutely sure that all the candidate sires were sampled and genotyped, so it is likely that the true sires of some of the unassigned lambs were not in the list of candidate sires. By way of comparison, in commercial flocks where all the lambs and sires were sampled, on average 93% of lambs were sire-assigned with the New Zealand panel in a situation where only sire genotypes were considered [6], which is similar to the performance of our assignment procedure (SNPs and algorithm).
Even if our panel performed well for the BMC breed, significant emphasis should be put on the need for meticulous preparation of the list of candidate sires. We show in this study that if the true sire is not in the list but some of its relatives are, false-positive assignments can be observed when dams are not genotyped.
At the European level, no SNP parentage panel had been published before the panel we propose here. Based on the MAF criterion, we believe our panel should perform well in breeds belonging to the following SheepHapMap subgroups: South West Europe (except the Mac Arthur Merino population, which is inbred), Italy, and to a lesser extent Central Europe, part of the Northern Europe sub-group, and America and South West Asia. For example, if we focus on the results obtained with the Spanish breeds, the observed MAFs are of the same order as in most of the French breeds (approximately 0.4) (Fig. 6).

Conclusion
In this study, we designed a SNP panel that will enable accurate parentage assignment in most of the French sheep breeds. This panel was established by genotyping approximately 30 individuals in each of 30 populations, 27 of which were genotyped with the 600 K SNP chip and three with the 54 K SNP chip.
The selected 249 SNPs were successfully tested for parentage assignment in the BMC breed with a minimum assignment rate of 94%. Even if very encouraging results were obtained in terms of paternity assignment rates, this study highlights a major condition to be met for the successful use of this new tool: when dams cannot be genotyped, the list of putative sires must be as complete as possible in order to prevent the risk of misassignment to a relative of the true sire.
This panel is currently being used in some French breeds. With an increasing number of assigned animals we will be able to assess on real datasets the benefits in terms of genetic evaluations, such as an improved accuracy of breeding values and connections between flocks, and in terms of pedigree-based genetic variability indicators.

Additional files
Additional file 1: Table S1. List of the 249 SNPs selected for parentage assignment in French breeds. For each SNP, the genomic position is given in reference to the oar_v3.1 assembly, and the MAF is the average over the 30 French breeds included in the analysis. (XLSX 23 kb)
Additional file 2: Table S2. Results (MAF, PE1, PE2 and PI) obtained after random sampling of SNPs from the 249 SNPs selected for parentage assignment in French sheep breeds. MAF, PE1, PE2 and PI statistics were obtained after 1000 random samplings of 96, 150 and 200 SNPs. PE1 and PE2 give the probability of exclusion of one or both parents respectively, and PI gives the probability of identity. (XLSM 33 kb)

Ethics approval and consent to participate
Blood samples were obtained from commercial farm animals. We used part of the blood that is routinely sampled as part of the National Selection Program for Resistance to Scrapie, so no additional sampling was needed. All the breeding organizations agreed to provide these samples.
