A multilocus assay reveals high nucleotide diversity and limited differentiation among Scandinavian willow grouse (Lagopus lagopus)

Background There is so far very little data on autosomal nucleotide diversity in birds, except for data from the domesticated chicken and some passerines species. Estimates of nucleotide diversity reported so far in birds have been high (~10-3) and a likely explanation for this is the generally higher effective population sizes compared to mammals. In this study, the level of nucleotide diversity has been examined in the willow grouse, a non-domesticated bird species from the order Galliformes, which also holds the chicken. The willow grouse (Lagopus lagopus) has an almost circumpolar distribution but is absent from Greenland and the north Atlantic islands. It primarily inhabits tundra, forest edge habitats and sub-alpine vegetation. Willow grouse are hunted throughout its range, and regionally it is a game bird of great cultural and economical importance. Results We sequenced 18 autosomal protein coding loci from approximately 15–18 individuals per population. We found a total of 127 SNP's, which corresponds to 1 SNP every 51 bp. 26 SNP's were amino acid replacement substitutions. Total nucleotide diversity (πt) was between 1.30 × 10-4 and 7.66 × 10-3 (average πt = 2.72 × 10-3 ± 2.06 × 10-3) and silent nucleotide diversity varied between 4.20 × 10-4and 2.76 × 10-2 (average πS = 9.22 × 10-3 ± 7.43 × 10-4). The synonymous diversity is approximately 20 times higher than in humans and two times higher than in chicken. Non-synonymous diversity was on average 18 times lower than the synonymous diversity and varied between 0 and 4.90 × 10-3 (average πa = 5.08 × 10-4 ± 7.43 × 103), which suggest that purifying selection is strong in these genes. FST values based on synonymous SNP's varied between -5.60 × 10-4 and 0.20 among loci and revealed low levels of differentiation among the four localities, with an overall value of FST = 0.03 (95% CI: 0.006 – 0.057) over 60 unlinked loci. Non-synonymous SNP's gave similar results. Low levels of linkage disequilibrium were observed within genes, with an average r2 = 0.084 ± 0.110, which is expected for a large outbred population with no population differentiation. The mean per site per generation recombination parameter (ρ) was comparably high (0.028 ± 0.018), indicating high recombination rates in these genes. Conclusion We found unusually high levels of nucleotide diversity in the Scandinavian willow grouse as well as very little population structure among localities with up to 1647 km distance. There are also low levels of linkage disequilibrium within the genes and the population recombination rate is high, which is indicative of an old panmictic population, where recombination has had time to break up any haplotype blocks. The non-synonymous nucleotide diversity is low compared with the silent, which is in agreement with effective purifying selection, possibly due to the large effective population size.


Background
Until now, knowledge of the amount of nucleotide diversity in birds has been restricted to the domesticated chicken [1], although a few passerine species have been the subject of nucleotide diversity studies, e.g. red-winged blackbird [2], collared and pied flycatchers, blue tits and great reed warblers [3,4]. Estimates of nucleotide diversity reported so far in birds have been high (~10 -3 ), approximately one order of magnitude higher than in for instance humans [5]. A likely explanation for this is the generally higher effective population sizes compared to mammals. This reasoning is based on that silent nucleotide diversity, π S = 4 N eμ , is directly dependent on effective population size (N e ) and the mutation rate per generation (μ), and that μ is expected to be about the same in birds and mammals [6]. An alternative explanation for variation in nucleotide diversity could be differences in the occurrence of selective sweeps [7]. The latter suggests a relationship between the rate of recombination and the level of genetic diversity, as has indeed been shown for example in Drosophila [8] and humans [9,10]. This is particularly obvious in genomic regions with limited or no recombination, e.g. the female specific and non-recombining W chromosome in birds, that have unusually low levels of nucleotide diversity [11] as well as mammalian Y chromosomes that also harbour low levels of nucleotide diversity [5,12,13]. In order to understand variation in nucleotide diversity within genomes and between different lineages, accurate assessment of variation in recombination rates is required.
Single nucleotide polymorphisms (SNP's) are increasingly utilised as genetic markers for inferences of population genetic processes and molecular studies in ecology and evolution [12][13][14][15]. The use of SNP's has several advantages compared to more traditional markers such as microsatellites that commonly suffer from severe problems with homoplasy, null alleles and variable mutation patterns. Furthermore, the strong tradition of using mitochondrial DNA (mtDNA) is also seriously limited by the fact that it is a single, maternally inherited locus, and particularly in birds, likely subject to various forms of selection [14]. Like microsatellites, mtDNA has also a very high incidence of homoplasy when distances among taxa increase [15]. Although SNP's contain less information than both microsatellites and mtDNA, they are abundant in most species and evolve in a manner well described by simple mutation models (infinite sites model). A useful way of discovering SNP's in non-model organisms is by designing primers in exons in a species with considerable genomic data to amplify the corresponding exon (or intron) in a species with limited genetic data (i.e. the EPIC approach, [20,21]). This method has for example been used to amplify 242 introns in several diverse avian spe-cies by designing primers in conserved regions between the chicken genome sequence and orthologous zebra finch sequences [3]. In this study, the level of nucleotide diversity has been examined in the willow grouse, a non-domesticated bird species from the order Galliformes, which also holds the chicken. We collected samples from four locations in Scandinavia and used SNP's in 18 different autosomal exons located on different chromosomes in the genome. In addition to genetic diversity analyses, we estimated local levels of linkage disequilibrium (LD) and ρ, the population recombination rate. We also looked for population differentiation among the four sampling locations.
The willow grouse (Lagopus lagopus) has an almost circumpolar distribution but is absent from Greenland and the north Atlantic islands. It primarily inhabits tundra, forest edge habitats and sub-alpine vegetation. In Scandinavia, where the distribution of willow grouse overlaps with the closely related rock ptarmigan (Lagopus mutus), they generally occur at lower elevations and in wetter habitats with denser vegetation than the rock ptarmigan [16]. They breed in early summer and can produce up to two clutches per year, with on average eight eggs per clutch [17]. Willow grouse are hunted throughout its range, and regionally it is a game bird of great cultural and economical importance. Sequences were obtained for 18 loci in 64 individuals; 15  from Tjuoltadalen, 18 from Tjalling, 15 from Hardangervidda and 16 from Nesseby ( Figure 1). A total of 6571 bp (excluding gaps) were aligned over the 18 genes, resulting in a total of approximately 385 kb of sequence information across individuals.

Nucleotide variation in willow grouse
We identified a total of 127 segregating sites, of which 32 were singletons and 95 parsimony informative. This corresponds to 1 SNP every 51 bp. Clearly, the frequency spectrum is skewed towards rare frequency variants (Figure 2). One parsimony informative site was found with three variants (EPN2) and one with all four possible variants (MICRO). 26 SNP's were amino acid replacement substitutions. Statistics of sequence variation are summarized in Table 1 and 2. Total nucleotide diversity (π t ) was between 1.30 × 10 -4 and 7.66 × 10 -3 (average π t = 2.72 × 10 -3 ± 2.06 × 10 -3 ) and silent nucleotide diversity varied between 4.20 × 10 -4 and 2.76 × 10 -2 (average π S = 9.22 × 10 -3 ± 7.43 × 10 -4 ). Non-synonymous diversity was on average 18 times lower than the synonymous diversity and varied between 0 and 4.90 × 10 -3 (average π a = 5.08 × 10 -4 ± 7.43 × 10 -3 ).  No gene had a statistically significant Tajima's D value [18], but the average value over all genes was negative, (-0.52 ± 0.85), indicating that an excess of low frequency variants are segregating in the population ( Figure 2). Similarly, Fay and Wu's H tests [19] gave no statistically significant results for any locus, but the average H across loci was negative (-1.34 ± 2.12) ( Table 3). The average synonymous sequence divergence (KS) between chicken and willow grouse was 0.13 ± 0.05 and the average non-synonymous sequence divergence (KA) was 0.015 ± 0.023 (Table 3). There is a particularly high variation in KA among the genes, which either suggests that selective pressure varies between different domains of the protein and/ or among these genes.

Linkage Disequilibrium
Generally, low levels of linkage disequilibrium were observed within genes, with an average r 2 = 0.087 ± 0.12. D' within genes was on average < 1. For two genes, (BRIP1 and TRANS), r 2 was higher than 0.30 (D' < 1). The per site per generation recombination parameter (ρ) was estimated for the 16 genes with more than two segregating sites (BCL-X, EPN2 and TAR had less than two and were not analysed). Of the 15 loci, only four produced 95% CI's within ρ = 0 -100. Maximum likelihood estimates of ρ per site was 0.045 in BRIP1, 0.043 in CXCR4, 0.015 in PKP4 and 0.010 in TRANS, with a mean of 0.028 ± 0.018.

Population Structure
F ST values based on synonymous SNP's varied between -5.60 × 10 -4 and 0.20 among loci and revealed low levels of differentiation among the four populations, with an overall value of F ST = 0.03 (95% CI: 0.006 -0.057) over 60 unlinked loci. Pairwise F ST 's varied between 0.005 to 0.045, and three out of six comparisons were statistically significant (Table 4). F ST values based on non-synonymous SNP's varied between -0.005 and 0.233 over 12 loci, with an overall F ST = 0.049 (95% CI: -0.019 -0.121). None of these 60 unlinked synonymous and nonsynonymous SNP's show a significant excess of heterozygote's.  The program STRUCTURE revealed the highest likelihood for K = 3 for both test statistics ((LnP(D) and ΔK).

Confirmation of SNP's and PHASE haplotypes by cloning
The cloned sequences were compared with the directly sequenced PCR products for each individual and marker. In total, 48 SNP's were compared between cloned and directly sequenced PCR products. In one case, a heterozygote had been interpreted as a homozygote in the directly sequenced product, which means that heterozygotes were correctly assigned based on direct sequencing of PCR products in 98% of the cases. In addition, we compared the haplotypes based on the PHASE assignments with the results from the cloning experiment. The clones containing PCR products from the three individuals for marker BCL-2, yielded six different haplotypes, the three individuals cloned for marker CAAX produced five different haplotypes, the three individuals cloned for marker KELCH also produced five different haplotypes, while the two individuals cloned for TRANS produced two different haplotypes. All SNP's except one were correctly assigned in haplotypes.

Discussion
Polymorphism levels in non-domesticated bird species have been very little investigated, apart from in a few passerine species. Particularly, extensive SNP surveys in wild close relatives to the avian model organism, the chicken are lacking. We used the chicken genome as template for primers and amplified the orthologous regions in the willow grouse. We amplified 18 exons in individuals from four populations in Scandinavia. Levels of nucleotide diversity in coding regions of willow grouse (π t = 2.72 × 10 -3 ) was substantially higher than in humans [5] and higher than in many other mammalian species [20][21][22]. The synonymous nucleotide diversity is particularly high (π S = 9.22 × 10 -3 ), especially compared to the non-synonymous diversity (π a = 5.08 × 10 -4 ). The low ratio between π a and π S (π a /π S = 0.055) indicate that that new mutations entering the populations are effectively selected against. In no bird species so far have such high silent polymorphism levels been reported. What population genetic processes most likely explain this pattern? Theory holds that silent nucleotide diversity is dependent on the mutation rate per generation and the effective population size. There are reports of very large numbers of breeding pairs of willow grouse in Norway and Sweden, and figures of more than SNP frequency spectrum combined across genes  , which is indicative of low frequency variants segregating in the populations. This could be an effect of population expansion since the retreat of the ice around 10.000 years ago, followed by re-colonisation [24].
There was a significant but weak population differentiation among the four locations (F ST = 0.03, 95% CI: 0.006 -0.057) and a positive but non-significant correlation between genetic and geographic distances. The program STRUCTURE estimated the most likely number of populations to three, however the biological meaning of this result is questionable as the likelihood for K = 1, K = 3 and K = 4 are very similar. The low levels of differentiation are most likely explained by the large number of birds in Scandinavia and a more or less continuous habitat. From an extensive study where 300 individuals were captured and equipped with radio-transmitters [16] female natal dispersal was estimated at 10.2 km, while males on average dispersed three times shorter distances (3.4 km). In Norway natal dispersal was estimated to be 4.2 km for both sexes combined [25]. These distances are longer than for the majority of forest and plain grouse [16]. Effective dispersal distance in hazel grouse in northern Sweden was estimated to 1.4 km using microsatellites [26] and significant local population structure was detected in black grouse males but not females [27]. Local structure has also been detected in Scottish red grouse (Lagopus lagopus scoticus) using both mtDNA and microsatellite loci [31,32]. However, previous studies of population structure among Scandinavian willow grouse populations using allozymes found low degrees of genetic differentiation [28][29][30].
One remarkable result of the current study is the extremely low levels of linkage disequilibrium found in the willow grouse (mean r 2 = 0.087 ± 0.12, mean D' < 1). It is possible that the pattern we observe is an effect of demographic factors rather than effects of recombination events. In a large population (stable or expanding), accumulation of SNPs can lead to low correlation (r 2 ) between neighbouring SNPs even in the absence of recombination. In such cases, D' as a measure of LD should be much more informative as it tends to be a more accurate reflection of the historical association between two loci. The fact that we within exons always have some pairwise D' < 1 suggest that recombination can have caused the low r 2 values that we observe. Obviously, in order to fully understand patterns of LD in the willow grouse genome we need many more markers located in different regions of the genome. Studies of LD in other avian species, particularly in natural populations are rare. LD in domesticated chicken have been estimated for two genomic regions in three different commercial breeds with different levels of inbreeding and large variations in LD was found both between breeds as well as between genomic regions [31].
In contrast to the low levels of LD between autosomal markers in willow grouse, extensive LD was found among closely located markers on the Z chromosome in the passerine species collared flycatchers [32]. A difference in LD between autosomes and the Z chromosome is interesting and implies that there might be selection for a reduction in recombination on Z. A recent study using collared and pied flycatchers as model organisms have shown that species recognition has a genetic component [33] and other studies have shown that disproportionately many genes that are involved in reproductive isolation are located on Z [49,50]. Taken together, these observations suggest that Z chromosomes in birds are enriched for speciation genes.
Linkage mapping in species belonging to the order Galliformes, suggest high recombination rates compared to mammals. Groenen et al. estimated the chicken linkage map length to 3800 cM [34] and although the physical size of the chicken genome is threefold smaller than most mammalian genomes [35], the genetic maps are of similar sizes. Similarly, both turkey and quail have long linkage maps compared to their genome sizes [36][37][38]. Wholegenome linkage mapping in birds outside the Galliformes order is in its infancy, although recently two linkage maps were published for species belonging to the order Passeriformes, the zebra finch [39], and the collared flycatcher [40]. Both maps were shorter than the chicken map, despite supposedly similar genome sizes.
Population based approaches for estimating recombination is an attractive alternative to linkage mapping, since it can be difficult to obtain well defined pedigrees. One such way of studying recombination is by investigating the per generation population recombination rate (ρ), which is a function of both crossing-over frequency and effective population size. This has been estimated in red winged blackbirds and ρ ranged between 0.045 to 0.225 [2]. In the willow grouse we estimated mean ρ to 0.028 ± 0.018 in four genes. This again suggests that birds exhibit much higher rates of recombination in general than for example humans [41] and seem more similar to Drosophila [42]. Again, a likely explanation that the per generation recombination rate is high in birds, is that they generally have large effective population sizes. This fits very well with the high levels of silent nucleotide diversity seen in birds.

Conclusion
Scandinavian willow grouse have an unusually high silent nucleotide diversity compared with other species. There is very little population structure among the four geographically distant locations. This suggests that the effective population size is high, but possibly also that the mutation rate is high. The non-synonymous nucleotide diversity is low compared with the silent, which is in agreement with effective purifying selection, possibly due to the large effective population size. The population shows signs of population expansion, which probably dates back to the last glaciations. There is very little linkage disequilibrium within the genes and the population recombination rate is high, which is indicative of an old panmictic population, where recombination has had time to break up any haplotype blocks.

Tissue samples and Sampling
Wings or breast muscle from 64 shot willow grouse were collected at two localities in Norway; Nesseby (70° 08' N; 29° 00' E) ( Table 4). These four locations span the north-south distribution of willow grouse in Scandinavia. DNA was extracted from small biopsies (a few mm 3 ) using a salt extraction protocol.

Primer design, PCR and Sequencing
Primers were designed based on annotated chicken protein coding genes in Genbank http:// www.ncbi.nlm.nih.gov/ (Table 5). In order to generate the largest possible number of base pairs per sequencing reaction, the major criterion for choosing a locus was that it had to have an exon length exceeding 500 bp. We also wanted to target as many different chromosomes as possible based on the chicken karyotype, as there are three different chromosomal classes (micro, intermediate and macro), each possessing distinct features such as gene content, recombination rate and GC content [35], which potentially could affect polymorphism levels. See Table 6 for marker information and primer sequences.  Table 6). PCR profiles consisted of 5 min denaturation at 95°C followed by 33-40 cycles of 35s denaturation at 95°C, 45 s annealing at 50-55°C and 1 min extension at 72°C with a final 5 min 72°C step (See also Table 6). To avoid contamination, DNA extractions, pre PCR and post PCR pipetting were carried out in different rooms and aerosol-resistant filter pipette tips were used throughout. The PCR products treated with ExoSAP-IT (USB Corporation) before sequencing reactions were ran with Dyenamic ET terminators (GE Healthcare, Piscataway, NJ). All products were sequenced on a MegaBace 1000 capillary instrument (GE Healthcare, Piscataway, NJ). Individuals from the two Norwegian populations were sequenced both ways. For all loci, except CXCR4, LEPR and MBL, the sequence quality was so high that the remaining samples were sequenced using the forward primer only. Sequences were edited in Sequencher v. 4.6 (Gene Codes).

Nucleotide Diversity and Divergence
Estimates of standard population genetics parameters and neutrality test statistics were calculated for each locus with DnaSP v. 4.10.9 [43]. Insertions and deletions are reported, but not included in any analyses. Pairwise sequence divergence between chicken and willow grouse was estimated as K A and K S using the Nei-Gojobori algorithm [44] with Jukes-Cantor correction [45] in MEGA 3.1 [46].

Haplotype Phasing
Haplotype phasing was performed using PHASE v. 2.1 [40,41] on all parsimony informative SNP's for each gene separately.

Linkage Disequilibrium and Population recombination rate
The level of linkage disequilibrium (LD) between parsimony informative sites within genes was estimated in Haploview [47], both as r 2 , the mean squared correlation in allelic state between pairs of SNP's and D', the difference between the observed and the expected gamete frequencies standardised by the theoretical maximum for the observed allele frequencies. These two estimates of LD are somewhat different and it has been argued that r 2 has more power to detect lack of recombination than D' [48]. D' = 1 means complete LD and occurs if at least one allele at each locus is completely associated with an allele at the other locus, however D' < 1 is somewhat difficult to interpret. Both estimates are considered to be allele frequency dependent, but r 2 , less so than D'. Phased data was used in these analyses. The population recombination parameter ρ = 4N e r was estimated using Ldhat 2.0 [49], where N e is the effective population size and r is the per site per generation recombination rate. Likelihoods were obtained for recombination rates in the range, ρ = 0 -100 and 95% CI was calculated.

Population Structure
Population differentiation was estimated using Weir & Cockerhams F ST statistic [50] in Genetix 4.03 [51]. Significance was assessed from 1000 permutations. Synonymous and non-synonymous SNP's were analyzed separately. All analyses were carried out on a subset of the data represented by only parsimony informative unlinked loci. We used a LOD score cutoff of 3 to infer linkage. LOD scores were estimated for each pair of SNP's in Haploview [47].
The genetic structure of Scandinavian willow grouse was also investigated with the model-based clustering algorithm implemented in STRUCTURE v. 2.1 [52,53]. We used the admixture model, with population information, five runs with a burn-in of 100,000 and a run length of 1,000,000 for a number of clusters from K = 1 to K = 10, allowing for correlation of allele frequencies between clusters. We performed five independent runs per K to ensure that the results were consistent. We investigated the most likely number of clusters in two different ways, by considering the log 'probability of data' lnP(D) for the different numbers of K, and by using the statistic ΔK (Evanno et al), which considers the rate of change in lnP(D) among successive K values. Hardy-Weinberg exact tests for heterozygote excess was performed in Genepop on the web using option 1. Geographical distances between populations were estimated in kilometers (km) using Google Earth.

Confirmation of SNP's and PHASE haplotypes by cloning
Cloning of PCR products from four markers, BCL-2, CAAX, KELCH and TRANS was performed using pGEM ® -T Easy Vector System II following the manufacturer's protocols (Promega). PCR products from three individuals were cloned for BCL-2, CAAX, KELCH and two for TRANS. For each marker, eight colonies were amplified with illustra TempliPhi Amplification Kit (GE Healthcare) and finally sequenced by Macrogen Inc. (Macrogen, Seoul, South Korea) on ABI 3730 instruments (Applied Biosystems).

Authors' contributions
SB designed the primers, analysed the data and wrote the manuscript. MQ carried out all lab work. JH designed and coordinated the study. All authors read and approved the final manuscript.