Fine-scale mapping of meiotic recombination in Asians
© Bleazard et al.; licensee BioMed Central Ltd. 2013
Received: 3 February 2012
Accepted: 22 February 2013
Published: 8 March 2013
Meiotic recombination causes a shuffling of homologous chromosomes as they are passed from parents to children. Finding the genomic locations where these crossovers occur is important for genetic association studies, understanding population genetic variation, and predicting disease-causing structural rearrangements. There have been several reports that recombination hotspot usage differs between human populations. But while fine-scale genetic maps exist for European and African populations, none have been constructed for Asians.
Here we present the first Asian genetic map with resolution high enough to reveal hotspot usage. We constructed this map by applying a hidden Markov model to genotype data for over 500,000 single nucleotide polymorphism markers from Korean and Mongolian pedigrees which include 980 meioses. We identified 32,922 crossovers with a precision rate of 99%, 97% sensitivity, and a median resolution of 105,949 bp. For direct comparison of genetic maps between ethnic groups, we also constructed a map for CEPH families using identical methods. We found high levels of concordance with known hotspots, with approximately 72% of recombination occurring in these regions. We investigated the hypothesized contribution of recombination problems to age-related aneuploidy. Our large sample size allowed us to detect a weak but significant negative effect of maternal age on recombination rate.
We have constructed the first fine-scale Asian genetic map. This fills an important gap in the understanding of recombination pattern variation and will be a valuable resource for future research in population genetics. Our map will improve the accuracy of linkage studies and inform the design of genome-wide association studies in the Asian population.
Homologous recombination during meiosis results from the resolution of programmed DNA double-strand breaks by crossover between non-sister chromatids. Recombination is concentrated into hotspots roughly 2 kb in length scattered throughout the genome. Recombination rates between markers are used to construct genetic maps. These maps are important for linkage studies and also for genome-wide association studies, since the choice of tagging single nucleotide polymorphisms (SNPs) depends on linkage-disequilibrium structure. Early human genetic maps were constructed using sparse microsatellite markers in Icelandic and CEPH (Centre d’Etude du Polymorphisme Humain) families [2–4]. More recent studies have used SNP markers to reveal the locations of recombination with much greater resolution in Hutterites and other European-American cohorts (Framingham Heart Cohort Study and Autism Genetic Resource Exchange) [5, 6] and African-Americans. These studies used algorithms based on the ancestral origin of alleles, and parent-sibling tracing approaches [5, 6, 8]. In contrast to family methods, which observe recombination directly, coalescent methods use population genetic data to estimate the genetic map as a parameter of a probabilistic model. This captures the recombination that shuffled genomes through the generations leading to the current population. Applying these methods to HapMap data achieved very high resolution and allowed the identification of hotspots using likelihood ratio tests [1, 9].
For all applications, it is important for a genetic map to correspond well to the group to which it is applied. For example, the use of a map which accurately reflects the average recombination rate of a population will increase the power of gene identification studies. Some evidence suggests, however, that recombination patterns differ between populations. It is known that the DNA-binding protein PRDM9 plays a key role in specifying recombination sites. The binding motif that PRDM9 recognises must change rapidly under a constant Red Queen dynamic because of the ‘hotspot paradox’ [11–13]. Studies of human populations of African and European descent have shown variation in alleles of PRDM9 and in local patterns of recombination [6, 7]. Three minor alleles, in addition to the more common A and B haplotypes, were previously identified in Han Chinese, coding for either 12 or 13 zinc fingers. Interestingly, our preliminary genetic data from the Mongolian population suggest a larger diversity of PRDM9 alleles, including novel SNPs predicted to affect DNA binding (Additional file 1). Given this, the genetic map of Asians may differ from those of other populations. Furthermore, the hotspots found by coalescent models, which reach back many generations into the past, may not be in use today.
In previous work, we constructed a map for Mongolians using 1,039 microsatellite markers. However, the resolution of that map was not sufficient to identify hotspot usage. There are currently no other Asian genetic maps available. Here we present a new genetic map, constructed using dense SNP markers in Mongolian and Korean pedigrees. The resolution of this map is sufficient to reveal fine-scale patterns of recombination and Asian hotspot characteristics for the first time.
[Table: family structure. Rows: size of family (children); total number of families; total number of children.]
[Table: genetic map summary. Column: genetic length (cM).]
[Table: summary of cohort results. Rows: number of SNP markers; total crossovers detected; average male recombination rate; average female recombination rate; median resolution (bp); historical hotspot concordance; concordance of randomly placed intervals; estimated true historical hotspot usage.]
Given that the effects of age on recombination are small, they may be sensitive to differences in methodology. In the study by Kong et al., ages were rounded up to the nearest 5 years and only 1,000 microsatellite markers of unknown position were used. Interestingly, Kong et al. found a slight decrease in recombination rate specifically between the ages of 25 and 30, and a higher proportion of the mothers in the Asian cohort fall within this age range. To investigate this, we sampled Asian mothers with an age distribution corresponding to the previous study and binned individuals into 5-year windows. However, these changes only resulted in a reduced effect of -0.23 recombinations per year. Our findings therefore fit neither the positive effects reported previously nor the much stronger, chromosome-specific negative effects reported more recently. The frequency of aneuploidy increases steeply and exponentially with maternal age, whereas our regression analysis found a low coefficient of determination and a weak effect. Our results therefore contradict the hypothesis of a very simple, direct relationship between reduced recombination and increased aneuploidy with maternal ageing, although there may be more subtle links between the two.
We have constructed the first high-resolution genetic map for Asians. Our map will be a useful resource for future population genetics research, improve the accuracy of linkage studies, help to predict disease-causing structural rearrangements and inform the design of genome-wide association studies in the Asian population. In general, Asian recombination patterns were similar to those in Europeans. A higher proportion of Asian and European recombination was mapped to hotspot regions than in previous studies. These results validate the application of combined genetic maps to Asians. Our data show that maternal age has a weak, negative effect on recombination rate, and there are no strong chromosome-specific effects.
At the core of our algorithm was a hidden Markov model that finds recombination events in family quartets consisting of a father, mother and two children. The input for this model was genotype data for SNPs on a single chromosome, with the alleles at each position encoded as A and B. At any genomic locus in a parent, there are two possible modes of transmission: both children may receive the same allele, or each child may receive a different allele. The paternal and maternal transmission modes combine to yield four possible states. Where the transmission state differs at adjacent genomic loci, we can infer that a crossover occurred between them during a paternal or maternal meiosis. We model the sequence of states along the markers on a chromosome as a Markov process. Observed genotypes at a given locus in all four family members may be consistent with only a subset of these states. For example, if the genotypes of the father, mother, first child and second child are AA, AB, AA and AB respectively at some SNP, then we can infer that the mother is transmitting different alleles to the two children, but can make no inference regarding paternal transmission. Loci at which both parents are homozygous are uninformative because they are consistent with all four states.
A hidden Markov model was constructed with a state space consisting of the four transmission states, and transition weights between these states corresponding roughly to previously reported genome-wide recombination rates. A hidden Markov model extends a simple Markov chain by also modelling the observations with a probability distribution that depends on the state. This allows, for example, genotyping errors to be included in the model, rather than requiring post-hoc filtering. The set of emissions from each state was designed to encode all possible genotypes in all family members at a given locus, assuming a maximum of two alleles. The weights of emissions consistent with a transmission state were set to sum to 0.9995. Inconsistent emissions were allocated the small remaining probability to allow for errors in DNA replication or genotyping. These small emission probabilities are balanced against the state transition probabilities to ensure that a few genotyping errors will not result in an erroneous double state transition.
Family genotype data was fed into this model, and the Viterbi algorithm was applied to find the most likely sequence of states. At each state change, an algorithm scanned forward and back to find the region of uncertainty, or ‘prediction interval’, where either neighbouring state was consistent with observations. For example, the Viterbi algorithm is able to determine a state change from both children inheriting the same allele from both parents, to the children inheriting different paternal alleles. However, the data will normally be ambiguous about the exact location of this change-point. To solve this problem, the set of possible states at each SNP in the region of the state change is recorded, and a maximal interval is formed by starting at the state change SNP and continuing in the 5′ and 3′ directions for as long as more than one state remains possible. This non-probabilistic, strict delimitation of boundaries protects the algorithm from sensitivity to changes in state transition or emission probabilities.
The SNPs marking the prediction interval were mapped to NCBI build 36 physical positions. Alternative hidden Markov models were used to find maternal recombinations on the X chromosome (see Additional file 1). For families with an odd number of children, two overlapping quartets formed a quintet. Where a recombination prediction interval overlapped in the two quartets, we assumed that the recombination occurred in the child present in both quartets and so counted it only once. The collection of all prediction intervals was returned for further analysis.
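The quartet model described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a flat emission weight (0.9995 for any genotype set consistent with a state, 0.0005 otherwise) instead of the full emission table, and a hypothetical per-parent switch probability `p_switch` of 1e-5 between adjacent SNPs. The function names are illustrative, and prediction intervals are not computed.

```python
import math
from itertools import product

# State: (children share the paternal haplotype, children share the maternal haplotype)
STATES = [(p, m) for p in (True, False) for m in (True, False)]

def consistent_states(father, mother, child1, child2):
    """Return the transmission states compatible with the genotypes at one SNP."""
    observed = ("".join(sorted(child1)), "".join(sorted(child2)))
    states = set()
    # Try every choice of transmitted allele for each parent and each child.
    for f1, f2, m1, m2 in product((0, 1), repeat=4):
        kids = ("".join(sorted(father[f1] + mother[m1])),
                "".join(sorted(father[f2] + mother[m2])))
        if kids == observed:
            states.add((f1 == f2, m1 == m2))
    return states

def viterbi(loci, p_switch=1e-5, p_consistent=0.9995):
    """Most likely state sequence; p_switch is an assumed value, not from the paper."""
    def log_trans(a, b):
        changes = (a[0] != b[0]) + (a[1] != b[1])
        return changes * math.log(p_switch) + (2 - changes) * math.log(1 - p_switch)

    def log_emit(state, allowed):
        return math.log(p_consistent if state in allowed else 1 - p_consistent)

    allowed = [consistent_states(*locus) for locus in loci]
    score = {s: math.log(0.25) + log_emit(s, allowed[0]) for s in STATES}
    pointers = []
    for ok in allowed[1:]:
        best_prev = {s: max(STATES, key=lambda q: score[q] + log_trans(q, s))
                     for s in STATES}
        score = {s: score[best_prev[s]] + log_trans(best_prev[s], s) + log_emit(s, ok)
                 for s in STATES}
        pointers.append(best_prev)
    state = max(STATES, key=score.get)
    path = [state]
    for best_prev in reversed(pointers):  # trace the best path backwards
        state = best_prev[state]
        path.append(state)
    return path[::-1]

def crossovers(path):
    """List (SNP index, parent) for every change of transmission state."""
    events = []
    for i in range(1, len(path)):
        if path[i][0] != path[i - 1][0]:
            events.append((i, "paternal"))
        if path[i][1] != path[i - 1][1]:
            events.append((i, "maternal"))
    return events

# Toy genotypes in the order (father, mother, child 1, child 2).
X = ("AB", "AA", "AA", "AA")  # children share the paternal allele
Y = ("AA", "AB", "AA", "AA")  # children share the maternal allele
Z = ("AA", "AB", "AA", "AB")  # children differ in the maternal allele
print(crossovers(viterbi([X, Y, X, Y, Z, X, Z, X])))
print(crossovers(viterbi([X, Y, Y, Z, Y, Y, X])))
```

In the first toy sequence the maternal state changes once. In the second, a single inconsistent SNP is cheaper to explain as a genotyping error than as two state transitions, which is the balance between emission and transition probabilities that the text describes.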
Our work uses a novel algorithm, although similar approaches have been applied to phase next-generation sequencing data [22, 23]. We validated our algorithm by running it on simulated family data for which crossover locations were known. To simulate data as realistically as possible, we based our families on real haplotypes for chromosome 15 from the phased JPT+CHB HapMap dataset. Parents were constructed by selecting two haplotypes each at random out of the 340 available. We then simulated four meioses, selecting 3 crossover locations at random in each, to generate two children. We constructed 1,000 family quartets in this way, containing 12,000 recombination events. Running our hidden Markov model on this data yielded 11,643 prediction intervals that correctly localised 97% of the synthetic crossovers. Out of these 11,643 prediction intervals, only one was incorrect. This means that the precision rate of our algorithm is greater than 99.99%. To check the robustness of our approach on real data, we compared our results to crossovers inferred from sparse microsatellite markers. We tested results in 6 family quartets on chromosome 19. This chromosome had 32 microsatellite markers typed in those family quartets, as well as in their grandparents, allowing discovery of crossovers using identity-by-descent. Out of 33 crossovers detected by our algorithm, 32 were consistent with the microsatellite data, and one corresponded to a crossover 6 Mb away.
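The validation figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes that each correct prediction interval corresponds to exactly one distinct simulated crossover; the function and variable names are illustrative only.

```python
def validation_rates(families, meioses_per_family, crossovers_per_meiosis,
                     predicted_intervals, incorrect_intervals):
    """Return (precision, sensitivity) for a simulated validation run."""
    simulated = families * meioses_per_family * crossovers_per_meiosis
    correct = predicted_intervals - incorrect_intervals
    precision = correct / predicted_intervals  # correct calls among all calls
    sensitivity = correct / simulated          # correct calls among all true events
    return precision, sensitivity

# Numbers taken from the simulation described in the text.
precision, sensitivity = validation_rates(1_000, 4, 3, 11_643, 1)
print(f"precision = {precision:.4%}, sensitivity = {sensitivity:.1%}")
```

Under that assumption the precision exceeds 99.99% and the sensitivity rounds to 97%, in line with the reported values.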
Where families contained more than two children, the children were split into mutually overlapping groups of three. Each such triple was then split into three overlapping pairs, which were input in turn, together with the parent genotype data, into the hidden Markov model. The recombination rates in each child were derived from the rates in the overlapping quartets containing that child. This allowed consistency checks to identify results that contradicted those of overlapping quartets. Before constructing our genetic map, we checked for such inconsistent triples, and filtered out all of the meioses implicated in these errors. There were four (partially overlapping) triples in three families where checks found mathematical inconsistency. We were able to rescue data for three children in one such family, where they formed a consistent triple. This left 9 children in three families where recombinations could not be distinguished consistently. These were removed from the analysis.
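The splitting of larger families into triples and pairs can be sketched as follows. The text does not say how the overlapping triples were chosen, so the sliding window used here is an assumption, as are the function names. The consistency check is likewise only one plausible reading of "mathematical inconsistency": with two haplotypes per parent, the number of child pairs in a triple that received different haplotypes at a locus must be even.

```python
from itertools import combinations

def overlapping_triples(children):
    """Sliding-window triples of children (assumed scheme, not from the paper)."""
    return [tuple(children[i:i + 3]) for i in range(len(children) - 2)]

def quartet_pairs(triple):
    """The three child pairs of a triple; each pair plus both parents is a quartet."""
    return list(combinations(triple, 2))

def triple_is_consistent(same_ab, same_bc, same_ac):
    """Check pairwise 'same haplotype' calls for one parent at one locus.

    Each child carries one of two parental haplotypes, so the number of
    pairs called 'different' can only be 0 or 2.
    """
    n_different = sum(not same for same in (same_ab, same_bc, same_ac))
    return n_different % 2 == 0

children = ["c1", "c2", "c3", "c4"]  # hypothetical child identifiers
for triple in overlapping_triples(children):
    print(triple, quartet_pairs(triple))
```

A triple in which exactly one pair is called "different" cannot arise from any real transmission pattern, so results of that kind would flag the quartets involved for filtering.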
This study used genotype data from Mongolian pedigrees, generated by the Illumina Human 610-Quad Beadchip, yielding 569,132 markers. The data was originally collected for association studies in the GENDISCAN project, which was approved by the Institutional Review Board of Seoul National University (approval number H-0307-105-002). Mendelian errors were filtered using PedCheck and double recombination errors were filtered using Merlin. Markers with a low call rate (<99%), high error rate (>1%) or low minor allele frequency (<0.01) were excluded as described previously.
We also used genotype data from the Healthy Twin Study, a project that genotyped adult twins and other family members willing to participate through two Korean hospitals. The Affymetrix Genome-Wide Human SNP Array 6.0 was used to type 906,600 SNPs overall, which was reduced to 516,452 after cleaning Mendelian and non-Mendelian errors. The criteria for exclusion were: Mendelian error in more than 3 families, minor allele frequency below 0.01, Hardy-Weinberg equilibrium test <0.001, missing genotype data >0.05 or non-Mendelian error in more than 3 families. The study protocol was approved by the ethics committees at Samsung Medical Center and Busan Paik Hospital.
Results were compared to those obtained by applying the same method to CEPH pedigree data. This data was generated under the study entitled ‘Genotyping NIGMS CEPH Samples from the United States, Venezuela and France’, and provided to us under a dbGaP access request. There were 931,633 SNP markers in this dataset. From the complete pedigrees, families of at least four members were selected using a Python script. In all, 182 families were gathered in this way.
The set of 32,991 (34,136 including the X chromosome) hotspots from HapMap Phase II was used for historical hotspot concordance analysis after lifting to NCBI build 36. The software utility liftOver from UCSC (http://genome.ucsc.edu/cgi-bin/hgLiftOver) was used to perform this transfer, during which 6 hotspots were partially deleted.
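As an illustration of the marker exclusion thresholds quoted above for the Mongolian data (call rate below 99%, error rate above 1%, minor allele frequency below 0.01), the following sketch applies them to per-marker summary statistics. The `MarkerStats` record, its field names and the example markers are hypothetical; the actual filtering in the study used PedCheck, Merlin and previously described procedures.

```python
from dataclasses import dataclass

@dataclass
class MarkerStats:
    """Hypothetical per-marker summary statistics."""
    name: str
    call_rate: float          # fraction of samples with a genotype call
    error_rate: float         # fraction of calls flagged as errors
    minor_allele_freq: float

def passes_qc(marker, min_call_rate=0.99, max_error_rate=0.01, min_maf=0.01):
    """Keep a marker only if it meets all three thresholds from the text."""
    return (marker.call_rate >= min_call_rate
            and marker.error_rate <= max_error_rate
            and marker.minor_allele_freq >= min_maf)

# Made-up markers for demonstration only.
markers = [
    MarkerStats("snp_a", call_rate=0.998, error_rate=0.001, minor_allele_freq=0.23),
    MarkerStats("snp_b", call_rate=0.970, error_rate=0.001, minor_allele_freq=0.23),
    MarkerStats("snp_c", call_rate=0.998, error_rate=0.001, minor_allele_freq=0.004),
]
kept = [m.name for m in markers if passes_qc(m)]
print(kept)
```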
We thank the subjects who provided their DNA and family information in all three cohorts. We thank the researchers who performed the study ‘Genotyping NIGMS CEPH Samples from the United States, Venezuela and France’, and generously made their data available. Genotyping data for the CEPH Family Panels was obtained from the NIGMS Human Genetic Cell Repository (http://ccr.coriell.org/Sections/Collections/NIGMS/?SsId=8) through dbGaP accession number phs000268.v1.p1. We thank researchers in the GENDISCAN project and the Healthy Twin Study whose efforts provided us with valuable, high-quality genotype data with which to work. The GENDISCAN project was supported by the Korean Ministry of Education, Science and Technology (Grant Number 2003-2001558).
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.