A general linear model-based approach for inferring selection to climate
© Raj et al.; licensee BioMed Central Ltd. 2013
Received: 27 February 2013
Accepted: 11 September 2013
Published: 22 September 2013
Skip to main content
© Raj et al.; licensee BioMed Central Ltd. 2013
Received: 27 February 2013
Accepted: 11 September 2013
Published: 22 September 2013
Many efforts have been made to detect signatures of positive selection in the human genome, especially those associated with expansion from Africa and subsequent colonization of all other continents. However, most approaches have not directly probed the relationship between the environment and patterns of variation among humans. We have designed a method to identify regions of the genome under selection based on Mantel tests conducted within a general linear model framework, which we call MAntel-GLM to Infer Clinal Selection (MAGICS). MAGICS explicitly incorporates population-specific and genome-wide patterns of background variation as well as information from environmental values to provide an improved picture of selection and its underlying causes in human populations.
Our results significantly overlap with those obtained by other published methodologies, but MAGICS has several advantages. These include improvements that: limit false positives by reducing the number of independent tests conducted and by correcting for geographic distance, which we found to be a major contributor to selection signals; yield absolute rather than relative estimates of significance; identify specific geographic regions linked most strongly to particular signals of selection; and detect recent balancing as well as directional selection.
We find evidence of selection associated with climate (P < 10-5) in 354 genes, and among these observe a highly significant enrichment for directional positive selection. Two of our strongest 'hits’, however, ADRA2A and ADRA2C, implicated in vasoconstriction in response to cold and pain stimuli, show evidence of balancing selection. Our results clearly demonstrate evidence of climate-related signals of directional and balancing selection.
Within the last 100,000 years humans dispersed from Africa to occupy most of the habitable space in the world. During this process our species has successfully combined cultural buffering, biological plasticity and adaptation to cope with the wide range of new ecosystems, pathogens and climates they encountered [1–3]. Climate, in particular, comprises many diverse elements such as temperature, humidity, precipitation and solar radiation, so it would be surprising if many different genes had not been influenced by natural selection. Indeed, many physiological traits exhibit geographic trends that correlate with climate [4–8]. However, without an explicit link to global patterns of genetic variation, the extent to which these trends reflect adaptation through natural selection remains unclear.
Many genetic studies on humans have attempted to identify genes and genomic regions associated with regional adaptation by looking for signatures of selection [2, 9–15]. These studies have relied on a diverse range of approaches that mostly identify outliers in the empirical genome-wide data, including searches for markers exhibiting unusually high levels of geographic differentiation [2, 9], for genomic regions with high linkage disequilibrium and derived allele frequency , and for markers where the loss of genetic variability that occurred when humans migrated out of Africa has been particularly high or low [11–14]. These approaches suggest that a substantial proportion of the human genome contains candidates of positive selection . However, it can be difficult to ascribe environmental or biological factors to any particular signal. Furthermore, wherever signatures of selection are sought by considering patterns of genetic variation in isolation, i.e. without reference to a specific hypothesis, it can become difficult to separate genuine signals from those that arise from other sources including genotyping errors and other artifacts.
One way to increase statistical power when searching for signatures of selection is to study patterns of genomic variation across populations in relation to particular environmental characteristics. For example, physiological adaptations to temperature and solar radiation, as well as several other traits, have been shown to vary along a latitudinal cline [16–18], suggesting selection by climate. Even modest regional allele frequency differences can provide evidence of selection if they correlate strongly with one or more environmental variables, provided the environmental variables are accurately measured and also approximate the selective pressure over the time of evolution. Explored earlier by Prugnolle et al. (2005) , this approach has been pioneered by Hancock et al. [20–23], who use a Bayesian algorithm  to search for markers at which variations in allele frequency correlate more than the genomic average with global variation in one or more climatic variables. In this approach, absolute significance is not determined. Instead, markers are ranked in terms of their degree of association. On the one hand this makes the approach sensibly conservative, but on the other it precludes a meaningful estimate of the proportion of the genome actually influenced by selection.
Here we present a new approach for detecting signatures of selection based on the use of general linear models to analyze similarity matrices. This framework allows three important advantages. First, data from neighboring markers can be combined into a single genetic window, thereby reducing greatly the number of independent tests that need to be performed. Second, the method is flexible, allowing incorporation of possible cofactors such as geographic distance between populations and interactions between variables. In particular, by fitting genome-wide genetic relatedness we can control for variation in the level of shared ancestry between different pairs of individuals or populations. Third, statistical significance is determined through a form of Mantel test, based on repeated randomization (scrambling) of the data at one predictor variable, allowing absolute estimates of significance rather than empirical ranking. We apply this approach to genome-wide data from 45 global populations using four climate variables, each quantified in both the summer and the winter seasons. We then compare our results with those of Hancock et al. (2011)  available through the dbCLINE online database (http://genapps2.uchicago.edu:8081/dbcline/main.jsp), and identify a number of overlapping genomic regions as candidates for recent selection.
The method we propose is designed to identify regions of the genome that have experienced climate-related selection since modern humans colonized the world 'out of Africa’. We aim simultaneously to reduce the impact of false positives, by radically reducing the necessary number of independent tests, and to minimize the impact of observations that are unusual for reasons other than natural selection, for example genotyping errors. MAGICS is based on the identification of genomic regions where genetic similarity between populations correlates with climatic similarity, after correcting for factors such as genome-wide relatedness and geographic distance.
Previous analyses of genome-wide data and climate have tended to use a 'linear’ framework in the sense that there is a 1:1 correspondence between a given marker and the trait being measured. For example, one might conduct a regression of solar radiation index against allele frequency. With MAGICS, all variables are transformed into similarity or distance matrices. Thus, solar radiation index is scored as pairwise differences in solar radiation index between geographic regions, while genotype data are scored variously as genotype identity (one locus, same or different), relatedness (between individuals over multiple loci) or genetic distance (between populations, multiple loci). The extent to which two or more similarity matrices are correlated is classically determined using a Mantel test, in which the raw correlation between linearized matrices is tested by repeated randomization. We extend this slightly by fitting general linear models instead of simple correlations or multiple regressions. This allows for the inclusion of factors as well as continuous variables and, where desired, for inclusion of interaction terms between variables. As in a classical Mantel test, significance is determined by randomization of one predictor variable.
where the response variable, LocalFST, is the genetic relatedness between populations/individuals at a given genomic location, and the predictor variables are: GWFST or genome wide FST, defined as the genetic distance between pairs of populations based on all available SNPs across the genome; Geography, defined as the land-only distance between population pairs ; and Climate, the difference between pairs of populations at a climate measurement of interest. In this way we ask the extent to which a given genomic region differs more or less than expected among populations, relative to variation in the entire rest of the genome, and at the same time determine whether this measure of difference covaries systematically with the climatic difference between the regions. Geographic distance is included as a conservative factor to control for deviations from a simple isolation by distance model. This might include instances where genetically dissimilar populations live in close proximity or where migration has caused genetically very similar populations to be physically distant. Our reasoning is that selective forces due to disease, for example, may differ greatly between regions with similar climate on different continents. Local FST was calculated across 'genic windows’ comprising all SNPs located ± 25Kb from the midpoint of each gene; this 50Kb window size encompasses the full transcript of roughly 65% of genes in the human genome . Using 'genic windows’ enabled us to focus on regions of the genome annotated by function, compare our results to those from similar studies and account for variability in gene size. Alternative approaches based on fixed or sliding windows could also be used but were not pursued in the current study. No attempt was made to weight SNPs by their function in coding, promoter or enhancer regions, nor to exclude SNPs lying outside shorter genes, although both of these are possible improvements that would be worth exploring in future work.
Count of hits for each climate variable
a. Summary of hits
CS = 0
CS = 1
CS = 0 & CS = 1
b. CS = 0 and CS = 1 overlaps
CS = 1
CS = 0
The difference between using CS = 0 and CS = 1 is less than one would expect by chance. If selection occurs only on one continent, worldwide scrambling (CS = 0) will produce a far lower P-value than scrambling within continents. In contrast, if selection acts globally on the same trait and in the same way, the difference between the two scrambling regimes should be much less. Summer relative humidity yielded 55 hits with CS = 0 but only 1 with CS = 1, suggestive of localized selection, whereas winter solar radiation and winter temperature yield similar numbers of hits under both regimes and these are themselves strongly correlated on a global scale (Spearman rank ρ = 0.76). Similarly, winter precipitation and winter temperature are strongly correlated (Spearman rank ρ = 0.64) and yield similar numbers of hits for CS = 0 (Table 1, Additional file 1: Table S2). These pairs of environmental variables may therefore exert similar selection pressures and hence create parallel associations. Interestingly, this does not seem to be the case because among all our hits only 36 are associated with more than one environmental variable at a high significance level (P ≤ 10-5) (Additional file 1: Figure S1). Note that some lack of concordance can be attributed to a threshold effect. Where the true P-values of two climate variables are both 10-5, the stochasticity of randomizations will cause random variation about this expectation and in only 25% of such cases will both climate variables be judged 'hits’. This effect will operate, albeit to a lesser extent, wherever both climate variables yield similar P-values close to the threshold.
To interpret the mode of selection acting at any particular 'genic window/region’, we used the approach of Amos and Bryant . Neutral variability declines linearly with land-only distance from Africa [25, 31–35]. Balancing and directional selection tend respectively to retard and to accelerate loss of diversity, creating slopes that are shallower and steeper, respectively. We therefore calculated the slope of the relationship between heterozygosity and distances from Africa for all genic windows and asked whether our hits exhibit an excess of extreme slopes. An excess of extreme slopes should give the hits a higher variance, and indeed, the slopes obtained for our hit loci do exhibit a significantly higher variance than the slopes of the non-hit loci (F-test, F353,28783 = 1.24, P = 0.0017). We find that our hits show a significant enrichment of extreme negative slopes, indicative of directional selection. Interestingly, our top two hits both show evidence of balancing selection, as indicated by shallower than expected slopes (Additional file 1: Figure S3).
Overlap between our hits, defined as genic windows that yielded a significance of P < =10 -5 with MAGICS and were also significant in the analysis of Hancock et al. (2011)at P < = 10 -3
Gene window number
10.1 and 5.4
Pr Summer and Sr Winter
8 and 5
Sr Winter and T Winter
5.20E + 00
5.00E + 00
6.10E + 00
Numerical overlap between the hits obtained by MAGICS in our study and hits obtained by Hancock et al. (2011)at several significance thresholds
Hancock and this study top 5% overlap
P of overlap
Our hits (P < =10-5)
Overlap between our hits and Hancock’s equivalent rank
This study P < =10-3
Hancock < =10-3
Overlap < =10-3
With respect to specific genes, overlaps between the top MAGICS hits (P < = 10-5) and Hancock et al. hits (P <10-3) are summarized in Table 2. Of note are the appearances of: a) POLD3 (OMIM: 611415) for both winter solar radiation and temperature; b) the psoriasis associated  solute carrier transporter locus SLC12A8 (OMIM: 611316) for winter solar radiation; c) delta-7-sterol reductase gene DHCR7 (OMIM: 602858), the penultimate enzyme of cholesterol synthesis, for winter solar radiation; and d) ADRA2A for both summer precipitation and winter solar radiation. The association between genes involved in metabolism and climate variables describing the degree of cold experienced is consistent with known associations between cold adaptations and metabolism .
We have implemented a novel approach for detecting signatures of recent natural selection in human populations in relation to climate, based on Mantel tests conducted in a general linear model framework. We compare our results with those of Hancock et al. (2011) , who use a Bayesian approach to find correlations between allele frequency and climate. We identify a number of candidate regions that exhibit significant agreement with Hancock et al. (2011)’s  findings, but our use of absolute rather than relative significance highlights order of magnitude differences in the number of hits found for the eight climate–season combinations. For example, summer and winter temperatures reveal associations with 0.013% and 0.27% of the genes, respectively. In terms of numbers of hits, winter solar radiation, winter temperature and summer precipitation seem to have exerted the greatest selective pressure on the human genome.
A perennial issue in any sort of genome-wide scan has been the necessity to control and correct for the large number of statistical tests performed, typically of the order of 106. One way to reduce the problem is to analyze the genome as a series of usually non-overlapping windows and, within each, to maximize the signal by combining multiple, semi-independent measures of selection . Our method offers an alternative way to reduce the multiple hypothesis testing problem. Through use of pairwise matrices, we are able to combine data from tens or even hundreds of SNPs within a genic window to yield a single relatedness value between any pair of individuals or populations. This reduces the number of tests conducted, largely dilutes the impact of occasional extreme outlier SNPs and, following Grossman et al., the use of relatedness effectively captures several consequences of selection that are usually tested separately. Use of similarity matrices further negates the need to fit specific models of inheritance, since regardless of how selection operates, virtually by definition, genetic distance among populations will be either greater (directional selection) or less (balancing selection) than expected based on the rest of the genome. Balancing selection in particular is often not tested for by classical approaches. However, balancing selection is expected to act on small genomic scales; restricting the 50 kb window size used here may improve the ability of MAGICS to accurately detect signals of balancing selection.
Approaches requiring extensive randomization are not usually implemented in genome-wide association (GWA) studies due to the prohibitive number of randomizations required to achieve a level of significance that indicates a locus of interest. Our approach offers two improvements. First, by combining data across multiple SNPs the total number of tests is reduced considerably, with a parallel decrease in the required minimum number of randomizations. Second, and more importantly, we introduce a data-splitting element, with the product of the P-values obtained in each half of the data providing an excellent predictor of overall significance. Through data-splitting, any given number of randomizations can estimate P-values that are approximately the square of the minimum P-value obtainable without splitting. This makes randomization a viable method for assessing significance even for genome-wide analyses.
The association of ADRA2C with cold temperatures (Figure 2) is supported by studies suggesting that the C subunit of the Alpha-2 Adrenergic Receptor is involved in vasoconstriction associated with response to cold . In addition, polymorphisms in both ADRA2A and ADRA2C genes have been reported to play a role in modifications of cold and pain sensibility . ADRA2A and ADRA2C seem to provide a clear example of climate driven selection, where independent regions of the genome have experienced selective pressure in human populations due to their role in cold and pain response. We speculate that, in contrast to the positive selection inferred at most other 'hits’, these loci have experienced balancing selection. Our argument is based on the rate at which heterozygosity declines with geographic distance from Africa in this region of the genome . At both ADRA2A and ADRA2C the slope is appreciably shallower than expected, with ADRA2C being among the top 10% of regions with the most positive slopes, suggesting that selection has acted to retard the loss of diversity that occurred 'out of Africa’ (Additional file 1: Figure S3).
Previous studies  have tended to adopt a pragmatic approach to the interpretation of their results. Instead of trying to interpret the resulting P-values directly, allowing for the number of tests conducted, genetic non-independence between populations with shared ancestry as well as other confounding factors, P-values are ranked on the reasonable assumption that lower P-value on average indicate stronger candidates. Our method differs in that the use of non-parametric randomization allows reasonably objective 'absolute’ P-values to be generated. The approach of evaluating environmental variables separately has also been explored by Fumagalli et al., which also uses partial Mantel tests to examine correlations among genetic and climatic variation using a window-based approach (2011). That this offers a clear advantage is suggested by the contrasting results from the eight climate variables analyzed. Climate variable relative humidity, for example, yields very little evidence of selection, while others, most notably winter solar radiation, winter temperature, and summer precipitation, provide abundant evidence. This pattern seems biologically realistic, in that one would not expect selection to act equally on all variables. In contrast, ranking methods assign equal importance to all climate variables. The large difference between climate variables that we report also helps to validate our approach. One could argue that associations are inevitable given the high degree of autocorrelation between both genetic and climatic data. That some climate variables generate effectively no good hits is important because it suggests that our GLMs successfully control for autocorrelation such that the majority of hits found are genuine.
MAGICS differs in other ways from the method used to detect climate-specific selection signals in Hancock et al. (2011) . In addition to differences in the unit of inference, specifically SNPs versus genic windows, we are also able to ask directly the source of significant associations. Fumagalli et al. (2011)  develop a similar strategy, but Hancock et al. (2011)  achieve this post-hoc. Our method has the flexibility to restrict randomization to subsets of the data. In practice, we start by randomizing across the entire dataset, then test for whole continent effects by restricting randomizations to within continents, and finally test for specific within-continent effects by randomizing within individual continents while holding all other data constant. This analysis sheds light on another facet of the patterns we detected, and may help to distinguish between associations that are due entirely to differences in climate between continents, and those that can arise due to selection in relation to other factors that become spuriously linked to the tested climate variables simply because two or more global regions either share or differ in their climate. Future iterations of our work may incorporate gene expression data, as it has been suggested to drive environmental adaptations in humans .
A concern of any association study is the possibility of false positives. Our results suggest that this issue has been largely mitigated. In isolation, a method prone to false positives due to aspects of the genetic data alone, for example linkage disequilibrium, ascertainment bias or genotyping errors, should generate similar numbers of 'hits’ with all climate variables. This is clearly not the case in our analyses, since relative humidity yields far fewer extreme values compared with all other variables and winter and summer seasons often yield very different numbers. The other concern is that the climate variables themselves tend to be distributed in a way that promotes spurious associations. For example, if people migrated along or against climatic clines, genetic similarity and climate could become correlated. That our winter and summer values usually generate very different numbers of hits despite being correlated with each other, would again seem to indicate that the 'hits’ are largely genuine.
We demonstrate here that MAGICS is a powerful and flexible approach that can be used to identify regions of the genome involved in adaptations to specific environmental variables, isolating them from highly related confounding factors such as geographic distance, and also being able to localize the signals to particular regions of the world. The increasing availability of whole genome sequences of individuals from multiple global populations will provide additional opportunities to carefully study the specific influences of the environment on genomic variation.
We combined published data for 862 individuals belonging to 45 distinct global populations drawn from five published sources [42–46] (Additional file 1: Table S1). Only populations represented by a minimum of nine unrelated samples were included in the analysis, with the exception of the HGDP Colombian population, represented by 7 individuals, to increase the presence of populations from the Americas. We also merged North Italians and Tuscans from the data of Li et al. (2008), to provide a larger sample size for Italians. Several populations from India were merged to form a “South Indian” population.
Gene positions for the implementation of MAGICS were determined using NCBI Build 36.1, University of California Santa Cruz version Hg18. Ensembl gene prediction models recognize a total of 34,156 protein coding genes, RNA genes and pseudogenes in this build of the human genome. Among these, 28,784 were covered by at least one SNP and were used in our analyses. We constructed 'genic windows’ around these genes by taking the midpoint of each gene, calculated as (transcription start + end positions)/2, and the region 25 kb either side of this point. We chose to keep the windows of constant size to minimize bias in favor of larger genes, but acknowledge that cases can be made for a number of other strategies. All comparisons between our data set and that of Hancock et al. (2011)  were limited to the subset of 27,096 genes found in both studies.
Geographic coordinates of populations were taken from published data, where available, or else taken from the midpoint or capital of the country or region of origin. Land-only geographic separation in kilometers were provided by A. Manica . To investigate the role of specific environmental variables on human genetic variation we focused on four climate variables and their variation across two seasons: (1) T, air temperature (measured in °C), (2) Pr, precipitation rate (kg/m2/s), (3) rH, relative humidity (%), and (4) Sr, solar radiation (W/m2). The data source we use reports data in terms of monthly averages across ~40 years of the collection, from the NCEP/NCAR database . To account for seasonal variation, we took the summer and winter seasonal averages. Data extraction from the NCEP/NCAR website (http://ftp.cdc.noaa.gov/Datasets/ncep.reanalysis/) was conducted using an in house Perl script (Climate Manager), available upon request.
MAGICS was implemented using a custom R script (http://cran.r-project.org). To maximize algorithm speed in order to conduct large numbers of randomizations, we exploited the R package Rcpp which facilitates easy incorporation of C++ code snippets. C++ subroutines were written to calculate linearized pairwise dissimilarity matrices from genetic/climate data. Pairwise FST values between all populations were pre-calculated using Weir and Cockerham’s 1984 estimator .
Significance was tested by extensive randomization. We chose to randomize the climate variable, in each case scrambling (sampling without replacement) the climate variable values among populations, recalculating the dissimilarity matrix and refitting the original model. Randomized fits that yielded lower Akaike Information Criterion (AIC) values than the original model were tallied. To maximize algorithm speed and to avoid spending large amounts of time on non-significant genomic regions, randomization number was increased initially up to a maximum of 100,000 until either 10 more extreme AIC values had been obtained from a minimum of 100 randomizations, whichever came first. Note that unless the randomization process ends without finding 10 more extreme AIC values, P-values tend to be slightly conservative due to the fact that they end on a success.
The simplest form of randomization involves scrambling the climate variable evenly across all populations. However, this strategy could increase the chance of weak associations appearing disproportionately strong if a climate value that appears to be found globally is actually unique to a single continent. For a more conservative strategy, we therefore repeated our analysis but this time restricted scrambling to within each of six pre-defined continents: Europe, East Asia, Central and South Asia, Middle East, Americas, and Oceania. Finally, we can also use our approach to ask which continents contribute most to any particular putative association. To do this, once a 'hit’ is found, the analysis is repeated for each continent in turn, scrambling data only for the current continent while holding all other data constant.
Randomization approaches benefit from being non-parametric but an importance disadvantage is that they are computer-intensive. This problem is particularly acute in GWA studies, where the determination of experiment-wide significance requires literally billions of iterations. We therefore explored methods of extrapolation. Trials based on the distribution of AIC values or the proportion of null deviance explained were unsuccessful due to long, poorly defined tails in the distribution of the randomized statistics. A more promising approach is based on data-splitting. If data for a significant regression are divided into two random halves, the P-value of the whole should approximate the product of the P-value of the two halves analyzed separately, an assertion that we verified by simulation. We therefore explored an algorithm in which, for each randomization, the dataset was divided into two random halves, each of which was then analyzed as a separate set of pairwise matrices. Specifically, two random halves are defined afresh at each randomization, used to analyze both the unscrambled and scrambled data, and counts tallied for each half when the AIC of the scrambled data yielded a lower AIC value than the unscrambled data. This approach was then applied to all highly significant 'hits’ (P = 0 in 105 randomizations) but with the maximum randomization number extended to 2,000,000 and with full-dataset randomizations conducted in parallel for calibration.
The results from the MAGICS analyses for all the 28,784 genic windows are available online (http://www.zoo.cam.ac.uk/zoostaff/meg/MAGICSresults.xlsx). Information on the results table are provided in: The link http://www.zoo.cam.ac.uk/zoostaff/meg/README_Raj_et_al_2013.txt.
In MAGICS the coefficients of the GLM should be informative about the nature of the selection acting locally. If populations experiencing contrasting climates also exhibit greater than expected differentiation this would suggest directional selection. However, there is appreciable ambiguity. Where populations in similar climates are more genetically similar than expected this might either indicate directional selection fixing a similar variant or balancing selection reducing divergence through drift. For a clearer picture we therefore turned to a method based on variation in the extent to which variability was lost as humans migrated out of Africa.
Overall, heterozygosity declines with distance from Africa. However, the exact amount of variability lost should be modulated by natural selection, with balancing selection tending to reduce loss and directional selection tending to accelerate loss of heterozygosity. This concept has been used successfully to show that genomic regions that show signatures of selection are enriched for immune genes . Inference of the mode of selection acting on a given genomic location was conducted exactly following Amos and Bryant (2011) , using their data. Briefly, Amos and Bryant (2011) reported the Pearson correlation coefficients of average SNP hetereozygosity and distant from Africa. Data comprise Phase II and III SNP data from HapMap, representing seven different populations and the analysis was conducted with a window size of 50 Kb. An argument could be made to calculate slopes using our data from 45 global populations. However, we elected not to do this for several reasons, including non-independent from other analyses and the fact that Phase I data may be subject to ascertainment bias. By using the data from Amos and Bryant (2011)  we achieve statistical independence from our data with a dataset that has proven ability to detect contrasting patterns of selection at immune-related genes. We therefore applied the same algorithm to derive the correlation coefficient and slope for the relationship between heterozygosity and distance from Africa for each of our genic windows, allowing us to define the mean and standard deviation, and hence to ask whether our putative 'hits’ show independent evidence of selection, and if so, whether the selection is likely to be balancing or directional.
The Supplemental Data section includes three figures and three tables.
Results for all the genic windows are available at: http://www.zoo.cam.ac.uk/zoostaff/meg/MAGICSresults.xlsx. Information on the results table are provided in: The link http://www.zoo.cam.ac.uk/zoostaff/meg/README_Raj_et_al_2013.txt.
The authors would like to thank Yuan Chen for providing gene positions and names, Dr. Andrea Manica for providing geographic distances between populations, and Dr. Tiago Antao for converting the original R-based implementation of MAGICS into one compiled in C++.
S. R. was supported by a Gates Cambridge Scholarship; L. P. is supported by a Domestic Research Scholarship, the Cambridge European Trust and Emmanuel College, Cambridge, UK; I. G. R. is supported by a Sir Henry Wellcome Postdoctoral Fellowship and T. K. is supported by ERC grant Stg-2010 261213. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.