Successful genotyping of 307 SNPs in 400 P. falciparum infections from four provinces in the Colombian Pacific region over a ten-year period allows a detailed description of parasite population genetics in this region. These data reveal (a) parasite MLGs persisting for up to eight years (median of 538 days), (b) stratification of parasites into four subpopulations that occur sympatrically within sampling locations, and (c) LD that decays by half in <10 kb but varies between subpopulations. We discuss the advantages of the SNP genotyping method used, and the implications of our findings for the design of association studies and the evolution of drug resistance in low endemic malaria areas.
Effective genotyping using dried blood spots
In the present study, we were able to rapidly score multiple markers from finger-prick blood spots with high reproducibility. The SNP markers selected provide a set of 250 informative SNPs for further genetic studies at the local/regional level. The major advantages of SNPs over microsatellites are that they are more abundant, mutationally stable, located in genes and “portable”; in other words, they are easily scored and comparable between studies.
Clonality and persistence of MLGs in time and space
Genetic variability studies show a direct relationship between degree of parasite endemicity and genetic variation [32, 33]. In high endemic malaria areas, such as sub-Saharan Africa, multiclonal P. falciparum infections, high genetic diversity, and low LD are common. In contrast, in low endemic areas, such as South American countries, malaria patients are expected to have infections caused by a single clone with limited genetic diversity and more extensive LD. We found that 19% of P. falciparum infections were multiclonal in samples collected from the Colombian Pacific region. This is consistent with previous studies in low endemic malaria areas and contrasts with studies in Africa, where the percentage of multiclonal infections can reach up to 90%. This correlation between multiclonal infections and transmission intensity was confirmed here within low endemic areas (Figure 3), suggesting that this metric provides an indirect measure of transmission intensity [35, 36].
A decade ago, Anderson et al., using 12 microsatellites, stated that P. falciparum from South America had the lowest level of genetic diversity worldwide, with 30 Colombian samples (collected in Antioquia province) showing the lowest diversity. Another study, using 56 samples from Chocó and five polymorphic microsatellites, also suggested low diversity.
The genotypic richness (R) was 0.42 for all the monoclonal samples included in this study, the lowest reported in comparison with studies from areas with similar malaria eco-epidemiological features, such as Venezuela, Peru, Brazil, Cambodia and Thailand, with R values of 0.60–0.98 [9, 11, 17, 18, 37–39]; however, this measure is strongly influenced by sampling intensity, and hence comparisons between countries may be biased. Our study confirmed low genotypic richness in P. falciparum from Colombia, with a third of the MLGs infecting ≥2 patients (Figure 4A) and long persistence (Figure 5A) in cities more than 500 km apart (Figure 1). Our results contrast with studies from neighboring countries including Venezuela, Brazil and Peru, where genotypic richness is markedly higher and the number of polyclonal infections more than doubled from 2003 to 2007 [9, 38, 40].
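As an illustration only, genotypic richness can be computed from a list of MLG labels. The sketch below assumes the commonly used definition R = (G − 1)/(N − 1), where G is the number of distinct MLGs among N monoclonal samples; this section does not state the formula, and the sample labels are hypothetical.

```python
from collections import Counter


def genotypic_richness(mlg_labels):
    """Return R = (G - 1) / (N - 1) for a list of MLG labels.

    Assumed definition: G distinct MLGs among N monoclonal samples.
    """
    n = len(mlg_labels)
    if n < 2:
        raise ValueError("need at least two samples")
    g = len(set(mlg_labels))
    return (g - 1) / (n - 1)


def fraction_of_mlgs_shared(mlg_labels):
    """Fraction of distinct MLGs found in two or more samples."""
    counts = Counter(mlg_labels)
    shared = sum(1 for c in counts.values() if c >= 2)
    return shared / len(counts)


# Hypothetical toy data: 7 monoclonal infections carrying 4 distinct MLGs.
samples = ["A", "A", "A", "B", "B", "C", "D"]
richness = genotypic_richness(samples)        # (4 - 1) / (7 - 1)
shared = fraction_of_mlgs_shared(samples)     # MLGs A and B out of 4
```

Because N appears in the denominator, denser sampling of a clonal population lowers R even when G is unchanged, which is one reason between-country comparisons of R should be read with care.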
Implications for drug efficacy studies
PCR genotyping of parasite infections before and after treatment is widely used to differentiate between reinfection and recrudescence and to adjust measures of treatment failure rates in antimalarial drug efficacy studies [41, 42]. However, when parasite populations are highly inbred, there is a high probability of patients being reinfected with the same parasite genotype. To evaluate this probability in Colombian samples, we examined the probability of sampling identical genotypes. This probability was highest in Valle province during the interval of 15–21 days (~30%), followed by Chocó (29–42 days) and Nariño (22–28 days), with probabilities close to 15%, and Cauca (43–63 days), with ~10%. The overall population mean probabilities were between 3 and 11% for up to 500 days (Figure 5C). These results suggest that PCR evaluation should be used with care in Colombia, because there is a substantial probability of misclassifying some new parasite infections as recrudescences, thereby overestimating treatment failure rates (Figure 5C).
Colombia implemented Artemisinin Combination Therapies (ACTs) at the end of 2006. Three drug efficacy studies were performed with these compounds in Antioquia and Chocó provinces, showing 99 to 100% efficacy [44–47]. One subject from the rural area of Tadó (in Chocó) presented parasitemia and fever at day 28 post treatment with Coartem®; further genetic analyses of the msp1 gene suggested a recrudescence. However, it is unclear whether this was a true treatment failure or a case of reinfection with the same genotype. The use of even more polymorphic markers will not overcome the limitations of PCR in this scenario. Three alternative approaches could be used to aid interpretation of such studies: i) the use of statistical approaches designed to account for this bias, ii) definition of primary efficacy in terms of parasite clearance rates and iii) the use of “malaria-free locations” for malaria patients during post treatment surveillance [43, 48].
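One way to estimate the probability of sampling identical genotypes is to compare all pairs of infections and bin them by the time between collections. The following Python sketch assumes that approach; the exact procedure is not described in this section, and the function name, data layout and toy records are hypothetical.

```python
from itertools import combinations


def identical_pair_probability(samples, bins):
    """Fraction of sample pairs sharing an MLG, per time-interval bin.

    samples: list of (collection_day, mlg_label) tuples.
    bins: list of (low, high) tuples, inclusive, in days between samples.
    Returns {bin: fraction or None if the bin contains no pairs}.
    """
    totals = {b: 0 for b in bins}
    matches = {b: 0 for b in bins}
    for (day1, mlg1), (day2, mlg2) in combinations(samples, 2):
        gap = abs(day1 - day2)
        for low, high in bins:
            if low <= gap <= high:
                totals[(low, high)] += 1
                if mlg1 == mlg2:
                    matches[(low, high)] += 1
    return {b: (matches[b] / totals[b] if totals[b] else None) for b in bins}


# Hypothetical records from one province: (day of collection, MLG label).
records = [(0, "A"), (16, "A"), (20, "B"), (45, "A")]
intervals = [(15, 21), (22, 28), (29, 42), (43, 63)]
probabilities = identical_pair_probability(records, intervals)
```

A high value in the follow-up window of an efficacy trial means that a day-28 parasite matching the day-0 genotype cannot be confidently classified as a recrudescence.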
Strong genetic structure in the Colombian Pacific
Both allele sharing methods and Bayesian clustering define four subpopulations and mixed ancestry in the area of study (Figure 4), suggesting that our population structure results are robust. This is in line with previous studies performed in Brazil and Peru, where three to five subpopulations were revealed [9, 10, 39]. The presence of parasite subpopulations in the Colombian Pacific coast may be partially explained by the bottleneck in Plasmodium populations (approximately 9,000 cases in 1960) produced by the implementation of malaria control strategies and by the subsequent focal reemergence of parasites with different genetic backgrounds.
The coexistence of different subpopulations within locations is consistent with limited genetic exchange, together with extensive migration among locations. Plasmodium falciparum genetic interchange in Colombia was suggested recently for parasites through the Andean mountains and between the North and South of the Pacific region. Identification of identical MLGs in different sampling locations also demonstrates migration of parasite genotypes without breakdown due to recombination. An epidemiological study performed in Quibdó with 670 P. falciparum infected patients revealed that 66% of the cases were from the urban and rural area of the city, while 33% and 1% were from neighboring municipalities and provinces, respectively.
Local adaptation to different vectors may play a role in the parasite population structure. For example, coadaptation between vector and parasite has been suggested for mosquitoes and Plasmodium vivax in Mexico, where subpopulations of parasites differentially infected Anopheles albimanus and An. pseudopunctipennis. Three primary and three secondary vectors are found in the Pacific region (Table 1), and at least five different ecological sub-regions have been identified (http://www.eoearth.org/). Anopheles populations in Colombia vary locally in their vectorial competence, breeding habitats, and feeding preferences. One possible explanation is that parasite population structure in the Pacific region is also shaped by geographic restriction of compatible vectors. For example, parasites from the Col-1 subpopulation may be adapted to An. darlingi in Chocó, since this vector has not been recorded in the other provinces of the Colombian Pacific region [13, 14]. Further experimental investigation is necessary to test this hypothesis. The presence of An. darlingi in Chocó could also explain the higher number of multiclonal infections there (25%), as this species is considered the most effective malaria vector in Latin America [13, 14].
The model of metapopulation structure in P. falciparum suggests the potential for spreading of drug resistance alleles [52, 53]. Parasites from Colombia may follow this model, as they show inbreeding and no evidence of panmixia. This highlights the need to closely monitor the efficacy of ACTs in Colombia and neighboring countries, since resistance to different antimalarials has emerged and disseminated rapidly in this region. Artemisinin resistance has been confirmed in Southeast Asia, an area with low transmission conditions similar to those of South America. Finally, the presence of P. falciparum subpopulations in the Colombian Pacific region could explain the different patterns of drug susceptibility (in vivo and in vitro), as the magnitude of resistance to amodiaquine, sulphadoxine-pyrimethamine, and mefloquine varies between the South and North of this region [50, 55].
Implications for association studies
Association studies require considerable investment of time and resources. Therefore, it is critical to first demonstrate that the traits of interest have a genetic basis and to quantify the heritability in order to calculate appropriate sample sizes [17, 18]. Colombian P. falciparum populations are well suited to study the heritability of a trait of interest as the estimation of this parameter is achievable when identical clones in populations have been identified.
Colombian P. falciparum samples represent a challenge for association studies owing to strong population structure and the presence of many identical or closely related genotypes. In this study there were 136 unique genotypes among 400 parasites sampled. Hence, only a third of parasites sampled would be informative for association analyses. Both population stratification and cryptic relatedness can generate spurious associations. On the other hand, the low number of multiple clone infections simplifies detection of genotype/phenotype associations. Both sampling and statistical approaches can minimize bias in this situation. A two-phase sampling strategy provides one possible approach to minimize cost and effort while maximizing study power. In phase one, preliminary genotyping of the parasite population using 96–384 SNPs can rapidly identify identical clones and multiple clone infections. In phase two, a single representative of each clone can be genotyped at higher SNP density using Illumina sequencing or microarray based approaches and characterized for the trait of interest. From the statistical standpoint, powerful mixed model approaches developed by plant geneticists allow for effective control of both stratification and cryptic relatedness. This methodology was recently used to establish the role of the PF10_0355 membrane protein in low susceptibility to arylaminoalcohol antimalarials. These approaches must be considered for P. falciparum association studies in areas of low transmission.
Offspring with limited to zero recombination are expected to occur in South America. Subpopulations Col-3 and Col-4 exhibit lower genotypic richness (R ~ 0.21), leading to higher LD in comparison with the other subpopulations. Subpopulations Col-1 and Col-2, both with R of ~ 0.59, are likely older and are more likely to have experienced recombination. In our samples we estimated persistence for up to 48 generations, which reflects transmission over many generations of segments of ancestral haplotypes comprising linked markers.
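Phase one of the two-phase strategy can be sketched as a simple filtering step: set aside multiple clone infections, then keep one representative per MLG for dense genotyping. The Python sketch below is a hypothetical illustration; the encoding of mixed calls as "X", the zero-tolerance threshold and the sample identifiers are assumptions, not the authors' pipeline.

```python
def select_representatives(genotypes, mixed_call="X", max_mixed_calls=0):
    """Pick one sample per MLG and list multiple clone infections.

    genotypes: dict mapping sample id -> string of SNP calls, where
    `mixed_call` marks a position with more than one allele detected
    (assumed encoding). A sample with more than `max_mixed_calls` mixed
    positions is treated as a multiple clone infection and excluded.
    """
    representatives = {}  # MLG string -> first sample id carrying it
    multiclonal = []
    for sample_id, calls in genotypes.items():
        if calls.count(mixed_call) > max_mixed_calls:
            multiclonal.append(sample_id)
            continue
        representatives.setdefault(calls, sample_id)
    return sorted(representatives.values()), multiclonal


# Hypothetical phase-one calls at four SNPs for five infections.
phase_one = {
    "s1": "ACGT",
    "s2": "ACGT",  # same MLG as s1
    "s3": "ACGA",
    "s4": "AXGT",  # mixed call: multiple clone infection
    "s5": "ACGA",  # same MLG as s3
}
phase_two_samples, excluded = select_representatives(phase_one)
```

Only the samples in `phase_two_samples` would go forward for high-density genotyping and phenotyping; in practice the threshold for calling an infection multiclonal would need to allow for genotyping error.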
Despite evidence for high levels of inbreeding, the extent of LD observed was lower than that observed in other studies of South American populations. We observed a mean r² value of 0.16 between markers spaced <10 kb apart in the whole data set (Figure 6), while Neafsey et al. 2008 reported a mean r² of 0.7 for markers spaced <10 kb apart for samples from Brazil. Strong artifactual LD can be generated by combining subpopulations with differing allele frequencies in a single population sample. We therefore expected that LD would be elevated in the total population relative to the individual subpopulations. In fact, we observed the opposite (Figure 6).
The extent of LD varies among the four subpopulations. Mean r² for intermarker distances <10 kb ranges from 0.15 for Col-2 to 0.52 in Col-3 (Figure 6). Linkage disequilibrium in Col-3 and Col-4 also decays to background levels (r² between markers on different chromosomes) at ~500 kb, more gradually than for Col-1 and Col-2 (Figure 6). Mixing with other parasite populations could also explain the rapid decay in LD in the Col-2 subpopulation. The Col-2 subpopulation is dominant in Buenaventura (Valle State), the most important Colombian port on the Pacific Ocean. The extensive movement of people through this port may increase the chance of parasite admixture.
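The expectation that pooling subpopulations inflates LD can be shown with a small worked example. The Python sketch below uses the standard definition r² = D²/(pA(1 − pA)pB(1 − pB)) for two biallelic SNPs in haploid genotypes, with D = pAB − pA·pB; the haplotype counts are hypothetical and chosen so that each subpopulation is in linkage equilibrium on its own.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic SNPs from haploid two-locus genotypes.

    haplotypes: list of (a, b) tuples with alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when a marker is monomorphic")
    d = p_ab - p_a * p_b
    return d * d / denom


def make_population(n11, n10, n01, n00):
    """Build a haplotype list from counts of the four two-locus classes."""
    return [(1, 1)] * n11 + [(1, 0)] * n10 + [(0, 1)] * n01 + [(0, 0)] * n00


# Two hypothetical subpopulations, each in linkage equilibrium, with
# allele frequencies of 0.9 and 0.1 respectively at both SNPs.
pop1 = make_population(81, 9, 9, 1)
pop2 = make_population(1, 9, 9, 81)

r2_pop1 = r_squared(pop1)           # approximately 0
r2_pop2 = r_squared(pop2)           # approximately 0
r2_pooled = r_squared(pop1 + pop2)  # approximately 0.41
```

Neither subpopulation carries any association between the two SNPs, yet the pooled sample does. This is the artifact that was expected in the total Colombian sample but was not observed.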
Several factors may contribute to the elevated LD observed in Col-3 and Col-4. First, these subpopulations are small (n = 14 and 21 for Col-3 and Col-4, respectively), so the extended LD may be an artifact of low sample size. Two observations are consistent with this: background levels of LD (between unlinked markers on different chromosomes) are much higher in Col-3 and Col-4 than in Col-1 and Col-2, and random resampling of 14 haplotypes from the total sample of unique genotypes (n = 136) increased values of r² by an average of 0.06 in each distance category. Second, relatedness or recent admixture may contribute to elevated LD in Col-3 and Col-4. These subpopulations show lower expected heterozygosity (H = 0.25 for Col-3 and H = 0.21 for Col-4) compared with Col-1 (H = 0.27) and Col-2 (H = 0.34), suggesting that they may contain closely related parasites. Finally, genotypic richness is lower in Col-3 and Col-4 (R = 0.19 and 0.25, respectively) compared with Col-1 and Col-2 (R = 0.57 and 0.61, respectively). We suggest that observations of differences in LD decay within parasite subpopulations should be viewed with caution unless artifactual effects of sample size and relatedness can be clearly rejected.
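The effect of small sample size on r² can be explored by resampling, in the spirit of the analysis described above. The Python sketch below is a simplified illustration on hypothetical data: 136 haplotypes at two markers in exact linkage equilibrium, from which subsamples of 14 are drawn repeatedly without replacement. The number of draws and the seed are arbitrary choices.

```python
import random


def r_squared(haplotypes):
    """r^2 for two biallelic SNPs; returns None if a marker is monomorphic."""
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return None
    d = p_ab - p_a * p_b
    return d * d / denom


def mean_r2_in_subsamples(haplotypes, size, draws=1000, seed=0):
    """Mean r^2 over random subsamples drawn without replacement."""
    rng = random.Random(seed)
    values = []
    for _ in range(draws):
        r2 = r_squared(rng.sample(haplotypes, size))
        if r2 is not None:  # skip draws in which a marker is monomorphic
            values.append(r2)
    return sum(values) / len(values)


# Hypothetical full sample: 136 haplotypes, 34 of each two-locus class,
# so the two markers are exactly independent in the full sample.
full_sample = [(1, 1)] * 34 + [(1, 0)] * 34 + [(0, 1)] * 34 + [(0, 0)] * 34
full_r2 = r_squared(full_sample)
subsample_r2 = mean_r2_in_subsamples(full_sample, size=14)
```

Even though the markers are independent in the full sample, the mean r² in subsamples of 14 is clearly above zero, because sampling noise alone produces r² on the order of 1/n in small samples.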
One feature of the LD data is particularly important for association mapping. Only 2.4% of marker pairs situated within 10 kb of each other show r² ≥ 0.8 in the whole dataset. Hence, even in this low transmission region, genome sequencing or efficient genotyping of tagging SNPs will be needed to avoid false negative associations. On the other hand, the rapid decay in LD should enable localization of causative SNPs to genome regions containing 1–5 genes.