The present study showed that a group of older adult Puerto Ricans living in the United States had significantly different allele frequency distributions in 101 single nucleotide polymorphisms (SNPs) than similarly aged NHW. Moreover, Puerto Ricans had a lower frequency of protective alleles and a greater prevalence of risk alleles for SNPs associated with several chronic diseases. Although population differentiation values (FST) did not differ between the two populations, the patterns of FST differed by SNP function in the populations. Puerto Ricans showed exceptional FST values in intronic, non-synonymous and promoter SNPs, while NHW had exceptional values in intronic and promoter SNPs only. These exceptional FST values suggest that selection may be present in genes harboring those SNPs.
The differences in allele frequency and population differentiation between the two studied populations may seem surprising, as Puerto Ricans comprise an admixed population with European, African and Native American (Taíno) heritage. The 45.5% of SNPs showing a difference in minor allele frequency (MAF) between the two populations reported here is higher than the differences reported between other populations that are possibly more distant from each other than Puerto Ricans and NHW. Bamshad et al. found that 41% of SNPs differed significantly between African Americans and European Americans. Moreover, Burchard et al. reported that studies examining the allele frequency distribution among racial groups had found differences of 15% to 20% between groups. Those authors argue that differences in frequency among ethnic groups are typically modest, which makes them smaller than the differences we observed. It is possible that the origins of the white component of the two populations studied here contribute to some of the observed allele frequency differences. The NHW population shares ancestry primarily with individuals from northern and western Europe, while the Puerto Rican population shares ancestry primarily with peoples of the Iberian Peninsula. To determine the contribution of ancestral populations, we calculated the predicted MAF for Puerto Ricans under an admixture model. Close to a third of the SNPs with available data on predicted MAF showed significantly different allele frequencies from our Puerto Rican sample, suggesting that for those loci, the frequencies cannot be explained by admixture alone. However, for the majority of the analyzed SNPs showing similar observed and expected MAF, each ancestral population may be contributing unequal allele frequencies, supporting our conclusion on the large difference between our Puerto Rican sample and the NHW population. The observation that the MAF in Puerto Ricans tended to be lower for protective alleles and higher for risk alleles in disease-associated SNPs is notable. This could be an important factor in the disparities in chronic diseases observed in Puerto Ricans.
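The comparison of observed and predicted MAF under an admixture model can be illustrated with a minimal sketch. It assumes the predicted frequency is the ancestry-weighted average of the ancestral allele frequencies, and it uses a simple one-sample z-test that treats the predicted value as fixed. The ancestry proportions, ancestral frequencies, sample size and the choice of test are all hypothetical placeholders and are not taken from this study.

```python
from math import sqrt, erfc


def expected_maf(ancestral_freqs, proportions):
    """Ancestry-weighted average of allele frequencies (simple admixture model)."""
    if len(ancestral_freqs) != len(proportions):
        raise ValueError("one ancestry proportion is needed per ancestral frequency")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("ancestry proportions must sum to 1")
    return sum(f * w for f, w in zip(ancestral_freqs, proportions))


def z_test_observed_vs_expected(observed_freq, expected_freq, n_individuals):
    """Two-sided z-test of an observed allele frequency against a fixed expectation.

    Assumes 2 * n_individuals independently sampled chromosomes and ignores
    sampling error in the expected frequency (a simplification).
    """
    n_chromosomes = 2 * n_individuals
    se = sqrt(expected_freq * (1 - expected_freq) / n_chromosomes)
    z = (observed_freq - expected_freq) / se
    p_value = erfc(abs(z) / sqrt(2))  # two-sided normal tail probability
    return z, p_value


# Hypothetical inputs: European, African and Native American components.
proportions = [0.60, 0.25, 0.15]      # assumed ancestry proportions
ancestral_freqs = [0.30, 0.10, 0.50]  # assumed allele frequencies per ancestry

predicted = expected_maf(ancestral_freqs, proportions)
z, p = z_test_observed_vs_expected(observed_freq=0.35,
                                   expected_freq=predicted,
                                   n_individuals=1000)
print(f"predicted MAF = {predicted:.3f}, z = {z:.2f}, p = {p:.3g}")
```

A locus whose observed frequency departs significantly from the prediction would be one for which admixture alone does not account for the observed frequency, under the stated assumptions.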
Within genomic regions, other groups have found higher MAF in non-coding regions than in coding regions, reflecting the deleterious effect of mutations in coding regions, with promoter regions having the highest MAF. Our observations agree with these previous results, in that non-synonymous SNPs had a lower mean MAF than SNPs with other functions. We obtained very low FST values for both populations, suggesting very little population differentiation. To our knowledge, this is the first report of population differentiation for Puerto Ricans. FST estimates for other racial groups have been reported over a wide range, from 0.009 in Ashkenazi Jews and HapMap European populations to 0.11 for SNPs across four HapMap populations. The variations are most likely due to differences in sample size and tested markers.
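For readers less familiar with the statistic, the following minimal sketch shows how a single-SNP FST can be computed from allele frequencies using the textbook heterozygosity definition, FST = (HT - HS) / HT. This section does not state which estimator was used in the study, so the function below is an illustrative assumption and not the authors' method; it also omits the sample-size corrections used by estimators such as Weir and Cockerham's.

```python
def fst_two_groups(p1, p2, n1=1, n2=1):
    """Wright's FST for one biallelic SNP across two groups.

    p1, p2: frequency of the same allele in each group.
    n1, n2: relative group sizes used as weights (equal by default).
    """
    w1 = n1 / (n1 + n2)
    w2 = n2 / (n1 + n2)
    p_bar = w1 * p1 + w2 * p2
    # Expected heterozygosity in the pooled population.
    h_total = 2 * p_bar * (1 - p_bar)
    # Weighted mean expected heterozygosity within groups.
    h_within = w1 * 2 * p1 * (1 - p1) + w2 * 2 * p2 * (1 - p2)
    if h_total == 0:
        return 0.0  # monomorphic locus: no differentiation can be measured
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies, for illustration only.
for p1, p2 in [(0.20, 0.20), (0.20, 0.30), (0.10, 0.60)]:
    print(p1, p2, round(fst_two_groups(p1, p2), 4))
```

Identical frequencies give an FST of zero, and the value rises toward one as the groups approach fixation for different alleles, which is why small values are read as little differentiation.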
When exceptional FST values were assessed by SNP functional category, Puerto Ricans had exceptional FST values in intronic, non-synonymous and promoter SNPs. Analysis of data from NHW showed a preponderance of exceptional FST values in SNPs mapping to intronic and promoter regions. In secondary analysis, Puerto Ricans showed a prevalence of exceptional FST SNPs mapping to chromosome 11. Interestingly, chromosome 11 includes four clustered apolipoprotein genes (APOA1, APOA4, APOA5 and APOC3), for which several gene-environment interactions have been reported. In a study with a Puerto Rican sample, Tang et al. showed an excess of Native American ancestry in chromosome 11, which also harbors an olfactory gene cluster that is a target of ongoing positive selection. Kullo and Ding showed high FST values for SNPs in genes of the blood circulation and gas exchange pathways as well as the lipoprotein metabolism pathway in African versus NHW or Asian populations. These observations support the view that SNPs with high or exceptional FST values should be considered priority candidates for studies on association with complex diseases and local adaptation to environmental influences. Future studies in these populations should focus on intronic, promoter and non-synonymous SNPs for Puerto Ricans, and promoter and intronic SNPs for NHW.
There are limitations to this study. First, the tested SNPs were not randomly selected and thus the frequencies and FST measures may not represent actual distributions, as selective pressures may cause over- or under-representation in disease-associated genes. However, it has been shown that disease-associated SNPs do not show greater population differentiation than SNPs chosen at random. Here, the limited number of tested markers means that the results could reflect the particular set of markers selected. Additionally, statistical power might be limited when stratifying by SNP function (or by chromosome in secondary analysis). Nonetheless, other studies have reported similar measures using fewer SNPs [23, 24]. Furthermore, FST values can also reflect genetic drift and bottleneck effects in the populations. Conclusions about selection based on single-marker FST values should be interpreted with caution. Still, the FST estimates provide a practical portrayal of the adaptation processes of Puerto Ricans and merit further research. In addition, genome-wide genotyping using microarrays would allow detection of selection signals at a genome scale based on an excess of ancestral divergence between disease and non-disease groups. It will also be useful to increase the list of tested markers and of biological pathways, as well as to expand into other genetic measures such as linkage disequilibrium in this Puerto Rican cohort. Finally, we defined risk (or protective) alleles based on studies of other, often non-Hispanic, populations, assuming these alleles would exert the same impact in Puerto Ricans. The amount of risk or protection conferred by a given allele, or the extent of change in a biomarker of disease status, may be different in Puerto Ricans, and further association studies should be done to corroborate this.
On the other hand, our study has several strengths. First, noting that there are fluctuations in allele and genotype frequencies in control populations by age and sex, we sought to compare the two populations using data from individuals within the same age range. We did not observe any differences by sex. Second, our study has a fairly large sample size. Taioli et al. showed that the deviation of the estimated allele frequency from the true frequency decreases with increasing sample size, with sample sizes above 500 having almost no deviation. The same principle applies to FST estimates. Third, Akey et al. demonstrated that the average FST values could be affected by genotyping error rates as low as 2–5%, but our methods, validated repeatedly, produce less than a 1% genotyping error rate, greatly decreasing the impact of such errors on the FST estimates.
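The relationship between sample size and the precision of an allele frequency estimate can be sketched with the standard binomial approximation, in which the standard error is sqrt(p(1 - p) / 2N) for N diploid individuals. This is a generic illustration that assumes independently sampled chromosomes; it is not the calculation performed by Taioli et al., and the allele frequency and sample sizes below are hypothetical.

```python
from math import sqrt


def allele_freq_standard_error(p, n_individuals):
    """Binomial standard error of an allele frequency from 2N chromosomes."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if n_individuals <= 0:
        raise ValueError("sample size must be positive")
    return sqrt(p * (1 - p) / (2 * n_individuals))


# Hypothetical allele frequency of 0.20 at increasing sample sizes.
for n in (50, 100, 500, 1000):
    se = allele_freq_standard_error(0.20, n)
    print(f"N = {n:4d}  standard error = {se:.4f}")
```

Because the standard error shrinks with the square root of the sample size, gains in precision become progressively smaller as samples grow, which is in line with the observation that estimates stabilize in larger samples.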
The results of this study have implications for the generation of new hypotheses and the selection of candidate genes for association studies and gene-environment interaction studies. Allele frequency variations and FST patterns guide the identification of markers that may have been more prone to local adaptation by past environmental influences that contrast sharply with the environmental pressures faced by Puerto Ricans living in Boston, MA at present, namely poor nutritional habits. Such markers are more likely to respond to environmental signals in a manner that manifests as an association with disease status or progression. The observed differences in allele frequency and population differentiation for disease-associated SNPs may explain the disparities in disease prevalence observed in Puerto Ricans. For example, it has been shown that carriers of the variant allele of the Pro12Ala polymorphism in the PPARG gene show a beneficial effect on measures of glucose metabolism with higher fish intake. Taínos (the Amerindian heritage of modern Puerto Ricans) consumed a diet based on fish, tropical fruits and vegetables, and some small animals. Today, major contributors to the Puerto Rican diet are rice, starchy roots, milk, fried meat products, and processed, Westernized foods, with little consumption of fish [46, 47]. Interestingly, Puerto Ricans from our cohort had a significantly higher frequency of this minor allele than NHW; yet the beneficial effect may have been lost with the low consumption of fish in this population today. This warrants assessment of the contribution of this, and other, polymorphisms to the high prevalence of diabetes in Puerto Ricans [4, 48].
Clearly, the goal is to assess the genetic and/or environmental contribution to disease in order to help resolve these disparities with targeted interventions, and not to create any bias against the populations under study based on genetic profile. Several examples of such ethnicity-based prospective interventions and drug therapies have been reported, such as the population-specific drug-metabolizing capabilities due to various polymorphisms in the cytochrome P450 (CYP450) enzymes. Finally, this study does not define Puerto Ricans on a genetic level, as they are already a well-defined ethnic group with distinct cultural and social norms and environmental influences, even while sharing some characteristics with other Hispanic subgroups. Further studies on genetic and environmental determinants of disease by ethnic or racial groups should recognize the unique genetic and cultural traits of Puerto Ricans.