Genetic structure of four socio-culturally diversified caste populations of southwest India and their affinity with related Indian and global groups

Background A large number of microsatellites have been extensively used to comprehend the genetic diversity of different global groups. This paper examines polymorphism at 15 STR loci in four predominant and endogamous populations representing Karnataka, located on the southwest coast of India. The populations residing in this region are believed to have received gene flow from south Indian populations and world migrants. Hence, we carried out a detailed study on populations inhabiting this region to understand their genetic structure, their diversity in relation to geography and linguistic affiliation, and their relatedness to other Indian and global migrant populations. Results Various statistical analyses were performed on the microsatellite data to accomplish the objectives of the paper. The heterozygosity was moderately high and similar across the loci, with a low average GST value. Iyengar and Lyngayat were placed above the regression line in the R-matrix analysis, as opposed to the Gowda and Muslim. AMOVA indicated that the majority of variation was confined to individuals within a population, with geographic grouping demonstrating less genetic differentiation than linguistic clustering. DA distances show the genetic affinity among the southern populations, with Iyengar, Lyngayat and Vanniyar displaying some affinity with northern Brahmins and global migrant groups from East Asia and Europe. Conclusion The microsatellite study points to a common ancestry for the four diverse populations of Karnataka, with the overall genetic differentiation among them being largely confined to intra-population variation. The practice of consanguineous marriages might have contributed to the relatively lower gene flow displayed by Gowda and Muslim as compared to Iyengar and Lyngayat. The various statistical analyses strongly suggest that the studied populations could not be differentiated on the basis of caste or spatial location, although linguistic affinity was reflected among the southern populations, distinguishing them from the northern groups. Our study also indicates a heterogeneous origin for Lyngayat and Iyengar, owing to their genetic proximity with southern populations and northern Brahmins. The high-ranking communities, in particular Iyengar, Lyngayat, Vanniyar and northern Brahmins, might have experienced genetic admixture from East Asian and European ethnic groups.


Background
The Indian subcontinent is regarded as a natural genetic laboratory, owing to the co-existence and interaction of socio-culturally, linguistically, ethnically and genetically diversified endogamous populations in a single geographical terrain. It is believed that the earliest humans leaving Africa for Eurasia might have taken a coastal route across Saudi Arabia, through Iraq and Iran to Pakistan, and finally entered India along the coastlines [1]. A second wave of migration (~10,000 years ago) brought in Proto-Dravidian Neolithic farmers from Afghanistan, who were later displaced southwards by a large influx of Indo-European speakers into the subcontinent ~3500 years ago [2,3]. The origin and settlement of the Indian people remain intriguing, prompting scientists to explore the impact of these past and modern migrations on the genetic diversity and structure of contemporary populations [4][5][6].
Anthropologically, southern and northern populations are distinct, and these differences are further substantiated by (i) the presence of Neolithic sites in this region, which suggests that the Neolithic people of southern India came from the north by land and from the west coast by sea [7], (ii) the close resemblance of the southern megaliths to those of the Mediterranean and western Europe, whereas those from northern India are similar to megaliths found in Iran and Baluchistan [8], and (iii) the predominance of Dravidian languages in this region, as opposed to their isolated occurrence in central Asia and other parts of India, which suggests that the Dravidian languages might have originated within India [9]. It is, thus, of considerable interest to understand the genetic structuring and relationships of southern populations.
The present study was carried out on one of the largest southern states, Karnataka, positioned on the southwest coast of India and home to about 50 million people. This expanse has been a rich source of prehistoric discoveries dating back to the Paleolithic era that are akin to those seen in Europe [7]. Karnataka has received continuous gene flow from different caste and linguistic groups residing in the adjoining areas of Maharashtra, Andhra Pradesh and Tamil Nadu [10], resulting in the congregation of a large number of diverse endogamous groups within this region. Its long coastline of about 400 km also attracted Portuguese, Dutch and French traders, who were seeking more profitable ventures on the southern coast at large [2]. Southwest India is, thus, one of the most heterogeneous terrains, with extensive colonization in the past, which justifies an in-depth genetic study.
A few studies utilizing classical markers have been carried out on southern populations [5,11,12], including a few communities of Karnataka [13,14]. However, sound inferences relating to their genetic structuring and diversity could not be drawn due to the low discriminatory power of these markers. Recently, microsatellite markers have gained immense popularity in precisely defining population structure, diversity, affinities, gene flow and other crucial aspects of population genetics [15][16][17][18][19][20][21] because of the relative ease with which a large number of loci and alleles can be typed, facilitating the accumulation of vast data sets that can be readily analyzed with an extensive array of statistical tools [22,23]. These markers also demonstrate high heterozygosity [24], rendering them highly suitable for the present study.
Among the different caste and tribal groups inhabiting the southwest coast of India, we have selected four predominant Dravidian-speaking communities from Karnataka: Iyengar Brahmin, Lyngayat, Gowda and Muslim. They not only belong to dissimilar groups of the Indian caste hierarchy but also have varied migration histories, which makes them unique and significant from a genetic perspective. The present microsatellite study primarily attempts to understand the genetic structure of the four selected populations and to determine their genetic relationship with other linguistically and ethnically similar groups of southern India and with Brahmin groups of northern India. It has been suggested that, despite the linguistic homogeneity in southern India, these populations have remained genetically diversified [25]. Hence, we sought to determine the role played by geographical location and linguistic affiliation in genetically differentiating Indian populations. Also, since the western coast has witnessed colonization by different world populations, as mentioned earlier, we aim to assess the impact of these past migrations on the gene pool of the present southern populations by discerning their relationship with historically established migrant groups, ethnically represented by European, Hispanic, East Asian and African populations.

Results
Allele frequencies at the 15 STR loci were used to compute the observed heterozygosity for the four studied populations, which varied by locus and population but remained similar overall, ranging between 0.724 and 0.797 (Table 1). An average GST value of 0.009 indicates a low degree of genetic differentiation among them. However, the GST value for the pooled Indian and global populations was higher, at 2.3% (data not shown). The genetic relationship of the studied populations with other similar southern groups (Vanniyar, Gounder, Pallar and Tanjore Kallar [26,27]), northern Brahmins belonging to Orissa [28] and Bihar [29], and four relevant global ethnic groups (European, Hispanic, African [30] and East Asian [31]) was assessed by computing DA distances (Table 2) and represented using an NJ tree (Fig. 1). Among the four studied populations, Iyengar, Gowda and Muslim formed a distinct cluster. Although the NJ tree clearly depicts the clustering of southern populations, DA distances indicate that among these groups, Iyengar, Lyngayat and Vanniyar are more similar to the northern Brahmins (0.030). Furthermore, genetic distances emphasize the affinity of Lyngayat with Tanjore Kallar (0.029), Iyengar (0.026) and Vanniyar (0.028). Estimation of relatedness between the southern and global populations shows that all the southern communities formed a separate cluster. Nevertheless, genetic distances disclose the affinity of upper caste Indian communities (Iyengar, Lyngayat, Vanniyar, Bihar and Oriya Brahmin) with Europeans and East Asians. The Indian populations were most distant from Africans.
The regression model (Fig. 2) of mean per-locus heterozygosity against distance from the centroid assumes that when each population experiences the same amount of gene flow from a homogeneous source, a linear relationship exists between the expected and observed heterozygosity. A change in gene flow directly affects this linear relationship. The R-matrix, when applied to the Indian populations, assists in understanding the influence of external gene flow and admixture among populations. The higher observed than expected heterozygosity of Iyengar and Lyngayat, which are placed above the theoretical regression line, suggests that these populations have received more than average external gene flow; this was also observed in Vanniyar, Pallar and Oriya Brahmin. The Gowda and Muslim groups exhibit lower than expected heterozygosity values and fall below the regression line, suggesting less admixture in them.

The microsatellite diversity computed using AMOVA revealed that the genetic variation observed in Indian populations was mainly confined to variation amongst individuals (~98%), irrespective of their geographic or linguistic grouping (Table 3). The geographical clustering of populations into three regions, north, southwest (Karnataka) and southeast (Tamil Nadu), demonstrated a low among-group variance of 0.29%, p = 0.010 (Table 3a). Compared with the geographical grouping, the linguistic clustering (Indo-European and Dravidian) exhibited a noticeable increase in the molecular variance between the two groups, 0.65% (p = 0.06, Table 3b). The genetic diversity among populations within each group remained similar at both levels of analysis.

Discussion
In recent years, population genetics has witnessed extensive use of microsatellite markers to understand the evolutionary histories of contemporary human populations [17][32][33][34]. Although the populations inhabiting south India have played a major role in the formation of the Indian gene pool, very few genetic studies have been carried out on them. The present study utilizes 15 STRs to provide comprehensive genetic information on four predominant communities inhabiting the southwest coast of India, which may significantly help in understanding the genetic composition of southern populations.

Genetic structure of Karnataka populations
The most distinctive feature revealed by the fifteen microsatellites was the considerable genetic homogeneity amongst the four diverse caste groups residing in southwest India. The presence of a similar allele frequency pattern [34] suggests that these populations might have a common ancestry or probably experienced very high gene flow during the period of their coexistence. This finding is further supported by the low genetic differentiation of 1.0% among the studied groups, irrespective of their caste and migration histories. The high heterozygosity and rii values in Lyngayat reflect the admixture and stochastic processes experienced by it. The genetic affinity of Lyngayat with other related southern caste populations, such as Iyengar, Vanniyar and Tanjore Kallar, reiterates its heterogeneous past. It is noteworthy that although the southern populations exhibited higher affinity amongst each other, the high-ranking populations, such as Iyengar, Lyngayat and Vanniyar, also displayed some genetic similarity to Brahmins from Bihar and Orissa, indicating that the gene pool of Iyengar and Lyngayat probably consists of genetic inputs from both southern and northern groups. However, strong conclusions cannot be drawn due to the low genetic differentiation among the studied populations. Though the Gowda are known to have moved into Karnataka from the adjoining area of Tamil Nadu, our study reveals that the Gowda cluster with the studied populations and not with Tamil groups. The low heterozygosity and high rii values of Gowda imply that it might have differentiated as a result of stochastic processes. Furthermore, the relatively lower heterozygosity and admixture levels of Gowda and Muslim might be attributed to the socio-cultural practice of consanguineous marriages in them. The Muslim group was found to be genetically similar to local populations. Regional conversions from diverse castes that occurred during the period of Islamic dominance might explain the more or less identical genetic relationship between Muslims and the other studied groups. The microsatellite study emphasizes the genetic similarity among the Karnataka populations, with no strong caste or religious differentiation among them.

Analysis of genetic variance
The AMOVA test strongly suggests that genetic diversity among the southern populations was mainly confined to intra-population variation, further emphasizing the genetic homogeneity in them. Analyses using different genetic markers corroborate our finding that the genetic diversity in human populations can be mainly attributed to variation within populations [4,17,19,34,36,37].
An exploration of the genetic differentiation based on geographical grouping of populations discloses the genetic similarity among populations residing in a region. Nevertheless, the geographic affinity was comparatively lower than that observed within the two linguistic families, viz., Dravidian and Indo-European. Our finding provides evidence of the strong linguistic affinity prevailing amongst the Dravidian-speaking populations, which makes them genetically distinct from the Indo-European linguistic group. Even though prior studies have indicated that genetic clusters often correspond closely to predefined regional and linguistic groups [34], AMOVA suggests that the caste system and geographical contiguity are not ideal bases for differentiating the analyzed Indian populations. It must, however, be acknowledged that the small number of polymorphisms used in this study might plausibly have led to the greater apparent influence of linguistic affiliation on these populations relative to geographical proximity.

Genetic affinity with global populations
The genetic differentiation of the studied populations from relevant global migrant groups was estimated to be 2.3%, lower than the 9% observed in another similar study [16], which had used a different set of microsatellite markers. Sampling from a confined area, as well as the use of a smaller number of loci, might have contributed to this apparent difference in the results. The southern populations formed a separate cluster from the world populations. Molecular studies on Indian populations using diverse markers (nuclear, mtDNA and Y-chromosome) have demonstrated that the upper caste populations have a greater resemblance to Europeans than to Asians [26]. Intriguingly, in the present study, communities belonging to the upper strata of the Hindu caste hierarchy, i.e., Iyengar, Lyngayat, Vanniyar and northern Brahmins, displayed almost identical genetic affinity with both Europeans and East Asians. Therefore, although it is believed that south India remained isolated and cushioned from foreign invasions, the southern populations, especially the high-ranking groups, might have genetically admixed with migrant groups that entered via the west coast and the north. Further exploration of their relationship is essential before drawing concrete conclusions. A more comprehensive picture would emerge from analysis of mtDNA and Y chromosome markers.

The populations
The populations selected in this study comprise three major Hindu castes (Iyengar, Lyngayat and Gowda) and a Muslim community, inhabiting the southwest coastal terrain of Karnataka (11.3-18.45°N latitude and 74.12-78.40°E longitude). All the populations belong to the Dravidian linguistic family and speak the local dialect, Kannada, but differ in caste hierarchy and socio-religious practices. Consanguineous marriages have been reported in Karnataka, with inbreeding levels generally of the order of 0.020 to 0.033 [38].
The Iyengar hold a high position in the Indian caste hierarchy, and sporadic accounts of Brahmins suggest that they primarily migrated from the upper Gangetic plains to southern India. Nonetheless, a few bioanthropological studies have revealed that, morphologically, Brahmins of a geographical region are similar to the local groups.
The Lyngayat community was initially formed as a religious cult by the amalgamation of people from different castes and geographical regions, but later developed into a distinct community practicing strict marriage endogamy, with social sub-divisions such as clans, sub-castes and sects [10].
Gowda is a low ranking agriculturist caste group that typically exhibits the Dravidian socio-cultural characteristic of consanguineous marriage. It is believed to have moved in from the adjoining area of Tamil Nadu.
The Muslim community is a linguistically heterogeneous, complex religio-ethnic group [10]. It is believed that the invasions of Turks, Afghans (A.D. 998-1030) and Moghals during the 15th century introduced new genes only in northern India, suggesting that Muslims from southern India are mainly local converts [3].

Microsatellite loci studied
The 15 STR marker set analyzed in this study consists of thirteen tetranucleotide repeat loci (D3S1358, TH01, D21S11, D18S51, D5S818, D13S317, D7S820, D16S539, CSF1PO, vWA, D8S1179, TPOX and FGA) and two pentanucleotide repeat loci (Penta D and Penta E). Their repeat size makes them less prone to polymerase slippage during enzymatic amplification than dinucleotide repeats, allowing unambiguous typing [20]. The 15 selected loci are situated on 13 different chromosomes, with D5S818 and CSF1PO located on chromosome 5 and Penta D and D21S11 located on chromosome 21. The alleles across the loci are substantially unlinked, making them suitable for analyzing inter- and intra-population genetic diversity.

Statistical Analysis
Allele frequencies of the 15 STR loci were calculated using the gene counting method [40]. The genetic diversity (GST), observed heterozygosity and pairwise genetic distances (DA) were computed from the allele frequencies [42]. The DA distance is least affected by sample size and is efficient in recovering correct phylogenetic trees under various evolutionary conditions [43]. Neighbor-joining trees were constructed using DA distances [44], and their robustness was established by bootstrap resampling procedures.
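The quantities named above can be illustrated with a minimal Python sketch. It is not the software used in this study: the function names are hypothetical, the means are unweighted, and no sample-size corrections are applied. It assumes Nei's definitions, GST = (HT - HS)/HT and DA = 1 - (1/r) Σ_loci Σ_alleles sqrt(x·y), where x and y are the frequencies of the same allele in the two populations and r is the number of loci.

```python
from collections import Counter
from math import sqrt


def allele_frequencies(genotypes):
    """Gene counting: every diploid genotype contributes two allele copies."""
    counts = Counter(allele for genotype in genotypes for allele in genotype)
    total = sum(counts.values())
    return {allele: n / total for allele, n in counts.items()}


def observed_heterozygosity(genotypes):
    """Fraction of individuals carrying two different alleles at the locus."""
    return sum(1 for a, b in genotypes if a != b) / len(genotypes)


def gene_diversity(freqs):
    """Expected heterozygosity, 1 - sum(p^2), without sample-size correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def gst(pop_freqs):
    """Nei's GST for one locus (assumed form: (HT - HS) / HT, unweighted)."""
    alleles = set().union(*pop_freqs)
    n = len(pop_freqs)
    mean = {a: sum(f.get(a, 0.0) for f in pop_freqs) / n for a in alleles}
    h_total = gene_diversity(mean)
    h_within = sum(gene_diversity(f) for f in pop_freqs) / n
    return (h_total - h_within) / h_total


def da_distance(x_loci, y_loci):
    """Nei's DA between two populations, given one frequency dict per locus."""
    total = 0.0
    for x, y in zip(x_loci, y_loci):
        shared = set(x) | set(y)
        total += sum(sqrt(x.get(a, 0.0) * y.get(a, 0.0)) for a in shared)
    return 1.0 - total / len(x_loci)
```

In this sketch, two populations with identical allele frequencies at every locus have a DA of 0, and two populations sharing no alleles have a DA of 1.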
Analysis of molecular variance (AMOVA) was performed using the Arlequin Ver. 2.00 package [45]. Two levels of analysis were performed to explore the microsatellite diversity among the four studied populations along with six other socio-culturally similar groups inhabiting different regions of India. At the first level, three geographical groups were constructed, (1) north, (2) southwest: Karnataka, and (3) southeast: Tamil Nadu, to estimate the genetic variance among populations from diverse geographical regions. The second level of analysis was aimed at investigating the genetic diversity between the Dravidian and Indo-European linguistic families.
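For orientation, a hierarchical AMOVA of this kind partitions the total variance into three components. The notation below follows the usual convention for this type of analysis and is an assumption, since the formulas are not written out in the text: $\sigma_a^2$ is the variance among groups (geographic or linguistic), $\sigma_b^2$ the variance among populations within groups, and $\sigma_c^2$ the variance within populations.

```latex
\sigma_T^2 = \sigma_a^2 + \sigma_b^2 + \sigma_c^2, \qquad
\Phi_{CT} = \frac{\sigma_a^2}{\sigma_T^2}, \qquad
\Phi_{SC} = \frac{\sigma_b^2}{\sigma_b^2 + \sigma_c^2}, \qquad
\Phi_{ST} = \frac{\sigma_a^2 + \sigma_b^2}{\sigma_T^2}
```

Under this convention, the percentage of variation reported for a level is $100\,\sigma^2/\sigma_T^2$ for the corresponding component.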
To assess the gene flow experienced by these populations, the rii value, i.e., the genetic distance of a population from the centroid, was calculated using the regression model [46]. This model utilizes the heterozygosity of each population and its distance from the centroid, the centroid being the arithmetic mean of allele frequencies:

$$ r_{ii} = \frac{(p_i - \bar{p})^2}{\bar{p}\,(1 - \bar{p})} $$

where $r_{ii}$ is the distance from the centroid, $p_i$ is the frequency of the allele in the $i$th population and $\bar{p}$ is the mean allelic frequency.
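The regression step can be sketched in Python as follows. This is an illustration under stated assumptions rather than the procedure used here: it assumes the commonly used form r_ii = (p_i - p̄)² / (p̄(1 - p̄)), averaged over alleles, and an expected heterozygosity of H_T(1 - r_ii), where H_T is the heterozygosity of the pooled (centroid) frequencies. The function name and the single-locus input format are hypothetical, and heterozygosity is computed from allele frequencies as 1 - Σp².

```python
def centroid_regression(pop_freqs):
    """Distance from the centroid and heterozygosity residual for one locus.

    pop_freqs maps a population name to a dict of allele -> frequency.
    Assumes r_ii = (p_i - pbar)^2 / (pbar * (1 - pbar)), averaged over
    alleles, and expected heterozygosity H_T * (1 - r_ii).
    """
    names = list(pop_freqs)
    alleles = set().union(*pop_freqs.values())
    n = len(names)
    # Centroid: unweighted arithmetic mean of allele frequencies.
    mean = {a: sum(pop_freqs[k].get(a, 0.0) for k in names) / n for a in alleles}
    informative = [a for a in alleles if 0.0 < mean[a] < 1.0]
    if not informative:
        raise ValueError("locus is monomorphic; r_ii is undefined")
    h_total = 1.0 - sum(p * p for p in mean.values())

    results = {}
    for k in names:
        freqs = pop_freqs[k]
        r_ii = sum(
            (freqs.get(a, 0.0) - mean[a]) ** 2 / (mean[a] * (1.0 - mean[a]))
            for a in informative
        ) / len(informative)
        observed = 1.0 - sum(p * p for p in freqs.values())
        expected = h_total * (1.0 - r_ii)
        results[k] = {
            "r_ii": r_ii,
            "observed": observed,
            "expected": expected,
            # Positive residual: above the regression line.
            "residual": observed - expected,
        }
    return results
```

A population with a positive residual lies above the theoretical regression line, which the model interprets as more than average external gene flow; a negative residual places it below the line.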

Abbreviations
STR - Short Tandem Repeat
AMOVA - Analysis of Molecular Variance
NJ tree - Neighbor-Joining tree

Authors' contributions
RR carried out the molecular studies, analyzed the genetic data and drafted the manuscript. VKK participated in the design, conceiving and preparation of the manuscript. Both authors read and approved the final manuscript.