Characterizing the genetic differences between two distinct migrant groups from Indo-European and Dravidian speaking populations in India

Table 1 Discovery and validation criteria for differentiated genomic regions

Criteria	Discovery criterion	Validation criterion
F_ST: region with an over-representation of SNPs possessing high F_ST values relative to the genome-wide distribution of F_ST scores	Regional evidence in the top 0.1% of the genome-wide distribution, in which: (i) regions are defined by window sizes of 100 kb and 500 kb; (ii) evidence is defined by the P-value of the exact binomial test for the proportion of SNPs with F_ST in the top 1% (100 kb windows) or top 0.1% (500 kb windows) of the genome-wide distribution	Discovered region should contain evidence found in the top 1% of the genome-wide distribution
Differential iHS signals for GIH and INS	At least one SNP with a normalized iHS score in the top 0.19% of the genome-wide distribution in one population, but not in the top 1% of the genome-wide distribution in the other population	At least one SNP in the discovered region should have an iHS score in the top 1% of the genome-wide distribution, but be absent from the top 1% of the genome-wide distribution of iHS scores in the second population
XP-EHH between GIH and INS	Normalized XP-EHH scores should lie in the top 0.01% of the genome-wide distribution	At least one SNP in the discovered region should lie in the top 0.5% of the genome-wide distribution of the normalized XP-EHH scores

A description of the population genetics metrics used to discover and validate genomic regions that are differentiated between the north Indian Gujarati population (GIH) and the south Indian Tamil population from Singapore (INS).
Abbreviations: iHS integrated haplotype score, XP-EHH cross-population extended haplotype homozygosity.
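
The window-level F_ST evidence described in Table 1 can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes SNPs from a single chromosome, non-overlapping windows indexed by integer division of the base-pair position, and a one-sided (upper-tail) exact binomial test. The function names `binomial_upper_tail` and `window_pvalues` and their parameters are hypothetical. The subsequent step of ranking the window P-values against their genome-wide distribution is not shown.

```python
from collections import defaultdict
from math import exp, lgamma, log, log1p


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Each term is evaluated in log space so that windows containing
    many SNPs do not overflow.
    """
    total = 0.0
    for i in range(k, n + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log1p(-p))
        total += exp(log_term)
    return min(total, 1.0)  # guard against rounding slightly above 1


def window_pvalues(positions, fst, window_bp=100_000, top_fraction=0.01):
    """Map each window index to (n_snps, n_high_fst, p_value).

    Assumption: `positions` and `fst` are parallel lists for one chromosome.
    """
    # Genome-wide cutoff: the F_ST value that bounds the top fraction of SNPs.
    # Tied values at the cutoff are all counted as high.
    n_top = max(1, round(len(fst) * top_fraction))
    cutoff = sorted(fst, reverse=True)[n_top - 1]

    counts = defaultdict(lambda: [0, 0])  # window -> [n_snps, n_high_fst]
    for pos, value in zip(positions, fst):
        window = pos // window_bp
        counts[window][0] += 1
        counts[window][1] += int(value >= cutoff)

    # Under the null, each SNP is "high" with probability top_fraction.
    return {
        window: (n, k, binomial_upper_tail(k, n, top_fraction))
        for window, (n, k) in sorted(counts.items())
    }
```

For the 500 kb setting in Table 1, the same function could be called with `window_bp=500_000` and `top_fraction=0.001`. Windows with small P-values are those carrying more high-F_ST SNPs than expected by chance.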

ISSN: 2730-6844