Genetic affinities between the Yami tribe people of Orchid Island and the Philippine Islanders of the Batanes archipelago

Background Yami and Ivatan islanders are Austronesian speakers from Orchid Island and the Batanes archipelago, which lie between Taiwan and the Philippines. The paternal genealogies of the Yami tribe from the 1962 monograph of Wei and Liu were compared with our dataset of non-recombining Y (NRY) chromosomes from the corresponding families. Mitochondrial DNA polymorphism was then analyzed to determine the matrilineal relationships between Yami, Ivatan, and other East Asian populations.
Results The family relationships inferred from the NRY phylogeny suggested a low number of paternal founders and agreed with the genealogy of Wei and Liu (P < 0.01). Except for one Y short tandem repeat (Y-STR) lineage, seen in two unrelated Yami families, no other Y-STR lineages were shared between villages, whereas mtDNA haplotypes were indiscriminately distributed throughout Orchid Island. The genetic affinity seen between Yami and Taiwanese aborigines or between Ivatan and the Philippine people was closer than that between Yami and Ivatan, suggesting that the islands were colonized separately by their nearest neighbors and that the islanders bred in isolation. However, a northward gene flow to Orchid Island from the Philippines was suspected, as Yami and Ivatan peoples both speak Western Malayo-Polynesian languages, which are not spoken in Taiwan. In fact, very little gene flow was observed between Yami and Ivatan or between Yami and the Philippines, as indicated by the sharing of mtDNA haplogroup B4a1a4 and one O1a1* Y-STR lineage.
Conclusions The NRY and mtDNA genetic information among the Yami fitted well with the patrilocal society model proposed by Wei and Liu. Under this model, there were likely few genetic exchanges between Yami and the Philippine people. Trading activities may have contributed to the diffusion of Malayo-Polynesian languages among them. Finally, artifacts dating to 4,000 YBP, found on Orchid Island and associated with the Out of Taiwan hypothesis, might be related to a pioneering stage of settlement, as most dating estimates inferred from DNA variation in our data set ranged between 100 and 3,000 YBP.


Background
Orchid Island is located 49 nautical miles off the southeast coast of Taiwan along the Bashi (or Luzon) channel in the Pacific Ocean, and is home to the Yami tribe (also known as Tao). The Ivatan people are inhabitants of Itbayat in the Batanes archipelago, which lies south of Orchid Island (Figure 1). The languages of the Yami and Ivatan belong to the Batanic sub-branch of the Western Malayo-Polynesian languages (Figure 1), which in turn belongs to the 10th branch of the Austronesian (AN) language group [1,2]. The Yami are the only non-Formosan Austronesian speakers among Taiwan Aborigines (TwA) [3]. They also have a close cultural relationship with the Ivatan. According to an oral folk tale, the Yamis believe that their ancestors came from the Batanes archipelago [4].
The archaeological findings on Orchid Island show evidence of the Fine Corded Ware Culture, which is related to the Peinan culture [5]. These middle Neolithic artifacts were found on the east coast of Taiwan between Hualien and Taitung (Figure 1). These findings indicate contact and possibly migration from Taiwan to Orchid Island ~4,000 years before present (YBP). Furthermore, post-Neolithic oral history (~1,500 YBP) reports that interactions between the islanders of Orchid Island and the Batanes archipelago were frequent until ~300 YBP [6], whereas interactions between TwA and Orchid Islanders ceased much earlier.
The archaeological excavation in Batanes in 2002 [7] showed that the Batanes archipelago had been inhabited by 4,000 YBP. Similar to the Orchid Island findings [8], the sites in Batanes indicated connections with middle to late Neolithic cultures that originated on the eastern coast of Taiwan. More recently, two very specific forms of ear pendants made of green nephrite from eastern Taiwan were discovered on Orchid Island and in Batanes, along with other artifacts dating back to 2,500 to 1,500 YBP [8]. Similar artifacts of the same period have been reported in Orchid Island, Batanes, the Philippines, East Malaysia, southern Vietnam, and Thailand [8]. All these findings clearly suggest prehistoric trading activities around the China seas. Carbon dating of food debris suggests that the colonization of Batanes might have happened much later (~2,500 YBP); however, the dating obtained from pottery residues or inferred from Northern Luzon findings suggests 4,000 YBP [9]. These date estimates have raised questions about the relationship between the present inhabitants of Orchid Island and Batanes and about the simple "stepping stone" (or "Out of Taiwan") hypothesis [10].
From the end of the 19th century to the middle of the 20th century, Japanese anthropologists conducted important ethnological studies on all Taiwan Aboriginal tribes, including the Yami of Orchid Island [11][12][13][14][15][16][17][18][19][20]. Inter-village cultural variations among the Yamis were first noticed by Kano [21]. However, more recent anthropological studies suggest that the sharing of common attributes among villages had been overestimated and that, accordingly, much more variation among villages should be expected [4]. A 1962 monograph about the social structure of the Yami [22] described the paternal genealogies of a number of Yami families, some of which could be traced back ten generations. Wei and Liu showed that generations of the same family remained in the same village. For the first stage of our study, we used Y chromosome polymorphism to determine the patrilineal relationships between Yami families and then compared the genetic analysis with the genealogical information from Wei and Liu.
In the early 18th century, following a destructive typhoon and ensuing famine, 35% of the Ivatan population perished [23]. The Catholic Church arranged for one fourth of the Ivatans to move south to a more sheltered island near Luzon in the Philippines. At the end of the 19th century, however, many of these people moved back to Batanes. As a consequence, one would expect the genetic profile of the extant population of Batanes to show some similarity with the northern Luzon people in the Philippines. In 2001, a human leukocyte antigen (HLA) study showed that the HLA-A and DRB1 allele distributions of the Ivatan were similar to those of the Yami and of the Puyuma tribe from the southeast coast of Taiwan [24]. For the second part of this study, we used mitochondrial DNA of relevant coding regions and the control region HVS-1, together with complete mtDNA genome sequencing of the most representative haplogroups among Yami and Ivatan, to further analyze the matrilineal relationship between Yami, Ivatan, Taiwan Aborigines, the Philippine people, and other populations from Mainland and Island Southeast Asia (MSEA and ISEA).
In summary, the study proposes to test the issue of genetic stratification described by Wei and Liu, to examine whether the initial settlement of Orchid Island happened with the mid-Neolithic Austronesian expansion, and to determine whether there was gene flow between the Batanes and Orchid Island. We will also test the issue of genetic affinities between the Batanes and the Philippines, as expected given the relocation of Batanes individuals during the 18th century. Finally, we propose to further analyze the matrilineal relationship between Yami, Ivatan, Taiwan Aborigines, the Philippine people, and other populations from mainland and island Southeast Asia (MSEA and ISEA).

Mitochondrial DNA
The mtDNA sequence data of HVS-1 (nps 16,037 to 16,365), nps 8,000 to 9,000, nps 9,800 to 10,900 and nps 14,000 to 15,000 of 129 Yami and Ivatan individuals, together with their detailed haplogroup classification, are reported in Additional file 1. The Yamis, with ten different mtDNA haplotypes, showed considerably fewer polymorphisms than the Ivatans (20 haplotypes) or other Taiwan Aboriginal tribes (13 to 22 haplotypes) [25].
Although all Yami mtDNA haplogroups were seen among TwA, some were also found in the Philippines (Table 1). Therefore, the Fst tree (Additional file 2; mtDNA) posits the Yamis to be intermediate between the Ivatans and all the other Taiwanese groups (including the Amis).
Complete mtDNA sequences from all phylogenetically relevant haplogroups of Yamis and Ivatans are shown in Figure 2, which include three haplogroups locally named in accordance with the van Oven "Phylotree" as F1a1d, M7b4 and N9a10 [26]. Haplogroup F1a1d [27] differs from F1a1a which was previously described by Hill et al. at np 16108 [28,29]. Nps 16399 and 11380 (Figure 2), are found in Tsou and Rukai in Taiwan (10.00% and 5.88% respectively), in Vietnam (~6%) [30,31], Fujian (<1%), and among the Yamis where drift is likely to explain the high frequency (22%) because haplotype diversity is low on the island (Table 1 and Additional file 1). The presence of these haplotypes in Yamis and near absence in the Philippines, suggests that the gene flow from Southeast Asia ended in Taiwan [30][31][32] and could have reached Orchid island as a result of the jade trade [8].
The clade, here named B4a1a4, was not seen in Taiwan. One twig of B4a1a4 did not show np 16360A and was seen in two Ivatan individuals (4%). Further screening for the presence of np 4025 in 132 B4a1a samples was undertaken to determine the presence of B4a1a4 in other regions of Taiwan, Southeast Asia and ISEA, and if possible, to infer its origin. Five B4a1a4 lineages lacking 16360A transversion were seen among the Filipinos (1%) and three of them were different at HVS-1. The higher B4a1a4 diversity south of Orchid Island favored a Philippine origin. As indicated by the low mtDNA diversity among Yami, genetic drift must have been active on the island and most likely accounts for the high frequency of the unique B4a1a4 lineage (24%) ( Table 1).
The complete mtDNA sequences and HVS-1 sequences in ISEA and Taiwan ([27] and our unpublished data) were used to estimate and compare the ages of the haplogroups found in Yami and Ivatan (Table 2). While such dates may have considerable uncertainty [33], two patterns were seen: 1) Haplogroups shared between Yami and Ivatan (B4a1a (including B4a1a4), B4a2a and B4c1b2) showed ages between ~800 and 1,600 YBP (95% CI; 0 to 4,600 years) as estimated by HVS-1 polymorphism. Compared to the archaeological estimates of settlement [5,7], our observation suggested that a permanent settlement must have post-dated the first traces of human activities observed on Orchid or Batanes islands (caution is warranted because the 95% confidence interval (CI) overlaps the archaeological estimate).
2) Except for the Yami haplogroups M7c3c and E2b1 which had only one representative in Ivatan, no other non-B4 haplogroups were shared between Yami and Ivatan. The two groups of islanders were clearly differentiated by two patterns, haplogroups F1a1d and M7b3 in Yami and haplogroups E1a1a, E2a, E2b2, F1a3, F1a4, M7b4, and N9a10 in Ivatan. While F1a1d and M7b4 have been reported in MSEA [34] (Figure 2, Table 1), all other haplogroups have only been seen in ISEA or among TwA. This suggested that the only maternal influence (via Taiwan) from MSEA was limited to F1a1d and M7b4, and that most Yami or Ivatan could trace their ancestry to either ISEA (i.e. B4a1a4, E2a, E2b2, F1a3 and F1a4) or to Taiwan (i.e. B4a2a, E2b1 and M7b3a). Further, the largest molecular variation among these haplogroups within the Yami, gave a 95% confidence interval on an age estimate that is within 3,000 YBP (Table 2). This again supports a more recent stage of permanent settlement on Orchid Island compared to the archaeological estimate of 4,000 YBP.
In summary, while Yami (with all haplogroups except B4a1a4 and B4c1b2) showed a stronger relationship with Taiwan, the Ivatan showed a closer affinity to the Philippines or Taiwan than to Yami. If not considering genetic drift, this pattern indicates bypassing of the Batanes Islands in the early stage of "Out of Taiwan", and later colonization of the Batanes from Luzon. The evidence of bidirectional maternal gene flow between the two islands was inferred from a time of settlement not exceeding 3,000 YBP.

Y chromosome
The frequencies of Y-chromosome single nucleotide polymorphism (Y-SNP) haplogroups are shown in Table 3. As previously reported for Taiwan and ISEA [35][36][37], O1-M119 and O2-P31 were the most common haplogroups among Yami, but O2-P31 was not seen in Ivatan and was not common in the Philippines. Interestingly, macro haplogroups K and NO*, indicators of Paleolithic traces for ISEA (9% to 46%) and the Philippines (0% to 6%), were not seen in the Orchid or Batanes Islanders [38]. Also, except for the presence of one O3a4*-GPS002611 lineage in Yami and one O1a1*-P203x lineage in Ivatan (Table 3), Y-SNP sharing between Yami and Ivatan was restricted to haplogroup O1a*-M119x. The median joining (MJ) networks were constructed using Yami and Ivatan polymorphisms obtained from 16 Y-STR loci in each Y-SNP haplogroup (O1a*-M119, O1a1*-P203, O1a2-M110, O2a*-M95, O2a1a-PK4, O3a3*-P201 and O3a4*-GSP002611) (Additional file 3). Only five distinct O1a*-M119 Yami Y-STR haplotypes were found (Additional file 1). These haplotypes were not shared between the two islands, suggesting drift, sampling bias or an absence of recent paternal gene flow between Yami and Ivatan. No clusters of Yami or Ivatan Y-STR lineages were found with TwA or Indonesia (Additional file 3). Nonetheless, the haplogroup O1a1*-P203 network showed some relationships among Philippine Y-STR lineages and five Yami individuals from Iraralai (three from family 48, one from family 44 and one from family 45), suggesting that the peoples of Orchid Island and the Philippines are related.
On the other hand, age estimates according to molecular variation [39] in Y-STR clusters suggested possible local founding events not exceeding 3,230 YBP (± 1,400 years) for Yamis (Additional file 4) and 3,300 YBP (± 1,430 years) for Ivatans (data not shown).
Table 2 footnotes. # Molecular dating of mtDNA sequences for coding region synonymous mutations was obtained using a rate of one synonymous mutation per 7,884 years [53] or per 6,764 years [51]. $ Molecular dating of mtDNA HVS-1 sequences (n ≥ 5) was obtained using a rate of one transition per 19,171 years when the Rho method was applied according to [53], or per 20,180 years according to [52,64]. NA: Not applicable. Note: Inferences made from such dates warrant caution as they have considerable uncertainty and may be inaccurate [33]. & All lineages were identical.
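As an illustration of how the Rho dating mentioned in the Table 2 footnotes converts mutation counts into an age, a minimal sketch follows. The two rates are the ones quoted in the footnotes; the function name, the star-like phylogeny assumption used for the standard error (variance of rho taken as rho/n), and the example mutation counts are assumptions made for illustration only, not the authors' actual computation.

```python
import math

# Rates quoted in the Table 2 footnotes (years per mutation).
YEARS_PER_HVS1_TRANSITION = 19171
YEARS_PER_SYNONYMOUS_MUTATION = 7884


def rho_age(mutation_counts, years_per_mutation):
    """Return (rho, age_in_years, standard_error_in_years).

    mutation_counts holds, for each sampled lineage, the number of
    mutations separating it from the inferred root haplotype.
    The standard error assumes a star-like phylogeny, in which the
    variance of rho reduces to rho / n (an assumption of this sketch).
    """
    n = len(mutation_counts)
    if n == 0:
        raise ValueError("at least one lineage is required")
    rho = sum(mutation_counts) / n
    se_rho = math.sqrt(rho / n)
    return rho, rho * years_per_mutation, se_rho * years_per_mutation


# Hypothetical clade: ten lineages, one of which carries a single
# HVS-1 transition relative to the root haplotype.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 0]
rho, age, se = rho_age(counts, YEARS_PER_HVS1_TRANSITION)
print(f"rho = {rho:.2f}, age = {age:.0f} years, SE = {se:.0f} years")
```

With so few mutations the standard error is as large as the estimate itself, which is consistent with the caution expressed in the footnotes about the uncertainty of such dates.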
A population phylogenetic tree was constructed using Y-STR Fst distances between all the groups in our dataset and the other populations in SEA [40][41][42][43][44]. Yamis, Ivatans, Amis and Filipinos shared a close paternal relationship; this result agreed with the phylogenetic pattern from mtDNA studies (Additional file 2). Nonetheless, these ethnic groups also showed Y-STR affinity to the Southern Taiwan Aboriginal tribes (Paiwan, Rukai and Puyuma), probably indicating a greater inter-island movement of men than of women. We also noticed that the few haplotypes shared between Yami and MSEA belong to the haplogroups O1a*-M119, O2a*-M95 and O2a1a-PK4 (Malaysia, Thailand, Southwest China, and Malagasy). Similarly, some haplotypes shared by Ivatan and Malaysia belong to haplogroups O1a2-M110 and O3a3*-P201.

Analysis of molecular variance (AMOVA)
Using the information shown in Additional file 1, the paternal and maternal lineages among Yamis were regrouped according to village of paternal and maternal origin. Analysis of molecular variance (AMOVA) [45] between maternal lineages and their village of origin (Table 4) did not show much difference among villages (Fst = 0.0055; P > 0.05), indicating that mtDNA lineages were distributed randomly among women throughout Orchid Island. In contrast, the Y-STR paternal variation differed significantly among villages (Fst = 0.17835; P < 0.0001), which suggests a sedentary life for the Yami men.
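To illustrate why village-specific paternal lineages yield a high Fst while evenly spread maternal lineages yield a value near zero, the sketch below computes a simple haplotype-frequency Fst, (H_total - H_within) / H_total, from gene diversities. This is a simplified stand-in and not the AMOVA estimator of [45]; the function names and the toy haplotype labels are hypothetical, and only the village names are taken from the text.

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Return 1 minus the sum of squared haplotype frequencies."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def fst(villages):
    """Haplotype-frequency Fst for a dict mapping village -> haplotype list.

    H_total is the diversity of the pooled sample; H_within is the
    sample-size-weighted mean diversity within villages.
    """
    pooled = [h for haps in villages.values() for h in haps]
    h_total = gene_diversity(pooled)
    if h_total == 0:
        return 0.0  # no variation at all, hence no structure to measure
    total = len(pooled)
    h_within = sum(
        len(haps) / total * gene_diversity(haps) for haps in villages.values()
    )
    return (h_total - h_within) / h_total


# Hypothetical Y-STR pattern: each village holds its own lineage.
paternal = {"Iraralai": ["y1", "y1"], "Yayu": ["y2", "y2"]}
# Hypothetical mtDNA pattern: the same lineages occur in both villages.
maternal = {"Iraralai": ["m1", "m2"], "Yayu": ["m1", "m2"]}

print(fst(paternal))  # complete differentiation among villages
print(fst(maternal))  # no differentiation among villages
```

A significance value such as the P values reported in Table 4 would additionally require permuting individuals among villages, which this sketch omits.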

Phylogeny and Genealogy
In Figure 3 (and Additional file 4), the ancestral and extended families in each village [22] were compared with the Yami NRY most parsimonious tree constructed from our Y-SNP and Y-STR results. Each Yami individual in the figures represents one nuclear family. The relationships between villages, ancestral families and extended families (left layout of Figure 3A, B and 3C) have been arranged to represent the Wei and Liu model, in which "the Yamis are a patrilocal society where families and their ancestry are village specific" [22]. Accordingly, the order of the extended families in the layouts should correlate with the order of the leaves in the most parsimonious tree (Figure 3). Deviations from this relationship would create crossings among the correlation lines. Quantitative visualization of the Wei and Liu relationship was constructed with the GenGIS software [46]. The fit of the ordered layouts to the corresponding phylogeny was tested using a Monte Carlo permutation test of the leaf nodes. The P values give the fraction of 1,000 permutations that produced fewer crossings than the layout shown in the figure [46] (Additional file 4). All P values (Figure 3A, B and 3C) were < 0.01, indicating that the layout used to represent the Wei and Liu hypothesis fitted the phylogeny significantly better than random orderings.
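The logic of the crossing-based permutation test can be sketched as follows. This is a simplified illustration and not the GenGIS implementation: it assumes that both the genealogical layout and the tree leaves are given as flat orderings of the same labels, so that two correlation lines cross exactly when a pair of leaves appears in opposite order in the two lists. Counting ties as "at least as good", the fixed random seed, and the leaf labels are assumptions of the sketch.

```python
import random


def count_crossings(layout_order, tree_order):
    """Count pairs of leaves whose relative order differs between the lists."""
    position = {leaf: i for i, leaf in enumerate(tree_order)}
    ranks = [position[leaf] for leaf in layout_order]
    return sum(
        1
        for i in range(len(ranks))
        for j in range(i + 1, len(ranks))
        if ranks[i] > ranks[j]
    )


def permutation_p_value(layout_order, tree_order, n_permutations=1000, seed=0):
    """Return (observed crossings, fraction of shuffles with no more crossings)."""
    rng = random.Random(seed)
    observed = count_crossings(layout_order, tree_order)
    shuffled = list(tree_order)
    at_most_observed = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # random reassignment of the leaf order
        if count_crossings(layout_order, shuffled) <= observed:
            at_most_observed += 1
    return observed, at_most_observed / n_permutations


# Hypothetical nuclear families listed in genealogical (village) order,
# and the order in which the same families appear as leaves of the tree.
layout = ["f1", "f2", "f3", "f4", "f5", "f6", "f7", "f8"]
tree = ["f1", "f2", "f4", "f3", "f5", "f6", "f7", "f8"]
observed, p_value = permutation_p_value(layout, tree)
print(observed, p_value)
```

A small P value here means that random leaf orderings rarely produce as few crossings as the genealogical layout, which is the sense in which the layout and the phylogeny agree.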

Genetic relationship between Yamis and Ivatans
Substantial trading among the regions of MSEA, Taiwan and ISEA dating back to ~4,000 YBP has been described in the literature, indicating that all the islanders, including Yami, Ivatan and coastal dwellers of the China Sea, used advanced navigation techniques to sail back and forth among islands. This was inferred from: (1) artifacts found on Orchid Island and in Batanes that were dated back to the "Fine Corded Ware Culture" of Taiwan around 4,000 YBP [5,7]; (2) jade trading among the Philippines, East Malaysia, southern Vietnam, Orchid Island, Batanes, and Thailand, which occurred between 2,600 and 1,500 YBP [8]; (3) the presence of Y haplogroups O1a2 and O2a in Madagascar, suggesting an establishment associated with the Austronesian expansion or with people coming from Southeast Asia during 1,500 to 2,000 YBP [43,47]; and (4) the linguistic connection of Yami and Ivatan to the Western Malayo-Polynesian branch of Austronesian in ISEA [2].
In this study, the matrilineal and patrilineal relationships between Yami, Ivatan, Taiwan Aborigines, the Philippine people, and other populations from mainland and island Southeast Asia were analyzed. Our goals were first to test whether there was a northward gene flow from the Philippines to Taiwan, and second to compare the Y chromosome data for the Yamis with the paternal genealogy reported by Wei and Liu (1962). The mtDNA haplogroups shared between Yami and Ivatan (20%) included B4a1a4, B4a2a, B4c1b2, E2b1, and M7c3c. The sharing of Y-SNP haplogroups was higher (40.8%) and included haplogroups O1a*-M119, O1a1*-P203 and O3a4*-GSP002611. Lin et al. (manuscript in preparation) observed sharing between Taiwanese Han and TwA (23% for mtDNA haplogroups and 42% for Y-SNP). This increased Y-SNP contribution could reflect a sex-biased social behavior. Alternatively, it could be associated with the slower mutation rate of Y-SNP polymorphism, which results in lower haplogroup diversity. However, using mtDNA (HVS-1 and relevant coding region information), Y-STR polymorphism and Y-SNP diversity, no such disproportion of haplotype sharing was seen between Yami and Ivatan (mtDNA: 8%; Y-STR: 7%). The mtDNA haplogroup B4a1a4 defined by np 4025 (Yami 15%, Ivatan 4% and Philippines 1%) was the only representative of the B4a1a clade in Yami. Its complete absence in Taiwan Aborigines and higher diversity in Filipinos suggest a northward gene flow from the Philippines within 3,000 years (Table 2). The two distinct branches of B4a1a4 seen in Yami and Ivatans (Figure 2) indicated that the islanders must have remained in isolation since settlement. Further, the total numbers of mtDNA haplogroups (Table 1) observed in Yami and Ivatan (7 and 15 respectively) were relatively small in comparison to those in Taiwanese Han and the Filipinos (77 and 43), indicating isolation and a small number of initial founders on the islands.
This indication of isolation of the Yamis becomes plausible as only ten mtDNA haplotypes with frequencies ranging from 6 to 24% were sufficient to represent all the seven Yami mtDNA haplogroups. Alternatively, poor sampling, small population size on the small island, and genetic drift may all have influenced the genetic profiles observed [6].
All the Yami and Ivatan Y-SNP haplogroups belonged to subgroups of macro haplogroup O, which is seen throughout MSEA and ISEA. Haplogroup O3 is frequent in Northern and Central Asia, haplogroup O2 in South Asia and MSEA, and haplogroup O1 is mostly distributed throughout ISEA [37,48,49]. The Y-SNP haplogroups seen in Yamis or Ivatans (subgroups of O1a, O2a, and O3a) also appear in MSEA and together represented a possible minimal haplogroup sharing of 26% between MSEA and either Yami or Ivatan. Nonetheless, a distinct contribution from MSEA to the islands was difficult to ascertain based on Y-SNP polymorphism alone. A matrilineal influence from MSEA was also indicated by the presence of the mtDNA haplogroups B4c1b2, F1a1d or M7b4, corresponding to a matrilineal contribution of 6% in Yamis and 14% in Ivatans. Many other mtDNA haplogroups seen in Taiwan and ISEA/Philippines suggest a direct gene flow from these locations to Batanes and/or to Orchid Island. For example, by comparing haplogroup frequency and gene diversity, haplogroups B4a2a and E2b1 (and to a lesser extent F1a1d and N9a10) suggested a gene flow from Taiwan, and haplogroups B4a1a4, B5b1, E2a and E2b2 suggested a gene flow from the Philippines. In general, Yami and Ivatan had a stronger affinity with their closest larger neighbor.
Our mtDNA phylogenetic tree (Additional file 2) puts Yami and Amis in the same cluster as Ivatans and the Philippines. Except for the Amis, this clustering followed the same pattern as described by Ross (2005), indicating separate sub-branches of Batanic languages for Yamis and Ivatans, both of which belong to the Western Malayo-Polynesian branch of the Austronesian language family, dated back to 2,500 YBP [2,50]. Also, age estimates from molecular variation of mtDNA haplogroup B4a1a4 and of Y chromosome O1a*-M119 in Yami and Ivatan indicated an overlap in the dating ranges (95% CI for mtDNA ranging from 0 to 3,000 YBP, and SE for Y-STR ranging from 750 to 3,230 YBP) (Table 2 and Additional file 4). The strong genetic affinity between Yamis and Taiwan Aborigines and the lack of gene flow between Yamis and Ivatans (Additional file 3) led us to hypothesize that a language shift from Formosan to Malayo-Polynesian may have occurred among Yami. The language shift might not be associated with gene flow from the Philippines but might have resulted from linguistic diffusion initiated by the trading of jade or other goods in the region [8].

The formation of Yami and Ivatan -time and people
Molecular Dating with the Rho Statistic [51][52][53] of mtDNA clades (Table 2) and/or Y-STR clusters (Additional file 4) of Yamis and Ivatans rarely exceeded 2,000 years (SE 750 to 3,230) which differs from the archeological estimate of 4,000 years [5,7]. Thus the extant populations on these islands most likely represent a more recent family line of immigrants.
Interestingly, none of the mtDNA and Y chromosome haplogroups seen in Yami or Ivatan suggested a relationship with the eastern Melanesian populations where mtDNA haplogroups P and Q, and Y-SNP haplogroups D, C, F and K are prominent [27,35,54]. A few mtDNA haplogroups among Yamis or Ivatans originated either in Taiwan (B4a2a, E2b1, F1a1d, and N9a10) or the Philippines (B4a1a4, E2a, E2b2 and B5b1). All the remaining haplogroups were commonly seen in Taiwan Aborigines and Filipinos. The data suggested a bidirectional gene flow and support the "Viaduct model" proposed by [27].

Yami Paternal genealogy and Phylogenetic diversity
While the Y-SNP haplogroups were heterogeneously distributed throughout Orchid Island (Figure 3B and Additional file 4), only one Y-STR lineage (represented by YF02 and YE14) was seen in two different villages (Additional file 4). Nonetheless, an AMOVA test using the distribution of Y-STR lineages among villages (Table 4) confirmed the patrilineal heterogeneity (P < 0.0001) throughout Orchid Island. In contrast, the AMOVA test conducted on mtDNA lineages did not show significant matrilineal genetic variation within or among villages, indicating that the maternal genetic ancestry was homogeneously distributed throughout the island, and that male gene flow rarely occurred. This observation was supported by the anthropological study of Wei and Liu [22] and by Yu-mei Chen (private communication), who observed that intermarriage between villages was common for women from Iranumilk, Imourud and Ivarinu villages. Further, the mtDNA analysis using an exact pairwise population differentiation test [55] did not show significant differences among the three villages (Iranumilk, Imourud and Ivarinu) and other villages on Orchid Island (data not shown).
We also investigated the Yami oral history, which claims that people from Iraralai, Yayu, and Ivarinu had close relationships with the Ivatans (Yu-mei Chen, private communication). In Additional files 3 and 4, two O1a*-M119 nuclear Yami families showed clustering with Ivatans (YD05 and YD12 from the extended families 44 and 47). No such relationships were found with Taiwan Aborigines or other regions of MSEA. Another strong relationship was seen in the O1a1*-P203 network (Additional file 3) between YD13 (from family 46) and one Y-STR lineage carried by two Filipinos. Interestingly, our genetic data supported the oral history reported by Yu-mei Chen (private communication). We also investigated two other folk tales of the Yami, one related to child adoption and the other related to people seeking refuge in another village. If child adoption indeed took place, this can be inferred from the correlation profile of the Iraralai families 44, 45 and 47, each having some family members in different genetic subclades (Additional file 4). Our NRY data could not confirm whether the people from Imourud had migrated to Iraralai after a major flood on the island [6].

Conclusions
A close genetic relationship between Yamis and Ivatans was hypothesized by linguistic studies, since both groups of islanders belong to the Batanic sub-branches of the Malayo-Polynesian language group found in ISEA. Accordingly, such a relationship would indicate a northward migration from the Philippines via the Batanes archipelago and Orchid Island toward Taiwan. Our study, using Y-SNP and mtDNA polymorphism at the macro haplogroup level, showed that the strong affinity between the Yamis and Ivatans resulted from gene flow between Taiwan and the Philippines. Each island population showed a higher affinity with the closest main island (i.e., Yami with Taiwan, or Ivatan with the Philippines) than with each other. This suggests an early isolation of the populations and little intermarriage among the islands. Only a few traces of gene flow were found between Yami and Ivatan or between Yami and the Philippines. The gene flow appears independent of the cultural development, suggesting that trading had a small impact on genetic exchanges but must have resulted in the linguistic affinity observed today among Yami, Ivatan and the Philippines.
The age estimates from the mtDNA or Y-STR variation suggested settlements on the islands dating back to ~3,000 YBP. However, the archaeological artifacts found on Orchid Island and Batanes were associated with the "Out of Taiwan" hypothesis, indicating a southward migration from Taiwan and an earlier settlement on the islands, possibly at 4,000 YBP. These conflicting observations suggested that our sampling may have been too small to reveal sufficient or significant markers that can support a unique southward gene flow.
In Additional file 5 we propose three separate scenarios [2]. Briefly, scenarios 1 and 2 were proposed by Ross [2]. They correspond to the "Out of Taiwan" hypothesis (scenario 1, Additional file 5) and to a northward migration from Luzon to the Batanes archipelago (scenario 2, Additional file 5).
Ideally, any scenario should consider variation due to drift, founder effect and admixture. Although the Out of Taiwan model [10] allows for some micro-spatial interactions, these conditions are ignored in a linguistics-based model. The simple stepping stones of Neolithic dispersal represented by scenario 1 (Additional file 5) are not sufficient to account for the complexity of the genetic patterns observed in this study. We described that very little Y-STR sharing was seen between Yami and Ivatan (Additional file 3). Their mtDNA profiles were also very distinct. In general, mtDNA haplogroups with high frequencies in one population were very low in the other population, but the mtDNA haplogroups were frequently matched among the closest populations. Such variation could also be expected from strong genetic drift (as indicated by the Tajima's D value (Table 1)). Scenario 3 (Additional file 5) seems to fit well with the mtDNA and Y-SNP data. It also evokes a highly reticulated network of cultural relationships, and suggests (as for scenario 2) the possibility of northward Malayo-Polynesian language diffusion from Luzon (or from the Batanes archipelago). While these hypotheses require further simulation testing, we propose that the extant genetic relationship observed between Yamis and Ivatans resulted from complex events that occurred during the period of the Out of Taiwan expansion and the subsequent trading between Taiwan and Luzon. Linguistic diffusion from the Philippines may have also affected these events.
Finally, our analysis of NRY polymorphism diversity showed major concordance with the Wei and Liu paternal genealogies. Such ethnographic study of kinship provided insights into the complex and uncertain ways in which ideas of family ancestry, culture and language contributed toward the formation of the Yami group identities, and into how genetics revealed or confirmed their descent and their origins. Although the paternal relationships among the Yami groups determined by the survey of Wei and Liu covered only a few generations, they clearly contributed toward the groups' self-perception of their identity. However, these notions of relatedness were complicated by the accumulation of too much information, such as the complex and deeply rooted information brought by genetics. We showed how knowledge of ancestry, when combined with history, social relationships, genealogy and the use of several genetic systems, can be put to work to determine the idea of tribally pure lines of descent within families.
Despite the complex and ambivalent ways in which people perceive the cultural, biological and genetic constitution of ethnic identities, rapid social changes, frequent risk of ethnic group dilutions or their disappearance, make it an urgent requisite to obtain additional data from all minority groups, such as the Yami and Ivatan, to record more accurate extant profiles, and finally to favor multidisciplinary approaches.

Methods
Seventy-nine unrelated Yami from Orchid Island (30 men and 49 women) were asked to participate in the study. All individuals provided their name, birthplace, the names of their parents and the village their parents came from. Among the 79 individuals, 12 mothers were from Imourud, 33 from Iraralai, 11 from Yayu, ten from Iratai, eight from Iranmilk, and five from Ivarinu (Figure 1). Among the 30 men, five were born in Imourud, 15 in Iraralai, eight in Yayu, one in Iranmilk, and one in Ivarinu (Additional file 1). Using each subject's name, parents' names, and birthplace information, each Yami male individual was traced back to one of the extended families described in Wei and Liu's genealogy [22]. Since Wei and Liu's genealogy was based on patrilineality, only the Y chromosome phylogeny (Y-SNP and Y-STR) was used for comparison between the genealogy and genetics.
To analyze the relationship between the Yami and Ivatan, 50 unrelated Ivatan individuals (24 men and 26 women) were recruited from Itbayat, an island of the Batanes archipelago belonging to the Philippines (Figure 1).
All participants in this study gave informed consent for the collection of blood samples and for DNA analysis. The project was approved by the ethics committee of Mackay Memorial Hospital, the Taiwan Health Department and the Philippines government.
To analyze the polymorphism of mtDNA and the Y chromosome, DNA was extracted from 500 μl of buffy coat from each blood sample using the QIAamp® DNA Blood Mini kit (Qiagen Inc., Taiwan). The non-recombining region of the Y chromosome (NRY) was typed using 70 single nucleotide polymorphisms (SNPs) and 16 short tandem repeats (STRs). For mtDNA typing, the control region HVS-1 [56] and coding region fragments spanning nucleotide positions (nps) 8000 to 9000, 9800 to 10900 and 14000 to 15000 were sequenced using the method described in our previous publications [25,27]. When relevant to the study, complete mtDNA genome sequencing was carried out [25]. Briefly, 24 fragments of mtDNA were amplified and sequenced in both directions [25,57]. Haplogroups were assigned according to the "Phylotree" criterion [26], available at http://www.phylotree.org, using the combination of the HVS-1 sequence, partial sequencing of the coding region, and other relevant diagnostic variants of the coding region obtained by restriction fragment length polymorphism (RFLP) [25,27]. In addition, the variant at np 4025, which indicates the locally named mtDNA haplogroup B4a1a4 (Figure 2), was determined by sequence specific polymorphism (SSP) using forward primer 3999-4025 (5'TATTA TAATA AACAC CCTCA CCACT AT3') and reverse primer 4049-4025 (5'TCATA TGTTG TTCCT ACCAA GATTG3') as internal primers of fragment 6 described by Rieder [25,57].
Y chromosome polymorphisms were ascertained using a hierarchical stepwise approach. Relevant SNPs were determined by direct sequencing of amplicons obtained from specific primer pairs, as described by the Y Chromosome Consortium 2002 [58][59][60]. In brief, DNA samples were initially tested for super haplogroup O markers. Since all Yami and Ivatan samples were found to belong to this haplogroup, specific downstream markers of haplogroup O were then determined using more restricted primers [37,58]. Y-STRs were subsequently determined in all individuals using 16 STRs (AmpFlSTR® Yfiler® PCR Amplification Kit from Applied Biosystems, Taiwan).
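The hierarchical stepwise approach can be illustrated with a minimal sketch. The marker tree below is a simplified assumption built only from the haplogroup labels reported in this study, with intermediate markers omitted; M175 as the root marker of haplogroup O is not named in the text and is also an assumption. The `is_derived` callable is hypothetical and stands in for the laboratory result of typing one SNP.

```python
# Simplified, assumed marker tree: marker -> (haplogroup label, child markers).
# Intermediate nodes of the real Y-chromosome tree are deliberately omitted.
MARKER_TREE = {
    "M175": ("O", ["M119", "M95", "GSP002611"]),  # root marker is an assumption
    "M119": ("O1a", ["P203"]),
    "P203": ("O1a1", []),
    "M95": ("O2a", ["PK4"]),
    "PK4": ("O2a1a", []),
    "GSP002611": ("O3a4", []),
}


def assign_haplogroup(is_derived, root="M175"):
    """Type markers top-down, descending only below markers found derived.

    is_derived: callable(marker) -> bool, a stand-in for sequencing one SNP.
    Returns (haplogroup label or None, list of markers tested in order).
    """
    tested = [root]
    if not is_derived(root):
        return None, tested  # sample lies outside the tree covered here
    current = root
    while True:
        label, children = MARKER_TREE[current]
        for child in children:
            tested.append(child)
            if is_derived(child):
                current = child  # move one level down and continue typing
                break
        else:
            # No downstream marker is derived: report a paragroup ("*")
            # when untyped sublineages exist, otherwise the terminal label.
            return label + ("*" if children else ""), tested


# Usage with a hypothetical sample that carries M175 and M95 only.
sample = {"M175", "M95"}
haplogroup, markers_tested = assign_haplogroup(lambda m: m in sample)
```

The point of the design is economy: downstream markers are typed only on the branch where the upstream marker was found derived, so most samples need a handful of tests rather than the full marker panel.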

Data analysis
Frequencies of haplogroups among populations were obtained by direct counting (Tables 1 and 3). On the basis of haplogroup frequencies, mtDNA and Y-STR distance matrices were obtained using Fst distances after 10,000 permutations and a 0.05 significance level (ARLEQUIN package 3.1) [55]. Population phylogenetic trees were constructed using the neighbor-joining (NJ) method of Saitou and Nei (1987) implemented in the Phylip package [61]. As a test of neutrality, Tajima's D (1989) [62] was calculated with the DnaSP sequence polymorphism software package [63].
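For readers unfamiliar with the neutrality test, the following is a minimal sketch of Tajima's D computed from aligned sequences using the standard formula. It is not the DnaSP implementation; the function name and the toy sequences are hypothetical, and every mismatching character is counted as a difference (no special handling of gaps or ambiguous bases).

```python
from itertools import combinations
from math import sqrt


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences.

    Returns None when there are no segregating sites (D is undefined).
    """
    n = len(seqs)
    if n < 4:
        raise ValueError("at least 4 sequences are needed (variance is 0 below that)")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if seg_sites == 0:
        return None

    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs) / len(pairs)

    # Constants from Tajima (1989).
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - seg_sites / a1) / sqrt(e1 * seg_sites + e2 * seg_sites * (seg_sites - 1))


# Toy data (hypothetical): an excess of singletons gives a negative D,
# the pattern expected after a population expansion.
star_like = ["TAAAA", "ATAAA", "AATAA", "AAATA", "AAAAT"]
d_value = tajimas_d(star_like)
```

A negative D indicates an excess of rare variants relative to neutral expectation, and a positive D an excess of intermediate-frequency variants.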
Specific Fst indices measuring the variance of paternal or maternal lineages within and between villages were obtained from AMOVA using ARLEQUIN package 3.1 [55]. Ages of molecular variation for mtDNA were inferred using the ρ method for complete sequencing and HVS-1 data [51][52][53], using a rate of one synonymous transition per 7,884 years (bps 590-15990) and one transition per 19,171 years (bps 16090-16365) for the Soares method, or 6,764 years and 20,180 years for the Kivisild and the Forster and Saillard methods, respectively [51][52][53][64]. Y chromosome dates were estimated from Y-STR data in the background of their respective SNP haplogroups using the ρ statistic with an average mutation rate of 6.9 × 10⁻⁴ ± 5.7 × 10⁻⁴ per locus per 25 years [39]. Generation length, bottlenecks, founder events, population size dynamics and geography are confounding factors that may cause unexpected variation in ρ and warrant caution in inferences made from molecular variation [33].
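The ρ dating described above can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure used here: ρ is computed as the mean number of mutational steps separating each sampled haplotype from an assumed founder haplotype, differences in Y-STR repeat counts are treated as single steps, and the per-haplotype Y-STR rate is taken as the per-locus rate multiplied by the number of loci. All function names and the toy haplotypes are hypothetical.

```python
def rho(founder, haplotypes):
    """Mean number of mutational steps from the founder haplotype.

    Haplotypes are tuples of integers (e.g. Y-STR repeat counts, or 0/1
    flags for the presence of a mutation at each scored mtDNA site).
    """
    steps = [sum(abs(a - b) for a, b in zip(founder, h)) for h in haplotypes]
    return sum(steps) / len(steps)


def age_from_rho_mtdna(rho_value, years_per_mutation):
    """Age in years when the rate is expressed as years per mutation."""
    return rho_value * years_per_mutation


def age_from_rho_ystr(rho_value, n_loci, rate_per_locus=6.9e-4, years_per_generation=25):
    """Age in years for Y-STR data.

    rate_per_locus is the mutation rate per locus per generation; the
    haplotype-wide rate is assumed to be rate_per_locus * n_loci.
    """
    generations = rho_value / (rate_per_locus * n_loci)
    return generations * years_per_generation


# Toy Y-STR haplotypes over three loci (hypothetical repeat counts).
founder = (10, 12, 14)
sample = [(10, 12, 14), (11, 12, 14), (10, 12, 13), (10, 13, 15)]
r = rho(founder, sample)  # steps are 0, 1, 1 and 2, so the mean is 1.0

# One synonymous transition per 7,884 years (rate quoted in the text).
mtdna_age = age_from_rho_mtdna(r, 7884)
# Y-STR age for the 16-locus panel, using the rate quoted in the text.
ystr_age = age_from_rho_ystr(r, n_loci=16)
```

Because the age scales linearly with ρ and inversely with the assumed rate, the large uncertainty on the Y-STR rate carries over directly to the date estimates.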
Y-STR median-joining networks were constructed using Network software 4.5.1.0 [65]. Finally, a Yami NRY phylogenetic tree was constructed using Y-SNP and Y-STR patterns in the background of each Y-SNP haplogroup: O1a*-M119, O1a1*-P203, O2a*-M95, O2a1a-PK4 and O3a4*-GSP002611 (Additional file 3) [66,67]. The correlation between the village-restricted paternal genealogy of Wei and Liu [22] and the leaves of the NRY phylogeny was analyzed and visualized with the GenGIS package [68]. Accordingly, extended families, villages and ancestral families (Figure 3A, 3B and 3C, respectively) were first laid out separately to obtain the minimum number of crossings among the correlation lines linking the genealogical layout to the leaves of the NRY phylogeny. A Monte Carlo permutation test was then performed on the leaves of the phylogenetic tree to assess whether the fit was significantly better than random.
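The logic of the crossing-based permutation test can be sketched in a few lines. This is a simplified illustration and not the GenGIS implementation: the leaf order of the tree is held fixed, each leaf is given the rank of its family or village along the genealogical layout, and a crossing is counted whenever two correlation lines would intersect. Function names and the toy data are hypothetical.

```python
import random


def count_crossings(leaf_ranks):
    """Number of crossing line pairs for leaves listed in tree order.

    leaf_ranks[i] is the position, along the genealogical layout, of the
    group (family or village) to which the i-th tree leaf belongs. Two
    lines cross when the leaf order and the group order disagree.
    """
    n = len(leaf_ranks)
    return sum(
        1
        for i in range(n)
        for j in range(i + 1, n)
        if leaf_ranks[i] > leaf_ranks[j]
    )


def permutation_p_value(leaf_ranks, n_perm=1000, seed=0):
    """Monte Carlo p-value for observing this few crossings by chance.

    Group labels are shuffled among the leaves; the p-value is the share
    of shuffles with no more crossings than observed (add-one corrected).
    """
    rng = random.Random(seed)
    observed = count_crossings(leaf_ranks)
    shuffled = list(leaf_ranks)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_crossings(shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example (hypothetical): 12 leaves from four villages that cluster
# perfectly on the tree, so no correlation lines cross.
observed, p_value = permutation_p_value([0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3])
```

A small p-value means that random assignments of leaves to groups rarely achieve as few crossings as the observed layout, which is the sense in which the fit is "significantly better than random".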