DYZ1 arrays show sequence variation between the monozygotic males

Background Monozygotic twins (MZT) are an important resource for genetic studies of normal and diseased genomes. In the present study we used DYZ1, a satellite fraction present as tandem arrays on the long arm of the human Y chromosome, as a tool to uncover sequence variation between monozygotic males. Results We detected copy number variation and frequent insertions and deletions within the sequences of DYZ1 arrays in all three sets of twins studied. MZT1b showed a loss of 35 bp compared with 1a, whereas 2a showed a loss of 31 bp compared with 2b. Similarly, 3b showed a 10 bp insertion compared with 3a. MZT1a germline DNA showed a loss of 5 bp and 1b blood DNA showed a loss of 26 bp compared with 1a blood and 1b germline DNA, respectively. Of the 69 restriction sites detected in DYZ1 arrays, those of MboII, BsrI, TspEI and TaqI showed frequent loss and/or gain in all three pairs studied. The MZT1 pair showed loss/gain of VspI, BsrDI, AgsI, PleI, TspDTI, TspEI, TfiI and TaqI restriction sites in both blood and germline DNA. All three sets of MZT showed differences in the number of DYZ1 copies. FISH signals reflected somatic mosaicism of the DYZ1 copies across cells. Conclusions DYZ1 showed both sequence and copy number variation between the MZT males. Sequence variation was also observed between germline and blood DNA samples of the same individual in at least one set of samples. The results suggest that DYZ1 faithfully records the genetic changes occurring after twinning, which may be ascribed to environmental factors.


Background
The respective roles of nature and nurture have been addressed through studies on twins, which are like natural clones. It is believed that the differences between twins are largely due to the influence of environmental factors. Theoretically, identical twins must be identical because they arise from a single fertilized egg (zygote). However, recent studies have shown that identical twins are not truly identical, as they show discernible variation in their genotypes [1][2][3]. The genetic differences between MZ twins represent an example of somatic mosaicism [4,5]. By the same token, one may expect similar mosaicism in germline samples as well. However, this has not been demonstrated unequivocally. During the early stages of life, it is difficult to uncover differences in any of the biological attributes of twins.
However, as twins age, genetic and epigenetic changes accumulate, causing differential expression of genes between twins [6,7]. Twins have been reported to show copy number variation for a number of genes [1]. Therefore, the term monozygotic twins (MZT) is more appropriate than identical twins, as no twins are identical in the true sense.
The cause of monozygotic twinning in humans is not clear. MZ twins result when a fertilized egg or zygote splits into two embryos. This remarkable event takes place during the first week or so after fertilization and can happen at different times: at the two-cell stage on days 1 to 3, at the early blastocyst stage on days 4 to 6, or at the late blastocyst stage on days 7 to 9 [8]. The frequency of monozygotic twinning increases 2 to 5 times with in vitro fertilization [9,10]. In the case of female monozygotic twinning, one suggested mechanism is the preferential inactivation of the normal X in one of the twins [11,12]. Twinning occurs spontaneously at a rate of about 1 in 80 live births [8,13], whereas monozygotic twinning occurs spontaneously at a rate of about 1 in 250 live births [8,14]. The rate of spontaneous twinning is highest (1 in 11) in Nigeria and lowest (1 in 250) in Japan. The occurrence is about 6 per 1000 in Asia, 10-20 per 1000 in Europe and the USA, and about 40 per 1000 in Africa [8,15].
The mammalian Y chromosome, which originated from an ancestral autosome about 300 million years ago, is a degenerated X chromosome [16]. The human Y chromosome is male specific, constitutively haploid and largely escapes meiotic recombination. Lack of recombination was thought to be responsible for the degeneration of the human Y chromosome and the loss of Y-linked genes, but a recent study showed that during the past 25 million years the human Y chromosome lost only one gene [17]. Thus, crucial genes seem to have been retained by the Y chromosome.
Approximately 95% (60 Mb) of the human Y chromosome represents the male-specific region of the Y (MSY). About 5% (3 Mb) of the human Y chromosome comprises the pseudo-autosomal region (PAR), necessary for pairing with the human X chromosome. The human Y chromosome has a high proportion of repeat elements. The satellite sequence DYZ1 constitutes approximately 20% of the total Y chromosome [18]. Based on HaeIII digestion of human genomic DNA, DYZ1 was identified as a 3.4 kb band in males [19], which was found to consist largely of the pentameric repeat "TTCCA" [20]. A normal human Y chromosome contains approximately 3000-4300 copies of the DYZ1 arrays [21]. Since DYZ1 copies do not participate in recombination, they were deduced to have no functional or evolutionary advantage [16]. However, even the most repetitive stretches of DNA have significance in the genome, as they are envisaged to absorb undue mutational load. DYZ1 is now reported to play a crucial role in chromatin folding and in maintaining the structural integrity of the Y chromosome, and thus has some functional attributes [21].
The major part of the human genome is heterochromatic, and environmentally triggered genomic changes are generally absorbed by this region. Because DYZ1 does not undergo recombination, no major change is expected to take place in its arrays. However, since DYZ1 represents a heterochromatic region of the human Y chromosome, any change arising between the two males of an MZT pair after twinning may in principle be detected. Mutations that occurred during the pre-twinning stage will be present in both twins, whereas those acquired later in life will differentiate them from each other. With this premise, we analysed DYZ1 in the males of three sets of MZT, using blood DNA samples from all three pairs. For one pair (MZT1), we also analysed DYZ1 arrays in germline DNA from semen samples. We sequenced and virtually restriction mapped the 3564 bp unit of DYZ1 arrays. We also calculated the copy number of DYZ1 in these sets of twins using real-time PCR. The number of "TTCCA" repeats and its single, double, triple, four and five base pair derivatives per 3564 bp unit generates a profile for the respective array. Differences in the number of TTCCA repeats and derivatives, copy number variation of DYZ1 arrays and loss/gain of restriction enzyme sites were compared to uncover differences between MZT males. A similar comparison was made between the blood DNA and germline DNA of MZT1. Our results show that DYZ1 is indeed capable of faithfully recording the sequence variation that follows twinning. These changes may or may not be due exclusively to the environment, but such a correlation is possible to establish. This information is envisaged to be useful in the context of biology, medicine and forensic cases.
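The repeat profile described above can be illustrated with a short Python sketch. It is not the authors' counting procedure, which the text does not specify. It assumes that a "derivative" is a pentamer differing from TTCCA at one to five positions, and that pentamers are read in a fixed frame. The function names and the toy sequence are hypothetical.

```python
from collections import Counter

MOTIF = "TTCCA"  # pentameric repeat unit reported for DYZ1


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def pentamer_profile(seq: str, frame: int = 0) -> Counter:
    """Count pentamers by their number of mismatches to TTCCA.

    Assumption: the sequence is read in non-overlapping 5-base windows
    starting at `frame`; key 0 = exact TTCCA, keys 1..5 = derivatives.
    """
    seq = seq.upper()
    profile = Counter()
    for i in range(frame, len(seq) - len(MOTIF) + 1, len(MOTIF)):
        profile[hamming(seq[i:i + len(MOTIF)], MOTIF)] += 1
    return profile


# Hypothetical toy array: four exact repeats, one single-base and one
# double-base derivative.
toy = "TTCCA" * 3 + "TTCGA" + "ATCGA" + "TTCCA"
profile = pentamer_profile(toy)
print(dict(profile))
```

An insertion or deletion shifts every downstream window out of frame in this scheme, so the frame would have to be re-adjusted after each indel before two profiles could be compared.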

bp unit of DYZ1 array
Four PCR-amplified fragments (Figure 1, purple) were cloned and several positive clones were sequenced. After alignment, we took the consensus of these sequences as representative of the majority of the DYZ1 arrays. We also sequenced PCR-amplified products directly, repeated the process twice, and obtained variations each time. We therefore relied on the sequences obtained from cloned products rather than from amplified ones, since a cloned fragment ensures a pure template of identical molecules. We have submitted the sequences of all such cloned fragments to GenBank, and the assigned accession numbers are given herein. The number of highly abundant "TTCCA" repeats and its single, double, triple, four and five base pair derivatives per 3564 bp unit (HaeIII fragment) of DYZ1 array was counted. The differences in the number of TTCCA repeat units and derivatives between MZT are highlighted in yellow, green and sky blue for MZT pairs 1, 2 and 3, respectively (Table 1). Germline samples of both individuals of MZ twin pair 1 were used. We also sequenced the 3.56 kb HaeIII fragment of the DYZ1 array originating from germline DNA of the MZT1 pair, ascertained the number of TTCCA repeats and derivatives per 3.56 kb unit, and compared DNA of blood and germline origin with respect to the DYZ1 array. The detailed results are shown in Table 2, with differences highlighted in bold. DYZ1 array sequences with adjusted "TTCCA" reading frame are shown in Additional file 1. Table 4. The real restriction mapping experiments did not always correlate with the virtual restriction mapping data because virtual restriction mapping dealt with a single array sequence, whereas real restriction mapping dealt with a pool of DYZ1 array sequences whose average may differ considerably from any single array sequence.
Of the several restriction enzymes tested, namely RsaI, BstXI, DpnI, EcoRI, MboI, MboII, XmnI, TatI, MseI, ApoI, MfeI, BseMII, NlaIII and DdeI, DpnI showed a restriction pattern that varied between blood and germline DNA (Figure 2). Thus, in MZT1, the blood genomic DNA does not contain the DpnI site whereas the germline DNA does.

DYZ1 copy number variation
DYZ1 copy number was calculated by absolute quantitative PCR using SYBR green chemistry and a standard curve generated from ten-fold dilutions of a cloned DYZ1 plasmid. The dissociation curve, standard curve and amplification plot are given in Figures 3A, B and C, respectively. The copy number values for all twin pairs and controls are shown in Figure 3D. Twin pairs 1, 2 and 3 showed differences of 409, 367 and 697 DYZ1 copies, respectively.

Localization of DYZ1 on metaphases/nuclei using Fluorescence in situ Hybridization
We screened approximately 400 nuclei and metaphases. To rule out the possibility of experimental error, two positive controls (metaphases prepared from normal human blood) were used with the same probe preparation. Following FISH, the nuclei and metaphases showed DYZ1 signal of varying intensity which is due to the varying number of its copies. The representative FISH pictures are shown in Figure 4.

Discussion
The genome that we are born with is not the one that we die with [1]. This is true for all the cells in our body. As we age, environmentally triggered genomic changes accumulate in our DNA, more so in the repeat regions. Accordingly, the differences between identical twins increase as they age. Twins can also begin their lives with some major differences.
The MSY region of the human Y chromosome does not take part in crossing over, so the DNA comprising the MSY is faithfully passed on from father to son. However, the MSY may accumulate mutations during the lifetime of an individual. In the case of DYZ1, point mutations generate derivatives of "TTCCA", while insertions and deletions shift the "TTCCA" frame. The genome tries to neutralize or minimize these changes. In the process, an insertion at one point may lead to a deletion at another point and vice versa (Additional file 2). Despite these changes in the number of "TTCCA" repeats and derivatives, the overall length of the array remains almost unchanged. Independent mutational events may also lead to gain or loss of restriction enzyme sites in the DYZ1 array, as is evident from the present study.
In addition, DYZ1 arrays showed copy number variation between MZT, as uncovered by real-time PCR. However, the fluorescence signal intensity of the DYZ1 probe (Figure 4) does not always correlate with its copy number variation. This is because not every cell contains an equal number of DYZ1 copies. Similarly, the DNA used for quantitative real-time PCR does not contain a homogeneous population of DYZ1 sequences. Thus, the DYZ1 copy number calculated by absolute quantification is the average of the DYZ1 arrays present in the pool of DNA from all the cells. Analysis of DYZ1 has been pursued in our laboratory in the context of Sex Chromosome Related Anomalies (SCRA) [22], males exposed to Natural Background Radiation (NBR) [21], Arsenic Poisoning [23], Prostate Cancer cell lines [18] and Infertility [24]. Significantly, DYZ1 was found to show much reduced copy numbers in all these cases. Thus, there is indeed a correlation between reduced copies of DYZ1 and these abnormal conditions. DYZ1 does not unequivocally differentiate between monozygotic twins, but the effect of nature vs. nurture on twins can be studied with respect to DYZ1 arrays. This is because, at the time of twinning, the copies in both males are expected to be identical. Any variation noticed either in the number of copies of the arrays or within them is ascribable to environmental conditions. Thus, the present study has relevance in the context of changes brought about in the DYZ1 arrays between two males of monozygotic origin.
Taken together, the study mainly supports the argument that monozygotic twins are not really identical. Extending this study to a large number of samples may lead to the discovery of sufficient genetic variation in the DYZ1 arrays across samples. This in turn would augment the existing approaches for discriminating identical twins in forensic cases.

Conclusions
DYZ1 arrays showed variations between the monozygotic males. Similarly, sequence variations were established between germline and blood DNA samples of the same individual for one twin pair. This approach is envisaged to be of relevance in biology, medicine and forensic cases if a sufficiently large number of samples from both blood and germline are analysed.

Sample collection and DNA isolation
The present study was approved by the Institutional Human Ethical Committee of the National Institute of Immunology, New Delhi. Peripheral blood lymphocytes (PBLs) were collected from three pairs of male monozygotic twins with their informed consent. Genomic DNA from blood was isolated using the DNeasy Blood and Tissue kit from Qiagen, Germany (Cat no. 29504), and germline DNA of MZT1 was isolated following a standard protocol [21]. The quality of the isolated DNA was checked by electrophoresis on a 1% agarose gel. DNA concentration was measured spectrophotometrically.

End point PCR
PCR primers used to amplify the full 3.56 kb HaeIII fragment of DYZ1 are listed in Table 5 and illustrated

Cloning and DNA sequencing
End point PCR-amplified DYZ1 array fragments resolved on a 1.0% agarose gel were extracted using a kit (Fermentas, Thermo Fisher Scientific). Purified DNA fragments were cloned in blunt-end cloning vectors (CloneJet, Fermentas, Thermo Fisher Scientific). Four positive recombinant clones were selected after colony PCR using vector-specific forward and reverse primers. Recombinant clones were further confirmed by restriction digestion. The four purified recombinant clones were sequenced on an Applied Biosystems 3130xl genetic analyzer using ABI PRISM® BigDye® terminator v3.1 cycle sequencing kits (Life Technologies, California, USA). Cycle sequencing conditions were 96°C for 1 minute, followed by 25 cycles each consisting of 96°C for 10 seconds, 50°C for 5 seconds and 60°C for 4 minutes. After cycle sequencing, extension products were purified to remove unincorporated dye-labelled terminators by ethanol-sodium acetate precipitation followed by washing in 70% ethanol. Hi-Di™ Formamide (Life Technologies, California, USA) was added, and samples were heat denatured, chilled on ice and loaded onto the 3130xl genetic analyzer. The data were collected using 3130xl Data Collection Software v3.0. Sequences were analysed using Sequence Scanner software version 1.0 and Gene Runner software version 3.05.

Restriction mapping
The DYZ1 sequences were subjected to virtual restriction mapping using Restriction Mapper Software Version 3.0 (Tables 3 and 4). To support the virtual restriction mapping data, we conducted real restriction mapping on the PCR-amplified product using several restriction enzymes. Digestions were carried out in a 20 μl reaction mixture using 1 μg of template DNA and 2 units of enzyme, following standard protocols (NEB, UK). Digested samples were resolved on a 2.0% agarose gel and visualized under UV illumination to record the resultant bands.
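Virtual restriction mapping of this kind amounts to scanning a sequence for enzyme recognition motifs and comparing the hits between samples. The Python sketch below illustrates the idea; it is not the Restriction Mapper software used in the study. The toy sequences and helper names are hypothetical, and only three palindromic enzymes with well-established recognition sequences (TaqI TCGA, EcoRI GAATTC, HaeIII GGCC) are included.

```python
import re

# Recognition sequences (palindromic, so one strand suffices).
SITES = {"TaqI": "TCGA", "EcoRI": "GAATTC", "HaeIII": "GGCC"}


def find_sites(seq: str) -> dict:
    """Return 0-based start positions of each recognition site in seq."""
    seq = seq.upper()
    return {
        enzyme: [m.start() for m in re.finditer(f"(?={site})", seq)]
        for enzyme, site in SITES.items()
    }


def site_count_changes(seq_a: str, seq_b: str) -> dict:
    """Per-enzyme change in site count from seq_a to seq_b (unchanged omitted)."""
    a, b = find_sites(seq_a), find_sites(seq_b)
    return {
        enzyme: len(b[enzyme]) - len(a[enzyme])
        for enzyme in SITES
        if len(a[enzyme]) != len(b[enzyme])
    }


# Hypothetical toy sequences: twin_b differs from twin_a by one point
# mutation (G -> C) that destroys the TaqI site.
twin_a = "TTCCA" + "TTCGA" + "TTCCA" + "GAATTC" + "TTCCA"
twin_b = "TTCCA" + "TTCCA" + "TTCCA" + "GAATTC" + "TTCCA"

print(find_sites(twin_a))
print(site_count_changes(twin_a, twin_b))
```

Counts rather than positions are compared because insertions and deletions shift the coordinates of every downstream site, so a position-by-position comparison would need an alignment first.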

Copy number estimation
The number of DYZ1 copies was calculated by the absolute quantification method using quantitative PCR (qPCR). DNA was used as template and SYBR green (Life Technologies, California, USA) as the detection dye. The qPCR reactions were performed on a Sequence Detection System 7500 (Life Technologies, California, USA). A standard curve was generated from 10-fold dilutions of a recombinant plasmid containing the ~3.4 kb HaeIII fragment of the DYZ1 array, starting with 2 × 10⁸ copies. All reactions were carried out in triplicate using three different concentrations of the template DNA. The standard curve had a slope of −3.32 and an R² value of >0.99. Copies of the DYZ1 array were calculated from the standard curve obtained with known copies of the recombinant plasmid. To show the reproducibility of the qPCR results, error bars are shown on top of the graph bars.
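The arithmetic behind absolute quantification can be sketched in Python. The standard curve is the linear relation Ct = slope × log10(copies) + intercept, so a reported slope of −3.32 corresponds to an amplification efficiency close to 100%. Only the slope and the 2 × 10⁸ starting point come from the text; the intercept below is a hypothetical value chosen for illustration, and the function names are not from the instrument software.

```python
import math


def efficiency(slope: float) -> float:
    """Amplification efficiency implied by a standard-curve slope (1.0 = 100%)."""
    return 10 ** (-1.0 / slope) - 1.0


def copies_from_ct(ct: float, slope: float, intercept: float) -> float:
    """Invert Ct = slope * log10(copies) + intercept to get template copies."""
    return 10 ** ((ct - intercept) / slope)


SLOPE = -3.32      # reported in the text
INTERCEPT = 38.0   # hypothetical, for illustration only

# Ct that the top standard (2 x 10^8 plasmid copies) would give on this curve.
ct_standard = SLOPE * math.log10(2e8) + INTERCEPT
recovered = copies_from_ct(ct_standard, SLOPE, INTERCEPT)

print(f"efficiency = {efficiency(SLOPE):.3f}")
print(f"recovered copies = {recovered:.3e}")
```

Converting template copies per reaction into DYZ1 copies per genome additionally requires the number of genome equivalents in the input DNA, which this sketch does not model.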

Fluorescence in situ Hybridization (FISH)
Peripheral blood cells cultured in PB-MAX™ Karyotyping Medium (Gibco®, USA) were used for metaphase chromosome preparation. The cells were grown for 70 hours in a 5% CO2 environment at 37°C and then treated with colcemid (3 μg/ml). Treated cells were incubated for a further 2 hours in a 5% CO2 environment at 37°C. After 72 hours, cells were centrifuged at 1800 rpm for 10 minutes at room temperature (RT). Harvested cells were resuspended in 0.075 M KCl and incubated at RT for 20 minutes in 5% CO2 environment at 37°C. One drop of fixative solution (3:1, methanol:glacial acetic acid) was then added and the cells were centrifuged at 1800 rpm for 10 minutes at RT. The supernatant was discarded, and the cell pellet was resuspended in 10 ml of fresh fixative solution and incubated for 20 minutes at 37°C. The cells were then centrifuged at 1800 rpm for 10 minutes at RT. The washing step was repeated twice. Finally, cells were resuspended in 1 ml of fresh fixative and stored at −20°C until used.
A 20 μl aliquot of the nuclei suspension in fixative was spread on fixative-dipped glass slides. Before proceeding further, slides were kept for 1 week at 37°C for ageing. Slides were then incubated in 70% glacial acetic acid for 2 minutes, followed by dehydration in 70%, 90% and 100% ethanol for 2 minutes each at RT. Slides were air dried and incubated in a solution containing 0.1 mg/ml and 0.01 N HCl for 20 minutes. The metaphase preparations were fixed in 4% paraformaldehyde (prepared in 1X PBS, pH 7.4) for 5 minutes at RT. Slides were washed twice in PBS and once in water, then dehydrated sequentially in 70%, 90% and 100% ethanol. Air-dried slides were used for hybridization. FISH was conducted with a labelled clone containing the 3.56 kb sequence of the DYZ1 array. Labelling was done with biotin-dUTP using a nick translation kit from Vysis (Illinois, USA). Hybridization, washing, counterstaining and mounting of the slides were conducted following standard protocols [25]. The slides were screened under an Olympus fluorescence microscope (BX 51) fitted with a vertical fluorescence illuminator (U-LH100HG UV) and excitation and barrier filters, and images were captured with a charge-coupled device (CCD) camera. Captured images were analysed using CytoVision software version 3.93 from Applied Imaging Systems.