 Methodology article
 Open Access
 Published:
A statistical measure for the skewness of X chromosome inactivation based on family trios
BMC Geneticsvolume 19, Article number: 109 (2018)
Abstract
Background
X chromosome inactivation (XCI) is an important gene regulation mechanism in females to equalize the expression levels of X chromosome between two sexes. Generally, one of two X chromosomes in females is randomly chosen to be inactivated. Nonrandom XCI (XCI skewing) is also observed in females, which has been reported to play an important role in many Xlinked diseases. However, there is no statistical measure available for the degree of the XCI skewing based on family data in population genetics.
Results
In this article, we propose a statistical approach to measure the degree of the XCI skewing based on family trios, which is represented by a ratio of two genotypic relative risks in females. The point estimate of the ratio is obtained from the maximum likelihood estimates of two genotypic relative risks. When parental genotypes are missing in some family trios, the expectationconditionalmaximization algorithm is adopted to obtain the corresponding maximum likelihood estimates. Further, the confidence interval of the ratio is derived based on the likelihood ratio test. Simulation results show that the likelihoodbased confidence interval has an accurate coverage probability under the situations considered. Also, we apply our proposed method to the rheumatoid arthritis data from USA for its practical use, and find out that a locus, rs2238907, may undergo the XCI skewing against the atrisk allele. But this needs to be further confirmed by molecular genetics.
Conclusions
The proposed statistical measure for the skewness of XCI is applicable to complete family trio data or family trio data with some paternal genotypes missing. The likelihoodbased confidence interval has an accurate coverage probability under the situations considered. Therefore, our proposed statistical measure is generally recommended in practice for discovering the potential loci which undergo the XCI skewing.
Background
Many human diseases are associated with genes on X chromosome, such as asthma, autoimmune diseases, cancers, some neurological and psychiatric diseases [1,2,3,4,5]. Most of these Xlinked diseases often exhibit sexspecific patterns of susceptibility due to the difference in the number of copies of X chromosome between two sexes. Females have two copies of X chromosome whereas there is only one copy in males. To equalize the expression levels of X chromosome between sexes, dosage compensation is achieved by an important gene regulation mechanism in mammalian females, X chromosome inactivation (XCI), which results in expression silencing of one of two X chromosomes in females [6]. Up to 75% genes on X chromosome are subject to XCI, while there are about 15% escaping from inactivation and expressed from both X chromosomes, and the remaining 10% show variable inactivation patterns in different human cell lines [7].
During the process of XCI, one of two X chromosomes in females is chosen to be inactivated in a random way. This means that roughly 50% of cells in females have the paternal X chromosome expressed, while the others express the maternal one. Although random XCI occurs commonly, the XCI skewing also takes place in females, which is defined as the phenomenon of nonrandom inactivation that one of X chromosomes is selected to be silenced with a probability deviating from 50% [8]. Generally, the skewness of XCI is caused by a second selection mechanism. When the mutation on X chromosome affects the survival and proliferation of cells, the amount of cells carrying an active mutant X chromosome will become larger or smaller than that of cells with an active wildtype X chromosome, which thus leads to the skewness of XCI [9, 10]. Negative selection, where the mutation gives a growth disadvantage to cells, frequently happens in female carriers with Xlinked diseases, such as mental retardation disorders, WiskottAldrich syndrome and Xlinked severe combined immunodeficiency [11,12,13]. On the other hand, when the mutation provides a growth advantage to cells, positive selection occurs and can result in some diseases, such as adrenoleukodystrophy and breast cancer [14, 15].
In genetic association studies on X chromosome, Clayton [16] first took XCI into consideration. Due to XCI, the genotypic effect of homozygous females can be treated the same as that of hemizygous males. Therefore, the genotypic scores are given to be 0, 1 and 2 corresponding to three genotypes at a diallelic locus on X chromosome in females, and 0 and 2 corresponding to two genotypes in males. However, Wang et al. [17] pointed out that such coding strategy only considers the situation of random XCI, but ignores the skewness of XCI and escape from XCI. To account for all possible situations of XCI, Wang et al. suggested that the genotypic score for the heterozygous females, denoted by γ, can be any possible values between 0 and 2. Under XCI, the value of γ reflects the degree of inactivation skewing, with γ/2 representing the proportion of cells having the mutant allele active. As such, γ = 1 stands for random XCI, while γ between 1 and 2 indicates the XCI skewing toward the mutant allele and γ between 0 and 1 denotes the XCI skewing against the mutant allele. For example, γ = 0.5 means that the skewness of XCI is against the mutant allele with 25% cells expressing the mutant allele and the other 75% cells expressing the normal allele. On the other hand, in molecular genetics, the XCI skewing pattern can be identified by assays taking advantage of differential methylation between the active and inactive X chromosomes or mRNA transcription in cells [18,19,20]. However, since the XCI pattern always varies among cell lines [21, 22], these assays, which usually use cells from only a few tissues to investigate the XCI patterns, cannot present the status of the whole body [10]. Further, there is no statistical measure available for detecting the XCI skewing pattern in population genetics as yet.
Therefore, in this article, we give the expression of γ for family trios with both parents and one affected daughter in the presence of association between the disease and genotypes. In fact, γ is a function of two genotypic relative risks (GRRs) in females. In addition, we obtain the point estimate of γ from the maximum likelihood estimates (MLEs) of the GRRs. When parental genotypes are missing in some family trios, the expectationconditionalmaximization (ECM) algorithm [23] is used to obtain the corresponding MLEs. Further, the confidence interval (CI) of γ is derived based on a likelihood ratio test (LRT). Finally, simulation study is conducted to investigate the performance of our proposed method. The simulation results show that the proposed likelihoodbased CI has an accurate coverage probability under the situations considered. For practical use, we apply our proposed method to the rheumatoid arthritis (RA) data from USA.
Methods
Notations
Consider an Xlinked diallelic locus with mutant allele A and normal allele a. Let p_{m} and q_{m} = 1 − p_{m} denote the allele frequencies of A and a in males, respectively. Suppose that p_{f} is the allele frequency of A and ρ is the inbreeding coefficient in females. Then, the frequencies of genotypes aa, Aa and AA in females are respectively g_{0} = (1 − p_{f})^{2} + ρp_{f}(1 − p_{f}), g_{1} = 2(1 − ρ)p_{f}(1 − p_{f}) and \( {g}_2={p}_f^2+\rho {p}_f\left(1{p}_f\right) \). Note that HardyWeinberg equilibrium holds in the population under study when ρ = 0 and p_{m} = p_{f}. Let f_{0}, f_{1} and f_{2} respectively represent the penetrances in females with genotypes aa, Aa and AA. The GRRs in females are defined as λ_{1} = f_{1}/f_{0} and λ_{2} = f_{2}/f_{0}.
Relationship between penetrances and XCI skewing in females
Let the genotypic scores be 0, γ and 2 corresponding to females with genotypes aa, Aa and AA, respectively, where γ ∈ [0, 2] represents the XCI skewing pattern. We assume that a generalized genetic model holds [24, 25], which is defined as f_{0} ≤ f_{1} ≤ f_{2} (i.e., 1 ≤ λ_{1} ≤ λ_{2}) with at least one inequality being strict, in the presence of association between the disease and genotypes in females. If f_{1} is unknown, then it can be expressed as a function of γ, denoted by f_{1}(γ). To derive the expression of f_{1}(γ), let \( {f}_1^{\prime}\left(\gamma \right) \) be the first order derivative of f_{1}(γ) with respect to γ. As such, f_{1}(γ) can be approximated by a first order Taylor expansion around γ = 1 as follows,
On the other hand, when the XCI skewing is completely against A, we have γ=0 and f_{1} = f_{0}. So, from Eq. (1), \( {f}_0={f}_1(0)\approx {f}_1(1){f}_1^{\prime }(1) \). Similarly, when the XCI skewing is completely toward A, we have γ=2 and f_{1} = f_{2}. Then \( {f}_2={f}_1(2)\approx {f}_1(1)+{f}_1^{\prime }(1) \). Hence, f_{1}(1) = (f_{2} + f_{0})/2 and \( {f}_1^{\prime }(1)=\left({f}_2{f}_0\right)/2 \). Therefore, Eq. (1) turns to be
From Eq. (2), we notice that f_{1} is around the midpoint between f_{0} and f_{2} under random XCI (i.e., γ=1). Actually, Eq. (2) means that the penetrance for heterozygous females is approximately linear in the genotypic score γ around γ=1.
Further, if f_{1} is known, then we can obtain γ from Eq. (2) as follows, which is a function of f_{0}, f_{1} and f_{2}, or λ_{1} and λ_{2},
Note that the value of γ attains its maximum (γ = 2) when λ_{1} = λ_{2} ≠ 1, and γ = 0 when λ_{1} = 1 and λ_{2} ≠ 1. Assume that \( {\widehat{\lambda}}_1 \) and \( {\widehat{\lambda}}_2 \) are the MLEs of λ_{1} and λ_{2}, respectively. Then, the point estimate of γ, \( \widehat{\gamma} \), can be obtained by \( 2\left({\widehat{\lambda}}_11\right)/\left({\widehat{\lambda}}_21\right) \).
MLEs of λ _{1} and λ _{2} using family trios without missing genotypes
Here we only include the trios with affected daughters in the analysis. Male offspring are not investigated because they are not informative of λ_{1} and λ_{2}. Firstly, we consider complete family trios, each with both typed parents and an affected typed daughter. Let F, M and C represent the numbers of allele A in father, mother and daughter, respectively, and D denote that the daughter is affected. Eight possible types of FMC (i.e., 000, 010, 011, 021, 101, 111, 112 and 122) together with the corresponding probabilities P(FM), P(C FM) and P(FMC D) are shown in Table 1. Let Ω be the set of the eight possible types of FMC listed in Table 1 and then P(FMC D) is derived as
Equation (3) holds when the disease status of a daughter is only related to her own genotype, and P(C FM) is determined by Mendelian transmission, which is equal to 0.5 for heterozygous mother and 1 otherwise. Assume that P(FM) = P(F)P(M), and divide the numerator and denominator of Eq. (3) by f_{0}. Then, P(FMC D) for each trio type can be written as the last column in Table 1, where R = q_{m}g_{0} + 0.5q_{m}g_{1}(1 + λ_{1}) + q_{m}g_{2}λ_{1} + p_{m}g_{0}λ_{1} + 0.5p_{m}g_{1}(λ_{1} + λ_{2}) + p_{m}g_{2}λ_{2}. The detailed derivation of P(FMC D) in Table 1 is given in Additional file 1: Appendix A.
Since we find that it is more convenient to directly estimate g_{0} and g_{1} rather than ρ and p_{f}, we let θ= (p_{m}, g_{0}, g_{1}, λ_{1}, λ_{2})^{T} be the parameter vector of interest. As such, the loglikelihood function of the observed data conditional on the daughter being affected is given by
where n_{FMC} is the number of the family trios of type FMC. To obtain the MLE of θ, numerical methods, such as NewtonRaphson algorithm (by using “maxLik” package in R software [26]) and the ECM algorithm introduced later, are applied. We choose the initial values of p_{m}, g_{0}, g_{1}, λ_{1} and λ_{2} as follows: \( {\widehat{p}}_m^{(0)}=\#\left(F=1\right)/\#\left(F\in \left\{0,1\right\}\right), \) \( {\widehat{g}}_0^{(0)}=\#\left(M=0\right)/\#\left(M\in \left\{0,1,2\right\}\right) \), \( {\widehat{g}}_1^{(0)}=\#\left(M=1\right)/\#\left(M\in \left\{0,1,2\right\}\right), \) \( {\widehat{\lambda}}_1^{(0)}={n}_{011}/{n}_{010} \) and \( {\widehat{\lambda}}_2^{(0)}=\left({n}_{011}{n}_{112}\right)/\left({n}_{010}{n}_{111}\right) \), where # denotes the counting measure. The details about the choice of these initial values are shown in Additional file 1: Appendix B.
MLEs of λ _{1} and λ _{2} using family trios with parental genotypes missing
It is common that parental genotypes are missing in some family trios. For trios with paternal or maternal genotype missing, we call them “motherdaughter pairs” or “fatherdaughter pairs”, denoted by MC and FC, respectively. Thus, MC takes possible genotypes from Ω_{MC}={00, 01, 10, 11, 12, 21, 22}, and FC takes possible genotypes from Ω_{FC}={00, 01, 11, 12}. As for trios with both parental genotypes missing, we refer to them as “single daughters”. The probabilities of the motherdaughter pair MC, fatherdaughter pair FC and single daughter C given the daughter being affected are respectively\( P\left( MCD\right)={\sum}_{F\in \left\{0,1\right\}}P\left( FMCD\right) \), \( P\left( FCD\right)={\sum}_{M\in \left\{0,1,2\right\}}P\left( FMCD\right) \) and \( P\left(CD\right)={\sum}_{F\in \left\{0,1\right\}}{\sum}_{M\in \left\{0,1,2\right\}}P\left( FMCD\right) \), where P(FMC D) are given in Table 1.
Suppose that we collect n_{FMC} family trios of type FMC, n_{1m, MC} motherdaughter pairs of type MC, n_{1f, FC} fatherdaughter pairs of type FC and n_{0, C} single daughters of type C, where the subscripts 1 m, 1f and 0 respectively mean that each trio has only a mother, only a father and no parents. Then, the loglikelihood function of the observed data is
Let N_{2}, N_{1m}, N_{1f} and N_{0} respectively be the numbers of family trios, motherdaughter pairs, fatherdaughter pairs and single daughters. Then, \( {N}_2={\sum}_{FMC\in \Omega}{n}_{FMC} \), \( {N}_{1m}={\sum}_{MC\in {\Omega}_{MC}}{n}_{1m, MC},{N}_{1f}={\sum}_{FC\in {\Omega}_{FC}}{n}_{1f, FC} \), \( {N}_0={\sum}_{C\in \left\{0,1,2\right\}}{n}_{0,C} \) and the total sample size N = N_{2} + N_{1m} + N_{1f} + N_{0}.
Since it is not so easy to obtain the MLE of θ from the above observed loglikelihood function (4), the ECM algorithm will be employed. Assume that \( {n}_{1m, MC}={\sum}_{F\in \left\{0,1\right\}}{z}_{1m, FMC} \), \( {n}_{1f, FC}={\sum}_{M\in \left\{0,1,2\right\}}{z}_{1f, FMC} \) and n_{0, C}= \( {\sum}_{F\in \left\{0,1\right\}}{\sum}_{M\in \left\{0,1,2\right\}}{z}_{0, FMC} \), where z_{1m, FMC}, z_{1f, FMC} and z_{0, FMC} are the unobserved numbers of trios FMC for motherdaughter pairs MC, fatherdaughter pairs FC and single daughters C, respectively (see Additional file 1: Tables S1S3). Then, the loglikelihood function for the complete data (n_{FMC}, z_{1m, FMC}, z_{1f, FMC}, z_{0, FMC}) can be written as
The following ECM algorithm contains one Estep and five CMsteps at each iteration. In the Estep at iteration (k + 1), we obtain the conditional expectation of lnL_{C}(θ) with respect to the conditional distributions of z_{1m, FMC}, z_{1f, FMC} and z_{0, FMC} given n_{1m, MC}, n_{1f, FC} and n_{0, C}, respectively. z_{1m, FMC} ∣ n_{1m, MC}, z_{1f, FMC} ∣ n_{1f, FC} and z_{0, FMC} ∣ n_{0, C} follow the binomial distributions with respective success probabilities P(F MC, D), P(M FC, D) and P(FM C, D). Thus, the Q function is given by
\( \mathrm{where}\kern0.5em {\hat{\theta}}^{(k)}={\left({\hat{p}}_m^{(k)},{\hat{g}}_0^{(k)},{\hat{g}}_1^{(k)},{\hat{\lambda}}_1^{(k)},{\hat{\lambda}}_2^{(k)}\right)}^T \) is the MLE of θ at iteration k,
and
In the CMsteps, the Q function is maximized with respect to each of components of θ in turn, with the others fixed at their previous values. The MLE of θ at iteration (k + 1) are given in Additional file 1: Appendix B. The initial value of θ is obtained only based on N_{2} complete family trios when N_{2} ≠ 0 (see Additional file 1: Appendix B). However, when N_{2} = 0, we estimate the initial values of λ_{1} and λ_{2} by replacing unknown n_{010}, n_{011}, n_{111} and n_{112} values in \( {\widehat{\lambda}}_1^{(0)}={n}_{011}/{n}_{010} \) and \( {\widehat{\lambda}}_2^{(0)}=\left({n}_{011}{n}_{112}\right)/\left({n}_{010}{n}_{111}\right) \) by their respective conditional expectations (see Additional file 1: Tables S1S3). For example, n_{011} is replaced by
where \( {\widehat{p}}_m^{(0)} \), \( {\widehat{q}}_m^{(0)}=1{\widehat{p}}_m^{(0)} \), \( {\widehat{g}}_0^{(0)} \), \( {\widehat{g}}_1^{(0)} \) and \( {\widehat{g}}_2^{(0)}=1{\widehat{g}}_0^{(0)}{\widehat{g}}_1^{(0)} \)are the initial values of p_{m}, q_{m}, g_{0}, g_{1} and g_{2}, respectively. The details about the choice of these initial values are shown in Additional file 1: Appendix B. Given the initial value of θ, the steps mentioned above continue until the convergence criterion is satisfied. For example, the absolute differences between the estimates of the parameters at two consecutive iterations are all less than 10^{−7}. In addition, note that the ECM algorithm still works when there are no missing genotypes in all the family trios. However, it contains only the CM steps in this situation and can be regarded as a special case of the cyclic coordinate ascent method, which is simple and stable [23].
Confidence interval of γ based on likelihood method
To obtain the CI, we first construct a LRT for testing the null hypothesis H_{0}: γ = γ_{0} as follows,
where \( \widehat{\theta}= \) (\( {\widehat{p}}_m \), \( {\widehat{g}}_0 \), \( {\widehat{g}}_1 \), \( {\widehat{\lambda}}_1 \), \( {\widehat{\lambda}}_2 \))^{T} is the MLE of θ under H_{1}. Let θ_{0}= (p_{m}, g_{0}, g_{1}, λ_{2})^{T} be the parameter vector under H_{0} with γ_{0} = 2(λ_{1} − 1)/(λ_{2} − 1) (i.e., λ_{1} = γ_{0}(λ_{2} − 1)/2 + 1), and then \( {\overset{\sim }{\theta}}_0= \) (\( {\overset{\sim }{p}}_m \), \( {\overset{\sim }{g}}_0 \), \( {\overset{\sim }{g}}_1 \), \( {\overset{\sim }{\lambda}}_2 \))^{T} denotes the MLE of θ_{0}. The choice of the initial value of θ_{0} and the solution of \( {\overset{\sim }{\theta}}_0 \) using family trios with missing parental genotypes is given in Additional file 1: Appendix B. The LRT asymptotically follows a chisquare distribution with the degree of freedom being one (i.e., \( {\chi}_1^2 \)).
At the significance level α, the 100(1 − α)% confidence interval of γ based on the LRT is
and the confidence limits are the values that satisfy
Note that there is no closedform solution of Eq. (6). Thus, numerical method is applied, such as functions from “rootSolve” package in R software [26]. Let γ_{L} and γ_{U} be two unequal roots of Eq. (6) with γ_{L} < γ_{U}. Generally, the 100(1 − α)% CI of γ would be (γ_{L}, γ_{U}). However, since the true value of γ is bounded in [0, 2], the original estimated CI of γ will be truncated by [0, 2] if necessary. As such, the ultimate CI of γ is (γ_{L}, γ_{U})∩[0, 2], which is easier to be interpreted than the origin CI.
Discontinuity problem of confidence interval of γ
Note that γ is a ratio, so like other ratio estimates [27], we find that the proposed CI may consist of two disjoint intervals, such as [0, 0.03)∪(0.59, 2]. In this article, this kind of CIs is referred to as “discontinuous CI” for convenience. Let’s take a close look at this discontinuity problem by the following example. Consider the situation of (n_{000}, n_{010}, n_{011}, n_{021}, n_{101}, n_{111}, n_{112}, n_{122})=(191, 89, 112, 54, 114, 59, 62, 19). Then, \( \widehat{\gamma} \) is 1.92 and two roots of Eq. (6) are 0.03 and 0.59, respectively. If the CI is set to be (0.03, 0.59) normally, to our surprise, \( \widehat{\gamma} \) is located outside this CI. When testing the null hypothesis H_{0}: γ = γ_{0}, we find that γ_{0} taking values between 0.03 and 0.59 is rejected by the LRT. This means that the interval (0.03, 0.59) is actually a rejection region of the corresponding LRT rather than an acceptance region. Hence, the CI of γ turns to be (−∞, 0.03)∪(0.59,+∞), and [0, 0.03)∪(0.59, 2] after being truncated. The discontinuous CI may occur when the denominator of the ratio is close to zero (i.e., λ_{2} is close to 1 in this article) [28]. In fact, when λ_{2} = 1, we assume that we cannot obtain information on the XCI skewing pattern according to the CI of γ. This is because our proposed method measures XCI skewness in the presence of association between the disease and genotypes (i.e., λ_{2} ≠ 1). On the other hand, although these discontinuous CIs are considered to be uninformative and are difficult to be interpreted, there is no satisfactory “objective” method for dealing with this problem well [27].
Simulation settings
To assess the performance of the proposed method, we conduct the following simulation study. The sample size N is taken to be 700, which is close to that of RA data (757 pedigrees) [29]. We consider six different combinations of (N_{2}, N_{1m}, N_{1f}, N_{0}), which are referred to as six “missing patterns” (MP1–MP6) for convenience and are shown in Table 2. When the missing pattern changes from MP1 to MP4, the number of caseparents trios N_{2} decreases and the number of single daughters N_{0} increases with N_{1m} = N_{1f}. For MP5 and MP6, the number of motherdaughter pairs N_{1m} is different from that of fatherdaughter pairs N_{1f} with N_{2} = N_{0}. In addition, (p_{m}, p_{f}) is set to be (0.30, 0.30), (0.25, 0.30), (0.30, 0.25), (0.20, 0.20), (0.15, 0.20) and (0.20, 0.15), and we assume that ρ=0 and 0.05, and λ_{2}=1.5 and 2. λ_{1} is calculated from λ_{1} = γ(λ_{2} − 1)/2 + 1, where γ varies from 0 to 2 in increments of 0.5. Given p_{m}, p_{f}, ρ, λ_{2} and γ, N_{2} caseparents trios are randomly generated from a multinomial distribution with probabilities P(FMC D) shown in Table 1. Similarly, N_{1m} motherdaughter pairs, N_{1f} fatherdaughter pairs and N_{0} single daughters are randomly drawn from the multinomial distributions with probabilities \( P\left( MCD\right)={\sum}_{F\in \left\{0,1\right\}}P\left( FMCD\right) \), \( P\left( FCD\right)={\sum}_{M\in \left\{0,1,2\right\}}P\left( FMCD\right) \) and \( P\left(CD\right)={\sum}_{F\in \left\{0,1\right\}}{\sum}_{M\in \left\{0,1,2\right\}}P\left( FMCD\right) \), respectively. The simulations are based on k=10,000 replicates and 5% significance level.
We assess the statistical properties of the CI by the following indexes. Let the coverage probability (CP) be the proportion that the CI contains the true value of γ among k replicates. Note that under H_{0}: γ = γ_{0}, the estimated type І error rate of the LRT is 1−CP. ML and MR denote the left tail error and the right tail error (missing the true value of γ), respectively, with ML \( =\#\left[\left(\gamma <{\gamma}_L\right)\cap \left({\gamma}_L\le \widehat{\gamma}\le {\gamma}_U\right)\right]/k \) and MR \( =\#\left[\left(\gamma >{\gamma}_U\right)\cap \left({\gamma}_L\le \widehat{\gamma}\le {\gamma}_U\right)\right]/k \). Further, we use ML/(ML + MR) to measure the balance of ML and MR, which will be close to 0.5 when the balance is achieved. Notice that we do not consider those discontinuous CIs when calculating ML and MR, since we cannot distinguish between the left side and the right side of the discontinuous CIs. As such, we use DP\( =1\#\left({\gamma}_L\le \widehat{\gamma}\le {\gamma}_U\right)/k \) to represent the proportion of the discontinuous CIs among k replicates. In addition, note that the distribution of a ratio is not necessarily symmetric [30, 31], and the median can be always used to estimate the central tendency of a skewed distribution better than the mean [32]. So, we give the median of the point estimates of γ over k replicates under each simulation scenario. Further, for simulating the power of the LRT, we fix γ_{0} at 0, 1 and 2. Finally, we also compare the simulation results under MP1 (consisting of only 700 family trios with both parents) based on the ECM algorithm with those based on the NewtonRaphson algorithm. It is found that the results of the two algorithms are almost consistent with each other (data not shown for brevity). Therefore, we only give the simulation results on the basis of the ECM algorithm in the following section.
Results
Simulation results
Table 3 lists the estimated CP, ML/(ML + MR) and DP of the likelihoodbased CI of γ against MP and γ with ρ=0, λ_{2}=1.5, and (p_{m}, p_{f}) being (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25). It is shown in the table that the CP is around 95% under the situations considered. On the other hand, we find that ML/(ML + MR) and DP appear not to be greatly affected by (p_{m}, p_{f}). However, the value of γ has strong effect on ML/(ML + MR) and DP. When γ takes values on the boundary (i.e., 0 and 2), ML/(ML + MR) always stays close to 1 and 0, respectively, which indicates extreme imbalance of two tail errors. DP increases as γ gets close to the boundary. Also, the missing pattern has great influence on both ML/(ML + MR) and DP. When the missing pattern varies from MP1 to MP4, where the number of caseparents trios decreases and that of single daughters increases, ML/(ML + MR) becomes more and more far away from 0.5 and DP sharply increases. We also find that under MP5, where the number of motherdaughter pairs is larger than that of fatherdaughter pairs, ML/(ML + MR) is a little closer to 0.5 and DP becomes smaller compared to those under MP6.
Table 4 shows the corresponding statistical properties of the CI of γ with ρ=0, λ_{2}=2, and (p_{m}, p_{f}) being (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25). As expected, the CP is still controlled well, the ML/(ML + MR) is closer to 0.5 and DP is lower with larger λ_{2}. We also investigate the effect of ρ=0.05, and the corresponding results are similar to those of ρ=0 (see Additional file 1: Tables S4 and S5). On the other hand, when (p_{m}, p_{f}) is set to be (0.20, 0.20), (0.15, 0.20) and (0.20, 0.15), the results are similar to those when (p_{m}, p_{f}) being taken as (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25) (see Additional file 1: Tables S6–S9). In addition, the median of the point estimates of γ among k replicates under each simulation scenario is shown in Additional file 1: Figures S1 and S2. From Additional file 1: Figure S1, we can see that the median of \( \widehat{\gamma} \) gets more far away from the true value of γ as the missing pattern varies from MP1 to MP4, and it is always slightly closer to γ under MP5 than that under MP6. The increase of the value of λ_{2} also improves the accuracy of the median of \( \widehat{\gamma} \), while the values of ρ, p_{m} and p_{f} seem to have no great influence on the median of \( \widehat{\gamma} \).
The simulated powers of the LRT for testing H_{0} : γ = γ_{0} with (p_{m}, p_{f}) = (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25) are given in Figs. 1, 2, 3, 4. Fig. 1 shows the simulated powers of the LRT against γ under MP1–MP4 with ρ= 0 and λ_{2}=1.5. From the first row to the third row of the panels in Fig. 1, (p_{m}, p_{f}) is set to be (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25), respectively. From the first column to the third column, γ_{0} is fixed at 0, 1 and 2, respectively. It is found that the power increases as the value of ∣γ − γ_{0}∣ gets larger. For example, in Fig. 1(a), when testing for H_{0}: γ = γ_{0} with γ_{0}= 0 (XCI skewing completely against mutant allele), the power under γ=1.5 (75% cells express mutant allele) is greater than that under γ= 0.5 (25% cells express mutant allele). On the other hand, when the missing pattern changes from MP1 to MP4, the loss in power is always substantial. Also, we compare the corresponding powers under MP5 and MP6 in Fig. 2. The power under MP5 is always higher than that under MP6 when γ ≠ γ_{0}, which implies that the motherdaughter pairs contain more information on the skewness of XCI than the fatherdaughter pairs. This is not surprising because when the father’s genotype is missing in a trio, it can be inferred according to the available mother’s and daughter’s genotypes, except for the motherdaughter pair of type MC = 11, whereas we cannot infer the missing mother’s genotypes from any fatherdaughter pairs. In addition, Figs. 3 and 4 give the simulated powers of the LRT under ρ=0 and λ_{2}=2 for MP1–MP4 and MP5–MP6, respectively. It is shown that the increase of λ_{2} leads to the growth in power (Fig. 3 vs. Fig. 1, Fig. 4 vs. Fig. 2). We also simulate the powers under ρ=0.05 and (p_{m}, p_{f}) = (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25), which are similar to those under ρ=0 (see Additional file 1: Figures S3–S6). Finally, the simulated powers under (p_{m}, p_{f}) = (0.20, 0.20), (0.15, 0.20) and (0.20, 0.15) are given in Additional file 1: Figures S7–S14, which are always lower than those under (p_{m}, p_{f}) = (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25), respectively.
Application to RA data
Rheumatoid arthritis (RA) is an autoimmune disease, which has been reported to be associated with the skewness of XCI [33]. To investigate the XCI skewing patterns at the Xlinked loci associated with RA, we apply our proposed method to the data from North American Rheumatoid Arthritis Consortium [29], which is made available from Genetic Analysis Workshop 15 [34]. The dataset includes 757 pedigrees and 293 single nucleotide polymorphism (SNP) markers on X chromosome. In this application, one nuclear family with a typed affected daughter is selected randomly from each pedigree. As such, a total of 703 nuclear families are included, which contains 64 caseparents trios, 179 motherdaughter pairs, 37 fatherdaughter pairs and 423 single daughters.
Since our proposed method is applicable in the presence of association, we ultimately measure the degree of XCI skewing at five SNPs which have been found to be associated with RA by the XMCPDT method at the significance level of 1% [35]. Notice that the XMCPDT method is conducted based on 246 pedigrees from the RA dataset. We identify the atrisk allele by the value of \( {\widehat{\lambda}}_2 \), and denote the estimates of the frequencies of the atrisk allele in males and females obtained from the ECM algorithm by \( {\widehat{p}}_m \) and \( {\widehat{p}}_f \), respectively. Table 5 lists the pvalue of XMCPDT, the values of (\( {\widehat{p}}_m \), \( {\widehat{p}}_f \)), \( {\widehat{\lambda}}_2 \) and \( \widehat{\gamma} \), and 95% CI of γ based on the proposed likelihood method for each of five SNPs. From Table 5, we find that there are three SNPs (rs916685, rs1043034 and rs2005463) with the 95% CIs containing the value of γ=1, which indicates the random XCI. On the other hand, the XCI skewing at rs2238907 is found with \( \widehat{\gamma}= \)0.35 and the 95% CI being [0, 0.79), which suggests that the skewness of XCI is against the atrisk allele with 17.5% (0.35/2) cells in heterozygous females having the atrisk allele active, while the other 82.5% cells keeping the normal allele active. However, the 95% CI of γ at rs1264064 is [0, 2], providing no information on the XCI skewing pattern. In addition, we evaluate \( \widehat{\gamma} \)’s and the 95% CIs of γ at the rest 288 SNPs in the RA dataset, and find that there are 21 SNPs with nonrandom XCI pattern. But note that, if we assume that all of 293 SNPs except for rs2238907 are under H_{0}: γ = 1, then the corresponding false positive rate would be 0.0719 (21/292), which is still below the upper bound \( 0.05+1.96\kern0.5em \times \kern0.5em \sqrt{0.05\kern0.5em \times \kern0.5em \left(10.05\right)/292}=\mathrm{0.0750.} \) Besides, association between these 21 SNPs and RA has not been found by XMCPDT at the 1% significance level, so we should draw conclusions with this caution.
Discussion
In this article, we propose a statistical measure γ of the degree of the XCI skewing for family trio data, which can be represented as a ratio of two GRRs in females in the presence of association between the disease and genotypes. Further, we obtain the point estimate of γ, which is constructed by the MLEs of two GRRs in females. When there are missing parental genotypes in some family trios, the ECM algorithm is used to estimate the two GRRs. The CI of γ is derived from the likelihood method by inverting the LRT. We conduct the simulation study under various simulation settings, including six missing patterns of families, six groups of allele frequencies, two different values of inbreeding coefficient in females, two different values of λ_{2} and five different values of γ. The simulation results show that the proposed likelihoodbased CI of γ has an accurate CP under the situations considered, while ML/(ML + MR) and DP of the CI of γ and the median of estimates of γ are influenced by the values of γ, λ_{2} and the missing pattern. Similarly, the simulated power of LRT is affected by the values of ∣γ − γ_{0}∣, λ_{2}, (p_{m}, p_{f}) and the missing pattern. Finally, we apply our proposed method to the RA data from USA and find out a locus, rs2238907, which may undergo the XCI skewing against the atrisk allele.
Many Xlinked diseases are always associated with XCI skewing in females. Our proposed statistical measure γ provides information on the potential loci subject to XCI skewing, thus it is helpful to uncover the pathogenesis of Xlinked diseases. However, most of the statistical studies on X chromosome today focus mainly on the association tests [17, 24, 25, 36,37,38], so there are no other statistical methods available to measure the skewness of XCI. On the other hand, although the XCI skewing pattern can also be detected by differential methylation between the active and inactive X chromosomes or mRNA transcription in cells, our proposed statistical method takes use of population data to measure the skewness of XCI. Thus, it can reflect the average level of the XCI skewing in female population.
There are some issues in our proposed method. First of all, the original CI is truncated by [0, 2] to enhance the interpretability of the CI. However, when the whole original CI lies outside [0, 2], the CI ultimately obtained after truncation would be empty. Although it is hard to interpret this kind of CI containing no values, the simulation results show that when γ takes values on the boundary (i.e., 0 and 2), these empty CIs seldom occur. For example, when γ = 0, two tail errors (ML and MR) are extremely imbalance with ML/(ML + MR) being 1 or close to 1. This means that there are no or very few original CIs whose upper limit is below 0. Likewise, ML/(ML + MR) is 0 or close to 0 when γ = 2, which implies that there are no or very few original CIs whose lower limit is beyond 2. On the other hand, the proposed likelihood method has its own drawback in deriving the CI of a ratio like any other ratio estimation methods. We find that the likelihoodbased CI of γ may consist of two disjoint intervals, such as [0, 0.03) ∪ (0.59, 2], and it is also difficult for us to interpret. For example, if \( \widehat{\gamma}=1.92 \) and the CI of γ is [0, 0.03) ∪ (0.59, 2], then the corresponding LRT would reject the null hypothesis of γ_{0} = 0.5, and accept that of γ_{0} = 0.01. It is hard to explain that the LRT rejects a γ_{0} being close to \( \widehat{\gamma} \), while accepts one being far away from \( \widehat{\gamma} \). Although this kind of CI is undesirable, it is also inevitable and can be regarded as a hint of λ_{2} being close to 1 [39]. In addition, it should be noted that the ECM algorithm is not applicable when all the family trios are “single daughters”, since the MLE of θ may not be uniquely specified under this situation (the details see Additional file 1: Appendix C). However, if the other family trios were collected, then the single daughters can make contribution to the MLE of θ in the ECM algorithm together with these trio data (the details see Additional file 1: Appendix D). Finally, we assume that the genotypes’ frequencies in males and females (p_{m}, g_{0} and g_{1}) are unknown and estimate them together with λ_{1} and λ_{2} in our simulation study and real data application. Alternatively, if we can obtain information on the allele frequencies from the online databases, such as the Allele Frequency Net Database [40] and the UCSC Genome Browser Database [41], then it is unnecessary to reestimate p_{m}, g_{0} and g_{1}, which will reduce the number of parameters so that the ECM algorithm runs faster.
Note that the ECM algorithm can converge to a local maximum of the loglikelihood function instead of a global maximum [23]. To investigate this, we randomly choose 1000 initial values of θ (θ_{0}) from the parameter space and regard the MLE of θ (θ_{0}) with the maximum loglikelihood among 1000 \( \ln L\left(\widehat{\theta}\right) \)'s (\( \ln L\left({\overset{\sim }{\theta}}_0\right) \)'s) as the global MLE of θ (θ_{0}). We conduct a simulation study under the simulation settings with ρ=0, λ_{2}= 1.5 and (p_{m}, p_{f}) = (0.30, 0.30), and the details see Additional file 1: Appendix E. The simulation results (see Additional file 1: Tables S10 and S11) show that the values of \( \widehat{\theta} \) and \( \ln L\left(\widehat{\theta}\right) \) (\( {\overset{\sim }{\theta}}_0 \) and \( \ln L\left({\overset{\sim }{\theta}}_0\right) \)) based on one initial value estimated by the method described in Additional file 1: Appendix B are very close to those based on 1000 initial values under all the simulated situations when N_{2} (the number of complete family trios) is not too small, such as MP1 and MP2, which may indicate that the ECM algorithm based on the estimated initial value converges towards the global maximum. As for MP3MP6, except that \( {\overset{\sim }{\theta}}_0 \) with (γ_{0}, γ) = (1, 2) under MP5 and MP6, \( {\overset{\sim }{\theta}}_0 \) with (γ_{0}, γ) = (1, 1) and (1, 2) under MP3, and \( {\overset{\sim }{\theta}}_0 \) with (γ_{0}, γ) = (1, 1), (1, 1.5) and (1, 2) under MP4 may converge to a local maximum, all the other \( \widehat{\theta} \) and \( {\tilde{\theta}}_0 \) results converge to the global maximum. Further, for these seven cases, we try and randomly select ten groups of initial values of θ_{0} from the parameter space and regard \( {\tilde{\theta}}_0 \) with the maximum loglikelihood among ten \( \ln L\left({\tilde{\theta}}_0\right) \)'s as the final MLE of θ_{0}. We find that \( {\tilde{\theta}}_0 \)’s based on ten and 1000 initial values are very close to each other under all the seven simulated situations (see Additional file 1: Table S11). So, if N_{2} is zero or too small, we recommend using multiple initial values (such as ten) for obtaining the global MLE of θ_{0}. On the other hand, family trios with both parents are always fortunately collected in the familybased studies in practice.
In future studies, we will extend our proposed method to incorporate covariates by using nuclear families with affected and unaffected offspring. Furthermore, to facilitate the interpretability of the CI of γ, we will utilize the prior information, such as the order of the GRRs in females and the information of the presence of association.
Conclusions
The proposed statistical measure for the skewness of XCI is applicable for complete family trio data or family trio data with some paternal genotypes missing. The likelihoodbased CI has an accurate CP under the situations considered. Therefore, our proposed statistical measure is generally recommended in practice for discovering the potential loci which undergo the XCI skewing.
Abbreviations
 CI:

Confidence interval
 CP:

Coverage probability
 DP:

Proportion of the discontinuous CIs
 ECM:

Expectationconditional maximization
 GRR:

Genotypic relative risk
 LRT:

Likelihood ratio test
 ML:

Left tail error
 MLE:

Maximum likelihood estimate
 MP:

Missing patterns
 MR:

Right tail error
 RA:

Rheumatoid arthritis
 SNP:

Single nucleotide polymorphism
 XCI:

X chromosome inactivation
 XMCPDT:

Monte Carlo pedigree disequilibrium test on X chromosome
References
 1.
Postma DS. Gender differences in asthma development and progression. Gend Med 2007;4 Suppl B: S133–46.
 2.
Invernizzi P, Pasini S, Selmi C, Gershwin ME, Podda M. Female predominance and X chromosome defects in autoimmune diseases. J Autoimmun. 2009;33:12–6.
 3.
Dorak MT, Karpuzoglu E. Gender differences in cancer susceptibility: an inadequately addressed issue. Front Genet. 2012;3:268.
 4.
Qureshi IA, Mehler MF. Genetic and epigenetic underpinnings of sex differences in the brain and in neurological and psychiatric disease susceptibility. Prog Brain Res. 2010;186:77–95.
 5.
Skuse DH. Imprinting, the Xchromosome, and the male brain: explaining sex differences in the liability to autism. Pediatr Res. 2000;47:9–16.
 6.
Chow JC, Yen Z, Ziesche SM, Brown CJ. Silencing of the mammalian X chromosome. Annu Rev Genomics Hum Genet. 2005;6:69–92.
 7.
Carrel L, Willard HF. Xinactivation profile reveals extensive variability in Xlinked gene expression in females. Nature. 2005;434:400–4.
 8.
Minks J, Robinson WP, Brown CJ. A skewed view of X chromosome inactivation. J Clin Invest. 2008;118:20–3.
 9.
Van den Veyver IB. Skewed X inactivation in Xlinked disorders. Semin Reprod Med. 2001;19:183–91.
 10.
Brown CJ. Skewed Xchromosome inactivation: cause or consequence? J Natl Cancer Inst. 1999;91:304–5.
 11.
Plenge RM, Stevenson RA, Lubs HA, Schwartz CE, Willard HF. Skewed Xchromosome is a common feature of Xlinked mental retardation disorders. Am J Hum Genet. 2002;71:168–73.
 12.
Prchal JT, Carroll AJ, Prchal JF, Crist WM, Skalka HW, Gealy WJ, Harley J, Malluh A. WiskottAldrich syndrome: cellular impairments and their implication for carrier detection. Blood. 1980;56:1048–54.
 13.
Conley ME, Lavoie A, Briggs C, Brown P, Guerra C, Puck JM. Nonrandom X chromosome inactivation in B cells from carriers of X chromosomelinked severe combined immunodeficiency. Proc Natl Acad Sci U S A. 1988;85:3090–4.
 14.
Salsano E, Tabano S, Sirchia SM, Colapietro P, Castellotti B, Gellera C, Rimoldi M, Pensato V, Mariotti C, Pareyson D, Miozzo M, Uziel G. Preferential expression of mutant ABCD1 allele is common in adrenoleukodystrophy female carriers but unrelated to clinical symptoms. Orphanet J Rare Dis. 2012;7:10.
 15.
Kristiansen M, Langerød A, Knudsen GP, Weber BL, BørresenDale AL, ørstavik KH. High frequency of skewed X inactivation in young breast cancer patients. J Med Genet. 2002;39:30–3.
 16.
Clayton D. Testing for association on the X chromosome. Biostatistics. 2008;9:593–600.
 17.
Wang J, Yu R, Shete S. Xchromosome genetic association test accounting for Xinactivation, skewed Xinactivation, and escape from Xinactivation. Genet Epidemiol. 2014;38:483–93.
 18.
Kubota T. A new assay for the analysis of Xchromosome inactivation in carriers with an Xlinked disease. Brain Dev. 2001;23(Suppl 1):S177–81.
 19.
Busque L, Zhu J, DeHart D, Griffith B, Willman C, Carroll R, Black PM, Gilliland DG. An expression based clonality assay at the human androgen receptor locus (HUMARA) on chromosome X. Nucleic Acids Res. 1994;22:697–8.
 20.
Szelinger S, Malenica I, Corneveaux JJ, Siniard AL, Kurdoglu AA, Ramsey KM, Schrauwen I, Trent JM, Narayanan V, Huentelman MJ, Craig DW. Characterization of X chromosome inactivation using integrated analysis of wholeexome and mRNA sequencing. PLoS One. 2014;9:e113036.
 21.
Carrel L, Willard HF. Heterogeneous gene expression from the inactive X chromosome: an Xlinked gene that escapes X inactivation in some human cell lines but is inactivated in others. Proc Natl Acad Sci U S A. 1999;96:7364–9.
 22.
Cotton AM, Lam L, Affleck JG, Wilson IM, Peñaherrera MS, McFadden DE, Kobor MS, Lam WL, Robinson WP, Brown CJ. Chromosomewide DNA methylation analysis predicts human tissuespecific X inactivation. Hum Genet. 2011;130:187–201.
 23.
Meng XL, Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika. 1993;80:267–78.
 24.
Chen Z, Ng HKT, Li J, Liu Q, Huang H. Detecting associated singlenucleotide polymorphisms on the X chromosome in case control genomewide association studies. Stat Methods Med Res. 2017;26:567–82.
 25.
Wang P, Xu SQ, Wang BQ, Fung WK, Zhou JY. A robust and powerful test for case–control genetic association study on X chromosome. Stat Methods Med Res. 2018. https://doi.org/10.1177/0962280218799532.
 26.
Team RC. R: a language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. 2013. http://www.rproject.org. 2018.
 27.
Liseo B. Bayesian and conditional frequentist analyses of the Fieller’s problem. A critical review Metron 2003;LXI:133–50.
 28.
Hwang JTG. Fieller’s problems and resampling techniques. Stat Sinica. 1995;5:161–71.
 29.
Amos CI, Chen WV, Remmers E, Siminovitch KA, Seldin MF, Criswell LA, Lee AT, John S, Shephard ND, Worthington J, Cornelis F, Plenge RM, Begovich AB, Dyer TD, Kastner DL, Gregersen PK. Data for genetic analysis workshop (GAW) 15 problem 2, genetic causes of rheumatoid arthritis and associated traits. BMC Proc. 2007;1(Suppl 1):S3.
 30.
Marsaglia G. Ratios of normal variables and ratios of sums of uniform variables. J Am Stat Assoc. 1965;60:193–204.
 31.
DiazFrancés E, Rubio FJ. On the existence of a normal approximation to the distribution of the ratio of two independent normal random variables. Stat Pap. 2013;54:309–23.
 32.
Wilcox RR, Keselman HJ. Modern robust data analysis methods: measures of central tendency. Psychol Methods. 2003;8:254–74.
 33.
Chabchoub G, Uz E, Maalej A, Mustafa CA, Rebai A, Mnif M, Bahloul Z, Farid NR, Ozcelik T, Ayadi H. Analysis of skewed Xchromosome inactivation in females with rheumatoid arthritis and autoimmune thyroid diseases. Arthritis Res Ther. 2009;11:R106.
 34.
Genetic Analysis Workshop. 1982. https://www.gaworkshop.org. Accessed 3 Oct 2018.
 35.
Zou QL, You XP, Li JL, Fung WK, Zhou JY. A powerful parentoforigin effects test for qualitative traits on X chromosome in general pedigrees. BMC Bioinformatics. 2018;19:8.
 36.
Hickey PF, Bahlo M. X chromosome association testing in genome wide association studies. Genet Epidemiol. 2011;35:664–70.
 37.
Ma L, Hoffman G, Keinan A. Xinactivation informs variancebased testing for Xlinked association of a quantitative trait. BMC Genomics. 2015;16:241.
 38.
Choi S, Lee S, Qiao D, Hardin M, Cho MH, Silverman EK, Park T, Won S. FARVATX: familybased rare variant association test for Xlinked genes. Genet Epidemiol. 2016;40:475–85.
 39.
Gleser LJ, Hwang JT. The nonexistence of 100(1α)% confidence sets of finite expected diameter in errorsinvariables and related models. Ann Stat. 1987;15:1351–62.
 40.
Allele Frequency Net Database. 2015. http://www.allelefrequencies.net/. Accessed 3 Oct 2018.
 41.
UCSC Genome browser database. 2000. http://genome.ucsc.edu/. Accessed 3 Oct 2018.
Acknowledgments
The authors thank the editor and three reviewers for helpful comments that greatly improve the presentation of the article.
Funding
This study was supported by the National Natural Science Foundation of China grants 81,773,544, 81,373,098 and 81,573,207 and the National and Guangzhou University Students’ Innovation and Enterprise Training Project of China grant 201,612,121,017. The authors thank the Genetic Analysis Workshops for providing the RA data, which were supported by the National Institutes of Health grant R01 GM031575. The RA data were gathered with the support of grants from the National Institutes of Health grants N01AR22263 and R01AR44422, and the National Arthritis Foundation.
Availability of data and materials
The dataset supporting the conclusions of this article is from North American Rheumatoid Arthritis Consortium, which is made available from Genetic Analysis Workshop 15 (http://www.gaworkshop.org/).
Author information
Affiliations
Contributions
SQX, YZ, PW, WL, XBW and JYZ all contributed to the study design, analytical preparation and the writing of the manuscript. SQX and YZ performed the simulation studies. PW, WL, XBW and JYZ analyzed the data and revised the manuscript. All authors have read and approved the final manuscript.
Corresponding authors
Correspondence to XianBo Wu or JiYuan Zhou.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
Additional file 1:
Appendix A. Derivation of P(FMC D) in Table 1. Appendix B. Choice of initial value of θ (θ_{0}) and MLE of θ (θ_{0}) using family trios with missing parental genotypes. Appendix C. Inapplicability of ECM algorithm when using only single daughters. Appendix D. Contribution of single daughters to estimate of θ in ECM algorithm. Appendix E. Effect of different initial values of θ (θ_{0}) on ECM algorithm. Tables S1–S3. The conditional probabilities and conditional expectations for seven types of possible motherdaughter pairs, four types of possible fatherdaughter pairs and three types of possible single daughters, respectively. Tables S4–S5. Statistical properties of likelihoodbased confidence interval of γ against missing pattern (MP) and γ with ρ=0.05, (p_{m}, p_{f}) = (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25), and λ_{2}=1.5 and 2, respectively. Tables S6–S9. Statistical properties of likelihoodbased confidence interval of γ against missing pattern (MP) and γ with (p_{m}, p_{f})= (0.20, 0.20), (0.15, 0.20) and (0.20, 0.15), ρ=0 and 0.05, and λ_{2}=1.5 and 2, respectively. Table S10. Averages of absolute differences of each element of \( \widehat{\theta} \) and \( \ln L\left(\widehat{\theta}\right) \) between ECM_{1} and ECM_{1000} with ρ=0, λ_{2}= 1.5 and (p_{m}, p_{f}) = (0.30, 0.30) under MP1MP6. Table S11. Averages of absolute differences of each element of \( {\tilde{\theta}}_0 \) and \( \ln L\left({\tilde{\theta}}_0\right) \) between ECM_{1}/ECM_{10} and ECM_{1000} with ρ=0, λ_{2}= 1.5 and (p_{m}, p_{f}) = (0.30, 0.30) under MP1MP6. Figures S1–S2. Medians of point estimates of γ against MP for different p_{m}, p_{f} and λ_{2} values with ρ=0 and 0.05, respectively. Figures S3–S6. Estimated powers of LRT against γ with ρ=0.05 and (p_{m}, p_{f}) being (0.30, 0.30), (0.25, 0.30) and (0.30, 0.25) under MP1–MP4 and MP5–MP6, and λ_{2}=1.5 and 2, respectively. Figures S7–S14. Estimated powers of LRT against γ with (p_{m}, p_{f}) being (0.20, 0.20), (0.15, 0.20) and (0.20, 0.15) under MP1–MP4 and MP5–MP6, ρ=0 and 0.05, and λ_{2}=1.5 and 2, respectively. (PDF 2328 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Received
Accepted
Published
DOI
Keywords
 X chromosome inactivation
 Skewing
 Family trio
 Ratio estimate