 Methodology article
 Open Access
Rare variant association analysis in case-parents studies by allowing for missing parental genotypes
BMC Genetics volume 19, Article number: 7 (2018)
Abstract
Background
The development of next-generation sequencing technologies has facilitated the identification of rare variants. Family-based designs are commonly used to control for population admixture and substructure, which are more pronounced for rare variants. Case-parents studies, a typical family-based design, are widely used in rare variant-disease association analysis. Current methods for case-parents studies require complete case-parents data; however, parental genotypes may be missing in case-parents trios, and removing these trios may lead to a loss of statistical power. The present study focuses on testing for rare variant-disease association in case-parents studies while allowing for missing parental genotypes.
Results
In this report, we extended the collapsing method for rare variant association analysis in case-parents studies to allow for missing parental genotypes, and investigated the performance of two methods: one based on the genotype differences between affected offspring and their corresponding "complements" in case-parent trios, and one based on the TDT framework. Using simulations, we showed that, compared with methods that use only complete case-parents data, the proposed strategy, which allows for missing parental genotypes or even the addition of unrelated affected individuals, can greatly improve statistical power while remaining unaffected by population stratification.
Conclusions
We conclude that adding case-parents data with missing parental genotypes to a complete case-parents data set can greatly improve the power of our strategy for detecting rare variant-disease association.
Background
The development of next-generation sequencing technologies has facilitated association studies of rare variants (minor allele frequency (MAF) < 1%). Family-based design, an important strategy in genetic studies of human complex diseases (especially for rare variants), has some advantages over population-based design [1, 2]. The most prominent advantage is that many family-based association methods can effectively control for population admixture and substructure, which are more pronounced for rare variants, and thus avoid the spurious associations they cause [3, 4]. Moreover, family-based designs can be used to study complex mechanisms, such as parent-of-origin effects and maternally mediated genetic effects, which are difficult to detect with unrelated individuals in population-based designs [5]. The case-parents study, a typical family-based design, is widely used in rare variant-disease association analysis. For example, the combined multivariate and collapsing (CMC) method [6], the weighted sum statistic (WSS) [7, 8], the variable threshold (VT) method [9], and the burden of rare variants (BRV) approach have all been extended to the transmission/disequilibrium test (TDT) [10] framework [11]. Another commonly used approach in case-parents studies treats the parental alleles not transmitted to the affected offspring as controls (also called pseudocontrols or complements) for the affected offspring [5, 12, 13]. For example, investigators can construct a difference vector by comparing the genotypes of affected offspring with those of their corresponding "complements" and apply the collapsing method [6, 7] to detect rare causal variants.
A problem with case-parents studies is that the genotypes of one or both parents are often unavailable in practice. For example, parents may have died, especially in studies of older patients with late-onset diseases such as Alzheimer disease and hypertension, or parents may decline to participate in clinical research. It is often difficult to recruit large samples for case-parents studies, especially for rare diseases, so sample sizes are generally small. Discarding families in which one or both parental genotypes are missing leads to a loss of statistical power. Statistical methods for case-parents studies that allow for missing parental genotypes have been widely developed for common variant-disease association analysis [14, 15]. However, few studies have addressed rare variant-disease association in case-parents studies when parental genotypes are missing. Because a trio missing both parental genotypes is equivalent to a case only (an unrelated affected individual), allowing for trios with one missing parental genotype or for cases only increases the sample size of a case-parents study and thus may enhance the statistical power of rare variant association analysis. It is therefore useful to develop statistical approaches for case-parents studies that allow for missing parental genotypes when testing rare variant-disease association.
In this report, we extend the collapsing method for rare variant association analysis in case-parents studies, using both the genotype differences between affected offspring and their corresponding "complements" in case-parents trios and the TDT framework. Our strategy allows one or both parental genotypes to be missing (the latter corresponding to cases only). We develop our strategy in homogeneous populations. Through simulation studies, we investigate the performance of the proposed method in a homogeneous population and in populations with population stratification under three scenarios: complete case-parents data mixed with trios missing one parental genotype, complete case-parents data mixed with trios missing both parental genotypes, and complete case-parents data mixed with trios missing one or both parental genotypes.
Methods
In this study, all datasets were publicly available and no research requiring ethics approval was conducted.
Notation
Consider a data set from a homogeneous population, Ω = {Ω_{0}, Ω_{I}, Ω_{II}}, that consists of three types of case-parents trios, in each of which the genotype of the affected offspring is known. Ω_{0}, Ω_{I}, and Ω_{II} denote the trios with 0, 1, and 2 missing parental genotypes, respectively. We consider three combinations of these types: Ω_{0+I} = {Ω_{0}, Ω_{I}}, Ω_{0+II} = {Ω_{0}, Ω_{II}}, and Ω_{0+I+II} = {Ω_{0}, Ω_{I}, Ω_{II}}. Ω_{0+I} is the sample consisting of complete case-parents trios, in which the genotype of every member is known (type Ω_{0}), and trios with one missing parental genotype (type Ω_{I}). Ω_{0+II} consists of type Ω_{0} and type Ω_{II}, in which the genotypes of both parents are missing. Ω_{0+I+II} includes trios of all three types. We assume that N case-parents trios are sampled, with N_{0}, N_{I}, and N_{II} trios of types Ω_{0}, Ω_{I}, and Ω_{II}, respectively (N = N_{0} + N_{I} + N_{II}). Let G_{O} be the minor allele count carried by the affected offspring, and let {G_{F}, G_{M}} be the minor allele counts carried by the two parents. The curly braces denote an unordered pair rather than an ordered pair: for example, {G_{F}, G_{M}} = {1, 2} means G_{F} = 1, G_{M} = 2 or G_{F} = 2, G_{M} = 1. A case-parents trio is written as the triplet ({G_{F}, G_{M}}, G_{O}).
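A minimal Python sketch of this notation follows; it is our illustration, not the authors' code. The class `Trio`, its fields, and the helper `count_types` are hypothetical names. `None` marks a missing parental genotype, the observed parents are kept as an unordered pair by sorting, and each trio is assigned to Ω_{0}, Ω_{I}, or Ω_{II} by counting its missing parents.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class Trio:
    """One case-parents trio at one variant (hypothetical container).

    Genotypes are minor allele counts (0, 1, 2); None marks a missing
    parental genotype. The affected offspring's genotype is always known.
    """
    g_o: int
    g_f: Optional[int] = None
    g_m: Optional[int] = None

    @property
    def omega_type(self) -> str:
        """'0', 'I' or 'II': the number of missing parental genotypes."""
        n_missing = int(self.g_f is None) + int(self.g_m is None)
        return ("0", "I", "II")[n_missing]

    def parent_pair(self) -> Tuple[int, ...]:
        """Observed parental genotypes as an unordered pair {G_F, G_M}."""
        return tuple(sorted(g for g in (self.g_f, self.g_m) if g is not None))


def count_types(trios: Iterable[Trio]) -> Counter:
    """Return N_0, N_I and N_II as a Counter keyed by '0', 'I', 'II'."""
    return Counter(t.omega_type for t in trios)
```

For example, `Trio(1, 2, 0)` and `Trio(1, 0, 2)` return the same `parent_pair()`, namely (0, 2), which mirrors the unordered-pair convention above.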
Rare variant association analysis
Let x = 2G_{O} − G_{F} − G_{M} be the paired difference in genotypes between the affected offspring and the complement (pseudocontrol). We consider k variants, q of which are causal, in a region of interest, e.g., a gene region. Variants are indexed by i (i = 1, ⋯, k) and case-parents trios by j (j = 1, 2, ⋯, N). For the jth trio at the ith variant, we redefine the paired difference \( \tilde{x}_{ij} \) as follows:
Under the assumption of random mating (and thus Hardy-Weinberg equilibrium) and the rules of Mendelian inheritance, \( E(x\mid\cdot) \) can be calculated when ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{I} or Ω_{II}. For example, when ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{II} and G_{O} = 2, each parent transmitted a minor allele, so each parent's untransmitted allele is a minor allele with probability MAF. Hence \( P(\{G_F, G_M\}=\{1,1\}\mid G_O=2)=(1-\mathrm{MAF})^2 \), \( P(\{G_F, G_M\}=\{1,2\}\mid G_O=2)=2\,\mathrm{MAF}(1-\mathrm{MAF}) \), and \( P(\{G_F, G_M\}=\{2,2\}\mid G_O=2)=\mathrm{MAF}^2 \). Since x = 2G_{O} − G_{F} − G_{M}, it follows that \( P(x=0\mid G_O=2)=\mathrm{MAF}^2 \), \( P(x=1\mid G_O=2)=2\,\mathrm{MAF}(1-\mathrm{MAF}) \), and \( P(x=2\mid G_O=2)=(1-\mathrm{MAF})^2 \). Thus
\[ E(x\mid G_O=2)=0\cdot\mathrm{MAF}^2+1\cdot 2\,\mathrm{MAF}(1-\mathrm{MAF})+2(1-\mathrm{MAF})^2=2(1-\mathrm{MAF}). \qquad (2) \]
We estimate MAF from the known parental genotypes or from the background population of the samples. The other conditional expectations \( E(x\mid\cdot) \) can be calculated similarly to Eq. (2).
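These conditional expectations can be obtained mechanically by enumerating which allele each parent transmitted. The Python sketch below is our own illustration, not the authors' code, and the function name `conditional_expectations` is hypothetical. It assumes random mating (so each parent's transmitted and untransmitted alleles are independent Bernoulli(MAF) draws), Mendelian transmission, and, as in the calculation above, no conditioning on the offspring's affection status. Besides E(x) it also returns E(b) and E(c) for the TDT counts defined in the TDT paragraph of this section; Additional files 1 and 2 remain the authors' reference values.

```python
from itertools import product
from typing import Optional, Tuple


def conditional_expectations(
    g_o: int, g_f: Optional[int], g_m: Optional[int], maf: float
) -> Tuple[float, float, float]:
    """Return (E[x], E[b], E[c]) for one trio at one variant.

    g_f or g_m set to None marks a missing parental genotype. For a complete
    trio the result equals the observed x, b and c.
    ASSUMPTIONS: random mating (Hardy-Weinberg equilibrium), Mendelian
    transmission, and no conditioning on the offspring's affection status.
    """
    total = ex = eb = ec = 0.0
    # Each parent has a transmitted allele t and an untransmitted allele u
    # (1 = minor). Under HWE the four alleles are i.i.d. Bernoulli(maf).
    for t_f, u_f, t_m, u_m in product((0, 1), repeat=4):
        gf, gm = t_f + u_f, t_m + u_m
        if t_f + t_m != g_o:
            continue  # incompatible with the offspring genotype
        if (g_f is not None and gf != g_f) or (g_m is not None and gm != g_m):
            continue  # incompatible with an observed parental genotype
        n_minor = gf + gm
        w = maf ** n_minor * (1.0 - maf) ** (4 - n_minor)
        x = 2 * g_o - gf - gm
        # b / c: minor / major alleles transmitted by heterozygous parents
        b = (t_f if gf == 1 else 0) + (t_m if gm == 1 else 0)
        c = (1 - t_f if gf == 1 else 0) + (1 - t_m if gm == 1 else 0)
        total += w
        ex += w * x
        eb += w * b
        ec += w * c
    if total == 0.0:
        raise ValueError("genotypes are not Mendelian-consistent")
    return ex / total, eb / total, ec / total
```

For instance, `conditional_expectations(2, None, None, maf)` returns E(x) = 2(1 − MAF), matching the Ω_{II} calculation above. More generally, when both parents are missing the enumeration gives E(x | G_O) = G_O − 2·MAF, because x equals G_O minus the two untransmitted minor-allele indicators.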
The collapsing method for rare variants extends directly to family-based studies when it is applied to the difference vectors from case-parents data. We denote the resulting statistic by Z_{C}, defined as
where \( U=\mathbf{1}^T\overline{X} \), \( \mathbf{1}=(1,\cdots,1)^T \) is a k-dimensional vector of ones, \( \overline{X}=\frac{1}{N}\left(\sum_{j=1}^{N}x_{1j},\sum_{j=1}^{N}x_{2j},\cdots,\sum_{j=1}^{N}x_{kj}\right)^T \), \( \sigma_{ii'}=\frac{1}{N-1}\sum_{r=1}^{N}\left(x_{ir}-\frac{1}{N}\sum_{s=1}^{N}x_{is}\right)\left(x_{i'r}-\frac{1}{N}\sum_{s=1}^{N}x_{i's}\right) \) is the sample covariance of the differences at variants i and i', and \( Var(U)=\frac{1}{N}\sum_{i,i'=1}^{k}\sigma_{ii'} \). When parental genotypes are missing, we substitute \( \tilde{x}_{ij} \) for \( x_{ij} \) and denote the resulting test statistic by \( \tilde{Z}_C \).
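The quantities U and Var(U) can be computed directly from the k × N matrix of (expected) paired differences. The NumPy sketch below is a hedged illustration, not the authors' code: the function name `collapsed_z` is hypothetical, and it assumes that the statistic takes the form U/√Var(U).

```python
import numpy as np


def collapsed_z(x: np.ndarray) -> float:
    """Collapsing statistic from a k x N matrix of paired differences.

    Rows are variants (i = 1..k) and columns are trios (j = 1..N). Entries
    are observed x_ij for complete trios and expected values x~_ij otherwise.
    ASSUMPTION: the statistic is U / sqrt(Var(U)).
    """
    x = np.asarray(x, dtype=float)
    n_trios = x.shape[1]
    x_bar = x.mean(axis=1)                 # X-bar: one mean per variant
    u = x_bar.sum()                        # U = 1^T X-bar
    sigma = np.atleast_2d(np.cov(x))       # k x k sample covariance, divisor N - 1
    var_u = sigma.sum() / n_trios          # Var(U) = (1/N) * sum_{i,i'} sigma_ii'
    return float(u / np.sqrt(var_u))
```

Because Var(U) sums the full covariance matrix, the same value is obtained as a one-sample t statistic on the per-trio collapsed scores s_j = Σ_i x̃_ij. For example, the matrix with rows (1, 0, 2, 1) and (0, 1, 0, 1) has scores (1, 1, 2, 2), mean 1.5 and sample variance 1/3, giving 1.5/√(1/12) ≈ 5.196.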
In the TDT framework, let b_{ij} be the number of minor alleles transmitted from heterozygous parents to the affected offspring at variant i in the jth trio, and let c_{ij} be the number of major alleles transmitted from heterozygous parents to the affected offspring at variant i in the jth trio. Let \( b_i=\sum_j b_{ij} \) be the total number of minor alleles transmitted from heterozygous parents to the affected offspring at the ith variant, and let \( c_i=\sum_j c_{ij} \) be the corresponding total number of transmitted major alleles. The collapsing method for rare variants in the TDT framework (corresponding to TDT_{BRV} in He et al. [11]) is
To allow for missing parental genotypes, we define \( \tilde{b}_{ij} \) and \( \tilde{c}_{ij} \) as follows:
Additional file 1: Table S1 and Additional file 2: Table S2 present all the expectations \( E(x\mid\cdot) \), \( E(b\mid\cdot) \), and \( E(c\mid\cdot) \) when ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{I} and ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{II}, respectively. We substitute \( \tilde{b}_{ij} \) and \( \tilde{c}_{ij} \) for b_{ij} and c_{ij} and denote the resulting TDT_{BRV} statistic by \( \widetilde{TDT}_{\mathrm{BRV}} \).
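A collapsed TDT statistic can be computed from the per-variant, per-trio transmission counts, whether observed or expected. The sketch below is our assumption, not the authors' definition: it uses the classical TDT form (B − C)²/(B + C) applied to the collapsed totals B = Σ_i b_i and C = Σ_i c_i, and the function name `tdt_brv` is hypothetical; He et al. [11] give the exact TDT_{BRV} statistic.

```python
from typing import Sequence


def tdt_brv(b: Sequence[Sequence[float]], c: Sequence[Sequence[float]]) -> float:
    """Collapsed TDT-style statistic from per-variant, per-trio counts.

    b[i][j] and c[i][j] are the minor and major alleles transmitted by
    heterozygous parents at variant i in trio j: observed counts for
    complete trios, expected values b~_ij and c~_ij otherwise.
    ASSUMPTION: (B - C)^2 / (B + C) with B = sum_i b_i and C = sum_i c_i,
    modelled on the classical TDT.
    """
    big_b = sum(sum(row) for row in b)   # B = sum over variants of b_i
    big_c = sum(sum(row) for row in c)   # C = sum over variants of c_i
    if big_b + big_c == 0:
        raise ValueError("no transmissions from heterozygous parents")
    return (big_b - big_c) ** 2 / (big_b + big_c)
```

Expected counts \( \tilde{b}_{ij} \) and \( \tilde{c}_{ij} \) enter as non-integer values, which the arithmetic handles unchanged. For example, B = 4 and C = 1 give (4 − 1)²/5 = 1.8.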
Results
Simulation setting
To assess the performance of our method, we perform a series of simulation studies over a wide range of parameter values. The simulation parameters include the total number of variants (k), the MAF of each variant, the number (q) and effect sizes (measured by the odds ratio, OR) of causal variants, and the sample size (N) of case-parents trios, with N_{0}, N_{I}, and N_{II} trios of types Ω_{0}, Ω_{I}, and Ω_{II}, respectively. We simulate two populations.
In the first population, 1000 case-parents families are generated with the following parameters: k = 20; q = 0.2k, 0.4k, 0.6k, 0.8k; and the MAF of each variant drawn from a uniform distribution on (0.001, 0.01). Under the null hypothesis of no association, we set OR = 1 for all variants. Under the alternative hypothesis, we set OR = 1 for non-causal variants and consider two scenarios for the effects of causal variants. First, the causal variants act in the same positive direction with different effect sizes, with ORs in arithmetic progression over [1.2, 3]. Second, the causal variants have opposite effects: half of the causal variants have ORs in arithmetic progression over [0.2, 0.9], and the other half over [1.2, 3].
In the second population, 500 case-parents families and 500 unaffected individuals are generated; the unaffected individuals are used to estimate MAF for samples from the second population. The parameter settings are the same as in the first population except for the ORs of causal variants under the alternative hypothesis: the OR of each causal variant in the second population is 0.1 less than in the first population.
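The parameter settings for both populations can be sketched in NumPy as follows. This is our illustration, not the authors' code: the function names `causal_odds_ratios` and `simulation_parameters` are hypothetical, and the text does not say which variant positions are causal, so the sketch assumes the first q variants are causal.

```python
import numpy as np


def causal_odds_ratios(q: int, opposite: bool, shift: float = 0.0) -> np.ndarray:
    """ORs for q causal variants in arithmetic progression.

    opposite=False: all ORs in [1.2, 3]. opposite=True: half in [0.2, 0.9]
    and half in [1.2, 3]. shift=-0.1 gives the second population's ORs.
    """
    if opposite:
        half = q // 2
        ors = np.concatenate([np.linspace(0.2, 0.9, half),
                              np.linspace(1.2, 3.0, q - half)])
    else:
        ors = np.linspace(1.2, 3.0, q)
    return ors + shift


def simulation_parameters(k=20, frac_causal=0.2, opposite=False,
                          shift=0.0, null=False, seed=None):
    """Draw per-variant MAFs and ORs for one simulation setting.

    ASSUMPTION: the first q variants are the causal ones.
    """
    rng = np.random.default_rng(seed)
    q = int(round(frac_causal * k))
    maf = rng.uniform(0.001, 0.01, size=k)   # MAF ~ Uniform(0.001, 0.01)
    odds_ratios = np.ones(k)                 # OR = 1 for non-causal (or all, under H0)
    if not null:
        odds_ratios[:q] = causal_odds_ratios(q, opposite, shift)
    is_causal = np.arange(k) < q
    return maf, odds_ratios, is_causal
```

With k = 20 and q = 0.2k, for example, the same-direction setting gives causal ORs of 1.2, 1.8, 2.4, and 3.0 in the first population and 1.1, 1.7, 2.3, and 2.9 in the second.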
Once the parameter values are chosen, we first generate parental haplotypes from a latent vector Z = (Z_{1}, ⋯, Z_{k}) drawn from a multivariate normal distribution with standard normal marginals and the following correlation structure [16, 17]: if variants i and j are both causal or both non-causal, Corr(Z_{i}, Z_{j}) = 0.4^{|i − j|}; otherwise the correlation is zero. We transform Z_{i} to 0 (major allele) or 1 (minor allele) according to the MAF of the ith variant, and combine two such haplotypes to obtain the genotype of each parent [16, 17]. Offspring haplotypes are generated from the parental haplotypes assuming no recombination between the variants. The offspring's disease status is determined by the following logistic model [18]:
where OR_{i} is the odds ratio of the ith variant, G_{Oij} is the minor allele count carried by the offspring in the jth trio at the ith variant, and c is the background prevalence, i.e., the probability of being affected for a subject carrying no minor alleles. We set c = 0.01 in the first population and c = 0.008 in the second population.
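The haplotype-based simulation can be sketched as follows. This is our own illustration under stated assumptions, not the authors' code. It assumes the logistic model logit P(affected) = logit(c) + Σ_i G_{Oij} log OR_i, which is consistent with the definitions of OR_i and c above; it codes a minor allele when Z_i falls below the MAF_i quantile of the standard normal; and it ascertains affected offspring by simulating families in batches and keeping those with an affected child. The function names and the `batch` parameter are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def latent_correlation(is_causal: np.ndarray, rho: float = 0.4) -> np.ndarray:
    """Corr(Z_i, Z_j) = rho^|i-j| if i and j are both causal or both non-causal, else 0."""
    idx = np.arange(is_causal.size)
    corr = rho ** np.abs(idx[:, None] - idx[None, :])
    same_class = is_causal[:, None] == is_causal[None, :]
    return np.where(same_class, corr, 0.0)


def draw_haplotypes(n, maf, corr, rng):
    """n haplotypes (n x k, 1 = minor allele) from a thresholded latent normal."""
    z = rng.multivariate_normal(np.zeros(maf.size), corr, size=n)
    # ASSUMPTION: minor allele when Z_i lies below the MAF_i quantile,
    # so that P(minor allele at variant i) = MAF_i.
    cut = np.array([NormalDist().inv_cdf(p) for p in maf])
    return (z < cut).astype(int)


def simulate_case_trios(n_trios, maf, odds_ratios, is_causal, c, rng, batch=20000):
    """Return (G_F, G_M, G_O), each n_trios x k, for trios with an affected child."""
    corr = latent_correlation(is_causal)
    log_or = np.log(odds_ratios)
    logit_c = np.log(c / (1.0 - c))
    kept, n_found = [], 0
    while n_found < n_trios:
        # four haplotypes per family: father (0, 1) and mother (2, 3)
        h = draw_haplotypes(4 * batch, maf, corr, rng).reshape(batch, 4, -1)
        g_f, g_m = h[:, 0] + h[:, 1], h[:, 2] + h[:, 3]
        rows = np.arange(batch)
        # no recombination: each parent passes on one whole haplotype at random
        g_o = h[rows, rng.integers(0, 2, batch)] + h[rows, 2 + rng.integers(0, 2, batch)]
        # ASSUMED model: logit P(affected) = logit(c) + sum_i G_Oi * log(OR_i)
        p_aff = 1.0 / (1.0 + np.exp(-(logit_c + g_o @ log_or)))
        aff = rng.random(batch) < p_aff
        kept.append((g_f[aff], g_m[aff], g_o[aff]))
        n_found += int(aff.sum())
    g_f, g_m, g_o = (np.concatenate(parts)[:n_trios] for parts in zip(*kept))
    return g_f, g_m, g_o
```

Trios of types Ω_{I} and Ω_{II} can then be formed by setting one or both parental genotype rows to missing.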
The 1000 case-parents trios in the first population comprise three types: 500 trios (N_{0}) of type Ω_{0}; 250 of type Ω_{I}, obtained by randomly discarding the genotype of one parent; and 250 of type Ω_{II}, obtained by discarding both parental genotypes. The 500 case-parents trios in the second population comprise two types: 250 of type Ω_{I} and 250 of type Ω_{II}. In our analysis, we fix N_{0} = 500 and vary N_{I} and N_{II}, letting each take the values \( \frac{1}{10} \)N_{0}, \( \frac{1}{5} \)N_{0}, and \( \frac{1}{2} \)N_{0}. We calculate Z_{C} and TDT_{BRV} in Ω_{0}, and \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \) in Ω_{0+I}, Ω_{0+II}, and Ω_{0+I+II}. The p-value of each test is estimated by permutation: we first calculate the data-based statistic, and then recalculate the statistic after randomly changing the signs (positive or negative) of \( \tilde{x}_{ij} \) for \( \tilde{Z}_C \), or randomly swapping the "transmitted" and "not transmitted" labels for \( \widetilde{TDT}_{\mathrm{BRV}} \), each with equal probability. We repeat this process 1000 times, and the p-value is estimated as the proportion of permutation-based statistics that are larger than the data-based statistic. For a given significance level α, the power (or type I error rate) is estimated as the proportion of 1000 replicates in which the null hypothesis is rejected, i.e., p-value ≤ α.
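The sign-change permutation test for the collapsed statistic, together with the empirical power or type I error estimate, can be sketched as follows. This is a hedged illustration, not the authors' code: the function names are hypothetical, the statistic is assumed to take the form U/√Var(U), and one random sign is assumed per trio and applied to all k variants, which preserves the correlation between variants within a trio. The label swap for \( \widetilde{TDT}_{\mathrm{BRV}} \) would analogously exchange \( \tilde{b}_{ij} \) and \( \tilde{c}_{ij} \) within a trio and is not shown.

```python
import numpy as np


def z_from_scores(s: np.ndarray) -> float:
    """Collapsed statistic on per-trio scores s_j = sum_i x~_ij.

    ASSUMPTION: Z = U / sqrt(Var(U)), which equals a one-sample t statistic on s.
    """
    return float(s.mean() / np.sqrt(s.var(ddof=1) / s.size))


def sign_flip_pvalue(x: np.ndarray, n_perm: int = 1000, seed=None) -> float:
    """Permutation p-value for the collapsed statistic by random sign changes.

    x is the k x N matrix of (expected) paired differences.
    ASSUMPTION: one random sign per trio, applied to all k variants.
    """
    rng = np.random.default_rng(seed)
    s = np.asarray(x, dtype=float).sum(axis=0)      # collapse over variants
    observed = z_from_scores(s)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, s.size))
    perm = np.array([z_from_scores(sign * s) for sign in signs])
    # proportion of permutation-based statistics larger than the observed one
    return float(np.mean(perm > observed))


def rejection_rate(p_values, alpha: float) -> float:
    """Empirical power or type I error: share of replicates with p <= alpha."""
    return float(np.mean(np.asarray(p_values) <= alpha))
```

Over 1000 simulated replicates, `rejection_rate` applied to the collected p-values gives the power under the alternative or the type I error rate under the null.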
Type I error rates and power
We investigate the performance of our method in a homogeneous population and in populations with population stratification. In the homogeneous setting, all samples come from the first population; in the stratified setting, the case-parents trios with missing parental genotypes come from the second population. Table 1 presents the type I error rates at α = 0.05 and 0.001. For all three situations, Ω_{0+I}, Ω_{0+II}, and Ω_{0+I+II}, the type I error rates are well controlled around the nominal levels. This supports the validity of the method when trios with one missing parental genotype, or cases only, are added, even under population stratification.
Tables 2, 3 and 4 present the power of \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \) in the homogeneous population for the three situations Ω_{0+I}, Ω_{0+II}, and Ω_{0+I+II}, respectively, both when causal variants have different effects in the same positive direction and when they have opposite effects. Table 2 shows that, when causal variants have different effects in the same direction and the proportion of non-causal variants is 80% or 60%, adding case-parents trios of type Ω_{I} to the complete case-parents data set increases the power of both statistics for rare variant association analysis. For example, with 80% non-causal variants, adding \( \frac{1}{10} \)N_{0} (50), \( \frac{1}{5} \)N_{0} (100), and \( \frac{1}{2} \)N_{0} (250) trios of type Ω_{I} to 500 complete trios raises the powers of \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \) from 0.408 and 0.602 to 0.566 and 0.712, 0.674 and 0.748, and 0.752 and 0.784, respectively. Although \( \tilde{Z}_C \) has lower power than \( \widetilde{TDT}_{\mathrm{BRV}} \) with complete case-parents data alone, adding \( \frac{1}{2} \)N_{0} trios of type Ω_{I} brings \( \tilde{Z}_C \) to a power similar to that of \( \widetilde{TDT}_{\mathrm{BRV}} \). When the proportion of non-causal variants is small (40% or 20%), both statistics already have high power with the 500 complete trios, so adding trios of type Ω_{I} does not improve power further. When we decrease the sample size to 200, adding trios of type Ω_{I} still improves the power of both statistics (data not shown). When causal variants have opposite effects, adding trios of type Ω_{I} also improves the statistical power.
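The relative size of these gains can be checked with simple arithmetic. The sketch below assumes that the "proportion of power improved" reported in the tables is (new − base)/base; the helper name `relative_gain` is hypothetical, and the printed values are our computation from the powers quoted in this paragraph, not numbers taken from Table 2.

```python
def relative_gain(base: float, new: float) -> float:
    """Relative power improvement (new - base) / base (assumed definition)."""
    return (new - base) / base


# Powers quoted above for 80% non-causal variants, same-direction effects:
# 500 complete trios vs. adding N_0/10, N_0/5 and N_0/2 trios of type Omega_I.
base = {"Z_C": 0.408, "TDT_BRV": 0.602}
augmented = {"Z_C": [0.566, 0.674, 0.752], "TDT_BRV": [0.712, 0.748, 0.784]}

gains = {
    stat: [round(100 * relative_gain(base[stat], p), 1) for p in powers]
    for stat, powers in augmented.items()
}
print(gains)  # {'Z_C': [38.7, 65.2, 84.3], 'TDT_BRV': [18.3, 24.3, 30.2]}
```

Under this definition, the relative gain in this setting is larger for \( \tilde{Z}_C \) than for \( \widetilde{TDT}_{\mathrm{BRV}} \) at every added sample size.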
To quantify the size of the power gain for \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \), Tables 2, 3 and 4 report in parentheses the relative power improvement obtained by adding trios of types Ω_{I}, Ω_{II}, and Ω_{I+II}, respectively, to the complete case-parents data set Ω_{0}. Table 2 shows that the relative improvement decreases as the number of non-causal variants decreases, and that it is larger for \( \tilde{Z}_C \) than for \( \widetilde{TDT}_{\mathrm{BRV}} \). When causal variants have opposite effects, the relative improvement is larger than when they act in the same direction. As the proportion of non-causal variants decreases from 80% to 60%, the relative improvement increases: as the number of added Ω_{I} trios increases from \( \frac{1}{10} \)N_{0} to \( \frac{1}{2} \)N_{0}, the power of \( \widetilde{TDT}_{\mathrm{BRV}} \) improves by 20.4% to 38.1% and that of \( \tilde{Z}_C \) by 62.9% to 117% with 80% non-causal variants, compared with 25.8% to 61.3% and 89.0% to 239%, respectively, with 60% non-causal variants. With a further decrease in the number of non-causal variants, however, the relative improvement drops. Over the same range of added Ω_{I} trios, the relative improvement of \( \widetilde{TDT}_{\mathrm{BRV}} \) ranges from 7.58% to 36.4% with 40% non-causal variants and from 5.76% to 35.2% with 20% non-causal variants, and that of \( \tilde{Z}_C \) ranges from 34.0% to 96.6% and from 31.3% to 98.7%, respectively.
These results indicate that, when causal variants have opposite effects, the relative power improvement first increases and then decreases as the number of non-causal variants increases, so adding trios of type Ω_{I} yields the largest relative gain when the number of non-causal variants is moderate. For Ω_{0+II} and Ω_{0+I+II}, Tables 3 and 4 show results similar to those for Ω_{0+I}. In addition, the relative power improvement for Ω_{0+I} is the largest among the three situations Ω_{0+I}, Ω_{0+II}, and Ω_{0+I+II}.
Under population stratification, Additional file 3: Figures S1-S4 show the power of \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \) against sample size for various proportions of non-causal variants under the three situations Ω_{0+I}, Ω_{0+II}, and Ω_{0+I+II}. The results are similar to those in the homogeneous population. We also considered a more general stratification scenario in which samples from both populations consist of case-parents trios of all three types, Ω_{0}, Ω_{I}, and Ω_{II}; the simulation results were similar to those in Additional file 3: Figures S1-S4 (data not shown). These results indicate that, when case-parents trios with missing parental genotypes, or even cases only, are added to a complete case-parents data set, population stratification does not affect the power of the two statistics for rare variant association analysis.
Discussion
In this report, we considered case-parents data with missing parental genotypes for rare variant association analysis in case-parents studies. Based on the collapsing method applied to the difference vector and within the TDT framework, we presented two statistics, \( \tilde{Z}_C \) and \( \widetilde{TDT}_{\mathrm{BRV}} \), that allow for missing parental genotypes. The key step in the proposed approach is estimating the MAF. In clinical research, studies are usually designed for a homogeneous population or a few specific populations, so the MAF can be estimated from the known parental genotypes or from the background population of the samples. We investigated the performance of the two statistics in three situations: complete case-parents data mixed with trios missing one parental genotype, complete data mixed with trios missing both parental genotypes, and complete data mixed with trios missing one or both parental genotypes. Our simulation studies showed that adding case-parents data with missing parental genotypes to a complete case-parents data set can greatly improve the power of both statistics, although the size of the improvement varied across settings. Additionally, in our simulations the strategy was not affected by population stratification.
In most studies of association between rare variants and disease, family-based and population-based samples have been analyzed separately [6, 7, 11, 19, 20]. Although family-based studies have several advantages over population-based studies in rare variant association analysis, it is often difficult to recruit sufficiently large family-based samples, especially for rare diseases. Moreover, parental information is often incomplete, which complicates the analysis. Discarding families with missing parental genotypes further reduces the sample size and results in a loss of statistical power. In our strategy, case-parents trios missing one or both parental genotypes are retained in the analysis, which can greatly improve statistical power. Furthermore, a trio missing both parental genotypes is simply a case, so unrelated affected individuals can be included in case-parents studies, which is useful when the number of case-parents trios is small. Although population stratification may exist among unrelated affected individuals recruited from population-based samples, the strategy was not affected by the stratification scenarios we simulated. Combining unrelated affected individuals with complete case-parents data increased power by about 5-60% for \( \widetilde{TDT}_{\mathrm{BRV}} \) and 20-200% for \( \tilde{Z}_C \), in both homogeneous populations and populations with population stratification.
In addition to allowing for missing parental genotypes, our method can address a related problem: missing genotypes at individual variants in parental data. When variants are analyzed individually and some genotypes are missing, removing the samples with missing genotypes at each variant results in different sample sizes across variants. The strategy described above avoids this problem. However, our strategy is not suitable for case-parents trios with missing offspring genotypes, and further study is needed to address such scenarios.
Conclusions
The proposed strategy, which allows for missing parental genotypes or even the addition of unrelated affected individuals, can greatly improve the statistical power for rare variant-disease association and is not affected by population stratification.
Abbreviations
BRV: Burden of rare variants
CMC: Combined multivariate and collapsing
LD: Linkage disequilibrium
MAF: Minor allele frequency
OR: Odds ratio
TDT: Transmission/disequilibrium test
VT: Variable threshold
WSS: Weighted sum statistic
References
1. Ott J, Kamatani Y, Lathrop M. Family-based designs for genome-wide association studies. Nat Rev Genet. 2011;12(7):465–74.
2. Mathieson I, McVean G. Differential confounding of rare and common variants in spatially structured populations. Nat Genet. 2012;44(3):243–6.
3. Liu J, Lewinger JP, Gilliland FD, Gauderman WJ, Conti DV. Confounding and heterogeneity in genetic association studies with admixed populations. Am J Epidemiol. 2013;177(4):351–60.
4. He Z, Zhang D, Renton AE, Li B, Zhao L, Wang GT, Goate AM, Mayeux R, Leal SM. The rare-variant generalized disequilibrium test for association analysis of nuclear and extended pedigrees with application to Alzheimer disease WGS data. Am J Hum Genet. 2017;100(2):193–204.
5. Shi M, Umbach DM, Weinberg CR. Identification of risk-related haplotypes with the use of multiple SNPs from nuclear families. Am J Hum Genet. 2007;81(1):53–66.
6. Li B, Leal SM. Methods for detecting association with rare variants for common diseases: application to analysis of sequence data. Am J Hum Genet. 2008;83(3):311–21.
7. Lin DY, Tang ZZ. A general framework for detecting disease associations with rare variants in sequencing studies. Am J Hum Genet. 2011;89(3):354–67.
8. Madsen BE, Browning SR. A groupwise association test for rare mutations using a weighted sum statistic. PLoS Genet. 2009;5(2):e1000384.
9. Price AL, Kryukov GV, de Bakker PIW, Purcell SM, Staples J, Wei LJ, Sunyaev SR. Pooled association tests for rare variants in exon-resequencing studies. Am J Hum Genet. 2010;86(6):832–8.
10. Spielman RS, McGinnis RE, Ewens WJ. Transmission test for linkage disequilibrium: the insulin gene region and insulin-dependent diabetes mellitus (IDDM). Am J Hum Genet. 1993;52(3):506–16.
11. He Z, O'Roak BJ, Smith JD, Wang G, Hooker S, Santos-Cortez RLP, Li B, Kan M, Krumm N, Nickerson DA, Shendure J, Eichler EE, Leal SM. Rare-variant extensions of the transmission disequilibrium test: application to autism exome sequence data. Am J Hum Genet. 2014;94(1):33–46.
12. McIntyre LM, Martin ER, Simonsen KL, Kaplan NL. Circumventing multiple testing: a multilocus Monte Carlo approach to testing for association. Genet Epidemiol. 2000;19(1):18–29.
13. Li YM, Xiang Y. Detecting disease association with rare variants in case-parents studies. J Hum Genet. 2017;62(5):549–52.
14. Allen AS, Rathouz PJ, Satten GA. Informative missingness in genetic association studies: case-parent designs. Am J Hum Genet. 2003;72(3):671–80.
15. Sebastiani P, Abad MM, Alpargu G, Ramoni MF. Robust transmission/disequilibrium test for incomplete family genotypes. Genetics. 2004;168(4):2329–37.
16. Basu S, Pan W. Comparison of statistical tests for disease association with rare variants. Genet Epidemiol. 2011;35(7):606–19.
17. Sun L, Wang C, Hu YQ. Utilizing mutual information for detecting rare and common variants associated with a categorical trait. PeerJ. 2016;4:e2139.
18. Preston MD, Dudbridge F. Utilising family-based designs for detecting rare variant disease associations. Ann Hum Genet. 2014;78(2):129–40.
19. Wu MC, Lee S, Cai T, Li Y, Boehnke M, Lin X. Rare-variant association testing for sequencing data with the sequence kernel association test. Am J Hum Genet. 2011;89(1):82–93.
20. Ionita-Laza I, Lee S, Makarov V, Buxbaum JD, Lin X. Family-based association tests for sequence data, and comparisons with population-based association tests. Eur J Hum Genet. 2013;21(10):1158–62.
Acknowledgments
LYM was partially supported by the National Natural Science Foundation of China (11301206), the Scientific Research Fund of Hunan Provincial Education Department (16A166), the Hunan Provincial Natural Science Foundation of China (2017JJ2212), and the China Scholarship Council (National cooperation fund of Hunan Province). HWD was partially supported by grants from the National Institutes of Health (R01AR057049, R01AR059781, D43TW009107, P20 GM109036, R01MH107354, R01MH104680, R01GM109068, R01AR069055) and the Edward G. Schlieder Endowment fund to Tulane University. The authors thank Loula Burton, Office of Research, Tulane University, for assistance in editing the manuscript.
Funding
This work was financially supported by the National Natural Science Foundation of China (11301206), the Scientific Research Fund of Hunan Provincial Education Department (16A166), and the Hunan Provincial Natural Science Foundation of China (2017JJ2212).
Availability of data and materials
All data generated or analysed during this study are included in this published article.
Author information
Contributions
LYM conceived the idea, designed the study, and wrote the manuscript. XY developed the statistical method. XC, ShH, and DHW revised the manuscript. All authors have read and approved the final version of the manuscript.
Ethics declarations
Ethics approval and consent to participate
This study did not directly involve humans, animals, or plants, so no consent to participate was required.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional files
Additional file 1: Table S1.
All the expectations E(x), E(b), and E(c) when ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{I}. (PDF 109 kb)
Additional file 2: Table S2.
All the expectations E(x), E(b), and E(c) when ({G_{F}, G_{M}}, G_{O}) ∈ Ω_{II}. (PDF 88 kb)
Additional file 3: Figure S1.
Empirical power against sample size at the 0.05 significance level under population stratification when there are 20% non-causal variants. Note: panels A and B are for \( \tilde{Z}_C \), and panels C and D are for \( \widetilde{TDT}_{\mathrm{BRV}} \), when causal variants have different effects in the same direction and when causal variants have opposite effects, respectively. The sample sizes N = N_{0}, N_{0} + 1/10 N_{0}, N_{0} + 1/5 N_{0}, and N_{0} + 1/2 N_{0}, with N_{0} = 500, are denoted by 0, 1/10, 1/5, and 1/2, respectively. Ω_{0+I} (○), Ω_{0+II} (*), Ω_{0+I+II} (+).
Figure S2.
Empirical power against sample size at the 0.05 significance level under population stratification when there are 40% non-causal variants. Note: panels A and B are for \( \tilde{Z}_C \), and panels C and D are for \( \widetilde{TDT}_{\mathrm{BRV}} \), when causal variants have different effects in the same direction and when causal variants have opposite effects, respectively. The sample sizes N = N_{0}, N_{0} + 1/10 N_{0}, N_{0} + 1/5 N_{0}, and N_{0} + 1/2 N_{0}, with N_{0} = 500, are denoted by 0, 1/10, 1/5, and 1/2, respectively. Ω_{0+I} (○), Ω_{0+II} (*), Ω_{0+I+II} (+).
Figure S3.
Empirical power against sample size at the 0.05 significance level under population stratification when there are 60% non-causal variants. Note: panels A and B are for \( \tilde{Z}_C \), and panels C and D are for \( \widetilde{TDT}_{\mathrm{BRV}} \), when causal variants have different effects in the same direction and when causal variants have opposite effects, respectively. The sample sizes N = N_{0}, N_{0} + 1/10 N_{0}, N_{0} + 1/5 N_{0}, and N_{0} + 1/2 N_{0}, with N_{0} = 500, are denoted by 0, 1/10, 1/5, and 1/2, respectively. Ω_{0+I} (○), Ω_{0+II} (*), Ω_{0+I+II} (+).
Figure S4.
Empirical power against sample size at the 0.05 significance level under population stratification when there are 80% non-causal variants. Note: panels A and B are for \( \tilde{Z}_C \), and panels C and D are for \( \widetilde{TDT}_{\mathrm{BRV}} \), when causal variants have different effects in the same direction and when causal variants have opposite effects, respectively. The sample sizes N = N_{0}, N_{0} + 1/10 N_{0}, N_{0} + 1/5 N_{0}, and N_{0} + 1/2 N_{0}, with N_{0} = 500, are denoted by 0, 1/10, 1/5, and 1/2, respectively. Ω_{0+I} (○), Ω_{0+II} (*), Ω_{0+I+II} (+). (PDF 89 kb)
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
About this article
Cite this article
Li, Y., Xiang, Y., Xu, C. et al. Rare variant association analysis in case-parents studies by allowing for missing parental genotypes. BMC Genet 19, 7 (2018). https://doi.org/10.1186/s12863-018-0597-8
Keywords
Rare-variant association analysis
Case-parent trios
Collapsing method