Volume 6 Supplement 1
Modeling the effect of a genetic factor for a complex trait in a simulated population
- Mathieu Bourgey†1,
- Anne-Louise Leutenegger†2,
- Emmanuelle Cousin3,
- Catherine Bourgain1,
- Marie-Claude Babron1 and
- Françoise Clerget-Darpoux1
© Bourgey et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Genetic Analysis Workshop 14 simulated data were analyzed with MASC (marker association segregation chi-squares), in which we implemented a bootstrap procedure to provide variation intervals for the parameter estimates. We model here the effect of a genetic factor, S, for Kofendrerd Personality Disorder in the region of the marker C03R0281 for the Aipotu population. The goodness of fit of several genetic models with two alleles at one locus was tested. The data are not compatible with a direct effect of any of the single-nucleotide polymorphisms (SNPs) studied in the region (SNPs 16, 17, 18, and 19 of packet 153). Therefore, we can conclude that the functional polymorphism has not been typed and is in linkage disequilibrium with the four studied SNPs. We obtained very wide variation intervals for both the disease allele frequency and the degree of dominance. The uncertainty in the model parameters can be explained, first, by the method used, which models marginal effects when the disease is due to complex interactions; second, by the presence of different sub-criteria used for the diagnosis that are not determined by S in the same way; and third, by the fact that the segregation of the disease in the families was not taken into account. However, we could not find any model that could explain the familial segregation of the trait, namely the higher proportion of affected parents than affected sibs.
The aim of this work is to study and model the marginal effect of one susceptibility locus involved in the determinism of Kofendrerd Personality Disorder (KPD) in the Aipotu population. The presence of a susceptibility locus closely linked to the marker C03R0281 was shown by the existence of both strong association and genetic linkage between this marker and the trait [1, 2]. Before modeling the marginal effect of this factor, we searched for the replicate that best represented this effect. The modeling (estimation of the allele frequency and marginal penetrances) is carried out with the MASC (marker association segregation chi-squares) method [3], using the information provided by the marker C03R0281, denoted M hereafter. The variation intervals for the parameter estimates are obtained through a bootstrap procedure that we incorporated into the MASC program.
Selection of the best replicate
We want to select the replicates that best represent the distributions in the region of marker M (C03R0281) in terms of both association and linkage. Estimating the parameters and their variation intervals in this sample is then equivalent to evaluating them in the whole set of replicates. In the pooled sample set (10,000 families), we consider one index (an affected case) per family and his genotype for the marker M. For each index, we also consider his identity-by-descent (IBD) sharing for M with one random affected sib (families of the Aipotu population were selected as having at least two sibs affected with KPD). To obtain a reliable IBD sharing for each sib pair, we ordered one SNP packet (153) surrounding marker M. The IBD sharing is obtained by maximum likelihood estimation using the information provided by the whole set of markers in the M region on chromosome 3.
Because we are looking for the model that best explains these four independent distributions (one genotype distribution and three stratified IBD distributions), we first determined the replicates that best reflect these distributions. We computed the distance of each of the 100 replicates to the pooled sample with a chi-square statistic equal to the sum of the four independent chi-squares obtained by comparing the distributions observed in the replicate and in the pooled sample.
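The replicate selection described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: each chi-square is taken to be a Pearson statistic comparing replicate counts with the pooled proportions, the IBD distributions are assumed to be stratified by the index genotype, and the dictionary keys and all counts are invented for the example rather than taken from the GAW14 data.

```python
def chi_square(observed_counts, reference_props):
    """Pearson chi-square of observed counts against reference proportions."""
    total = sum(observed_counts)
    stat = 0.0
    for obs, prop in zip(observed_counts, reference_props):
        expected = total * prop
        if expected > 0:  # simplification: skip categories absent from the reference
            stat += (obs - expected) ** 2 / expected
    return stat


def distance_to_pooled(replicate, pooled):
    """Sum of the per-distribution chi-squares between a replicate and the pooled sample."""
    distance = 0.0
    for name, pooled_counts in pooled.items():
        pooled_total = sum(pooled_counts)
        props = [count / pooled_total for count in pooled_counts]
        distance += chi_square(replicate[name], props)
    return distance


def rank_replicates(replicates, pooled):
    """Replicate identifiers ordered from closest to farthest from the pooled sample."""
    return sorted(replicates, key=lambda rid: distance_to_pooled(replicates[rid], pooled))


# Toy counts (invented for illustration, not GAW14 data).
pooled = {
    "genotype": [500, 400, 100],  # M1M1, M1M2, M2M2 among index cases
    "ibd_M1M1": [100, 250, 150],  # IBD = 0, 1, 2 for M1M1 index cases (assumed stratification)
    "ibd_M1M2": [90, 200, 110],
    "ibd_M2M2": [30, 50, 20],
}
replicates = {
    "rep_a": {"genotype": [50, 40, 10], "ibd_M1M1": [10, 25, 15],
              "ibd_M1M2": [9, 20, 11], "ibd_M2M2": [3, 5, 2]},
    "rep_b": {"genotype": [40, 45, 15], "ibd_M1M1": [12, 18, 10],
              "ibd_M1M2": [10, 25, 10], "ibd_M2M2": [5, 6, 4]},
}
best_first = rank_replicates(replicates, pooled)
```

In this toy input, `rep_a` has exactly the pooled proportions, so its distance is zero and it is ranked first.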
Modeling the genetic effect
We modeled the effect of the susceptibility factor with the MASC method. For a given genetic model, the MASC method computes the four expected distributions described in Figure 1 and the previously described chi-square statistic (the sum of four chi-squares between the observed and expected distributions). The chi-square is minimized over the parameters left free to vary. The fit of the model to the observed data is then tested (8 df minus the number of parameters free to vary). The parameters of the genetic model are the penetrances of each genotype and the coupling between the marker alleles and the susceptibility factor alleles. The expected distributions are computed conditional on the index cases having at least one affected sib. They depend on the frequency of the marker alleles in the general population. The marker allele frequencies may be assumed to be already known (situation 1), to be obtained through a control sample (situation 2), or to be obtained from the parental alleles that were not transmitted to the affected cases used for the family ascertainment (situation 3).
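The goodness-of-fit step can be illustrated with a short, standard-library-only sketch. It does not reproduce the MASC minimization itself; it only shows how the degrees of freedom follow from the number of free parameters and how a minimized chi-square converts to a p-value. The function names are hypothetical, and the closed-form tail probability used here is valid only for even degrees of freedom, which covers the 4 df and 6 df tests reported in this paper.

```python
import math

BASE_DF = 8  # degrees of freedom before subtracting free parameters, as stated in the text


def fit_df(n_free_params):
    """Degrees of freedom of the fit test: 8 minus the number of free parameters."""
    return BASE_DF - n_free_params


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable, closed form for even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only valid for positive even df")
    half = x / 2.0
    term = 1.0
    total = 1.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# A model with two free parameters is tested on 6 df; a chi-square of 14.14
# (the value reported later for the direct-effect model) gives p of about 0.028.
p_value = chi2_sf_even_df(14.14, fit_df(2))
```

A model is rejected at the 5% level when this p-value falls below 0.05, which is the case for the value used in the example.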
Computing the intervals of variation for the parameter estimates
We implemented a bootstrap procedure for calculating the variation intervals of the parameter estimates in the MASC program. The uncertainty in the parameters is due to the sampling of families and, when the marker allele frequencies are inferred from a control sample, to the sampling of controls. For each bootstrapped family set (1,000 replicates), we estimated the parameters under the three possibilities described above for the marker allele frequencies. In situation 1, there is no uncertainty induced by the marker allele frequencies. In situation 2, the bootstrap procedure is applied to both the family sample and a sample of 50 controls randomly drawn among the 100 control samples. In situation 3, the bootstrap is only applied to the family sample. For the three situations, we obtain the distribution of the parameter estimates and provide the 95% intervals.
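A minimal sketch of such a bootstrap is given below, assuming a percentile interval (the text does not state how the 95% intervals are read off the bootstrap distribution). Only the family sample is resampled, as in situations 1 and 3. The estimator `carrier_fraction` is a hypothetical stand-in for the MASC parameter estimation, and the data are invented.

```python
import random


def bootstrap_interval(families, estimate, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for estimate(families), resampling families with replacement."""
    rng = random.Random(seed)
    n = len(families)
    estimates = []
    for _ in range(n_boot):
        resample = [families[rng.randrange(n)] for _ in range(n)]
        estimates.append(estimate(resample))
    estimates.sort()
    tail = round(n_boot * (1.0 - level) / 2.0)  # number of estimates cut from each tail
    return estimates[tail], estimates[n_boot - 1 - tail]


def carrier_fraction(families):
    """Hypothetical stand-in estimator: fraction of families flagged 1."""
    return sum(families) / len(families)


# Invented data: 200 families, 60 of them flagged 1.
families = [1] * 60 + [0] * 140
lower, upper = bootstrap_interval(families, carrier_fraction)
```

For situation 2, the same resampling would also be applied to the control sample inside the loop, so that each bootstrap replicate combines a resampled family set with a resampled control set.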
The best replicate
The 10 best replicates. Chi-squares between the distributions observed in the replicate and in the pooled sample.
Modeling of the genetic effects
We tested a biallelic susceptibility locus model (S1, S2) and estimated four parameters: two relative penetrances, λ1 and λ2, and two coupling probabilities, c11 and c12, where
λ1 = P(affected | S1S2) / P(affected | S1S1),
λ2 = P(affected | S2S2) / P(affected | S1S1),
c11 = P(S1 | M1),
c12 = P(S1 | M2).
The frequency (q) of allele S1 at the susceptibility locus can be written as
q = P(S1) = c11 P(M1) + c12 P(M2).
The direct effect of marker M, which means S1 is confounded with M1, and S2 with M2, was rejected (χ2 = 14.14; 6 df).
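As a small worked example of the relation between the coupling probabilities and the disease allele frequency, the sketch below evaluates q = c11 P(M1) + c12 P(M2) for a biallelic marker, so that P(M2) = 1 − P(M1). The function name and the numerical values are illustrative assumptions, not estimates from the paper. Under a direct effect, S1 is confounded with M1, which corresponds to c11 = 1 and c12 = 0 and therefore q = P(M1).

```python
def disease_allele_frequency(c11, c12, p_m1):
    """q = c11 * P(M1) + c12 * P(M2), with P(M2) = 1 - P(M1) for a biallelic marker."""
    for name, value in (("c11", c11), ("c12", c12), ("p_m1", p_m1)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be a probability in [0, 1]")
    return c11 * p_m1 + c12 * (1.0 - p_m1)


# Illustrative values only (not taken from the paper).
q_indirect = disease_allele_frequency(c11=0.5, c12=0.1, p_m1=0.4)

# Direct effect: S1 confounded with M1, so q equals the marker allele frequency.
q_direct = disease_allele_frequency(c11=1.0, c12=0.0, p_m1=0.4)
```

Fixing c11 and c12 in this way removes two free parameters, which is consistent with the direct-effect model being tested on 6 df rather than 4.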
Models compatible with observations made on M in replicate 97. Goodness-of-fit χ2: 1.852 (4 df), 5.468 (6 df), 4.005 (6 df).
Frequencies of allele 1 for SNPs 16, 17, 18, and 19 (marker M) for the index cases (n = 10,000) and the controls (n = 5,000), compared by χ2 (1 df).
Computing the variation intervals of the parameter estimates
Variation interval of the disease allele frequency q
95% Variation interval [range]a
0.24 [0.01; 0.74]
0.24 [0.01; 0.74]
0.22 [0.01; 0.54]
0.22 [0.01; 0.55]
0.20 [0.07; 0.35]
0.20 [0.01; 0.35]
Before knowing the simulation model
Phenotype distribution for the Aipotu population
IBD = 0
IBD = 1
IBD = 2
Proportion of affected parents and sibs
Proportion of affected subjects (n/total)
Best replicate (97)
After knowing the simulation model
To validate our bootstrap procedure, we checked whether the true parameters used for the simulation were included in our variation intervals. The value of the disease allele frequency used in the simulation is 0.15. This value is included in the variation intervals for the three sample sizes we used (100, 200, and 500 families). The larger the sample size, the closer the estimate is to the true value.
Note that the true value of the dominance parameter cannot be inferred from the provided answers without extensive work. Indeed, the KPD phenotype is a mixture of different phenotypes, each one corresponding to different models of interaction between D2 and another susceptibility locus. Because there is no generation effect in the simulation, we still cannot explain the greater risk for parents than for sibs.
GAW14: Genetic Analysis Workshop 14
IBD: Identity by descent
KPD: Kofendrerd Personality Disorder
MASC: Marker association segregation chi-squares
All authors read and approved the final manuscript.
- Bourgain C: Comparing strategies for association mapping in samples with related individuals. BMC Genet. 2005, 6 (Suppl 1): S98. 10.1186/1471-2156-6-S1-S98.
- Babron M-C, Bourgain C, Leutenegger A-L, Clerget-Darpoux F: Detection of susceptibility loci by genome-wide linkage analysis. BMC Genet. 2005, 6 (Suppl 1): S18. 10.1186/1471-2156-6-S1-S18.
- Clerget-Darpoux F, Babron M-C, Prum B, Lathrop GM, Deschamps I, Hors J: A new method to test genetic models in HLA associated diseases: the MASC method. Ann Hum Genet. 1988, 52: 247-258.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.