Volume 6 Supplement 1
Genetic Analysis Workshop 14: Microsatellite and single-nucleotide polymorphism
Modeling the effect of a genetic factor for a complex trait in a simulated population
 Mathieu Bourgey†^{1},
 Anne-Louise Leutenegger†^{2},
 Emmanuelle Cousin^{3},
 Catherine Bourgain^{1},
 Marie-Claude Babron^{1} and
 Françoise Clerget-Darpoux^{1}
DOI: 10.1186/1471-2156-6-S1-S87
© Bourgey et al; licensee BioMed Central Ltd 2005
Published: 30 December 2005
Abstract
Genetic Analysis Workshop 14 simulated data were analyzed with MASC (marker association segregation chi-squares), in which we implemented a bootstrap procedure to provide variation intervals for the parameter estimates. We model here the effect of a genetic factor, S, for Kofendrerd Personality Disorder in the region of the marker C03R0281 for the Aipotu population. The goodness of fit of several genetic models with two alleles at one locus was tested. The data are not compatible with a direct effect of any of the single-nucleotide polymorphisms (SNPs) typed in the region (SNPs 16, 17, 18, and 19 of packet 153). Therefore, we can conclude that the functional polymorphism has not been typed and is in linkage disequilibrium with the four studied SNPs. We obtained very large variation intervals for both the disease allele frequency and the degree of dominance. The uncertainty of the model parameters can be explained first, by the method used, which models marginal effects when the disease is due to complex interactions; second, by the presence of different subcriteria used for the diagnosis that are not determined by S in the same way; and third, by the fact that the segregation of the disease in the families was not taken into account. However, we could not find any model that could explain the familial segregation of the trait, namely the higher proportion of affected parents than affected sibs.
Background
The aim of this work is to study and model the marginal effect of one susceptibility locus involved in the determination of Kofendrerd Personality Disorder (KPD) in the Aipotu population. The presence of a susceptibility locus closely linked to the marker C03R0281 was shown by the existence of both strong association and genetic linkage between this marker and the trait [1, 2]. Before modeling the marginal effect of this factor, we searched for the replicate that best represented this effect. The modeling (estimation of the allele frequency and marginal penetrances) is carried out through the MASC (marker association segregation chi-squares) method [3], using the information provided by the marker C03R0281, denoted M hereafter. The variation intervals for the parameter estimates are obtained through a bootstrap procedure that we incorporated into the MASC program.
Methods
Selection of the best replicate
We want to select the replicates that best represent the distributions in the region of marker M (C03R0281) in terms of both association and linkage. Estimating the parameters and their variation intervals in such a sample is then equivalent to evaluating them in the whole set of replicates. In the pooled sample set (10,000 families), we consider one index case (an affected individual) per family and his genotype for the marker M. For each index case, we also consider his identity-by-descent (IBD) sharing for M with one randomly chosen affected sib (families of the Aipotu population were selected as having at least two sibs affected with KPD). In order to have a reliable IBD sharing for each sib pair, we ordered one SNP packet (153) surrounding marker M. The IBD sharing is obtained by maximum likelihood estimation using the information provided by the whole set of markers in the M region on chromosome 3.
Because we are looking for the model that best explains these four independent distributions (one genotype distribution and three IBD distributions stratified on the index genotype), we first determined the replicates that best reflect these distributions. We computed the distance of each of the 100 replicates to the pooled sample with a chi-square statistic, equal to the sum of the four independent chi-squares obtained by comparing the distributions observed in the replicate with those observed in the pooled sample.
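The replicate-to-pooled distance described above can be sketched as follows. This is a minimal illustration, not the authors' program: it assumes that each of the four chi-squares is a Pearson statistic comparing the replicate counts with the counts expected from the pooled-sample proportions, and all counts and the names `pearson_chi2`, `distance_to_pooled`, and `rank_replicates` are hypothetical.

```python
def pearson_chi2(observed_counts, reference_props):
    """Pearson chi-square of observed counts against reference proportions."""
    n = sum(observed_counts)
    return sum((obs - n * p) ** 2 / (n * p)
               for obs, p in zip(observed_counts, reference_props) if p > 0)


def distance_to_pooled(replicate, pooled):
    """Sum of the four chi-squares (one genotype and three IBD distributions)."""
    total = 0.0
    for rep_counts, pool_counts in zip(replicate, pooled):
        pool_n = sum(pool_counts)
        total += pearson_chi2(rep_counts, [c / pool_n for c in pool_counts])
    return total


def rank_replicates(replicates, pooled):
    """Replicate identifiers ordered from closest to farthest from the pooled sample."""
    return sorted(replicates, key=lambda rid: distance_to_pooled(replicates[rid], pooled))


# Hypothetical toy counts (NOT the GAW14 data). Each entry holds, in order:
# index genotypes (M1M1, M1M2, M2M2), then IBD = 0/1/2 counts for each genotype.
pooled = [[3800, 4700, 1500], [600, 1800, 1400], [900, 2300, 1500], [300, 800, 400]]
replicates = {
    "A": [[38, 47, 15], [6, 18, 14], [9, 23, 15], [3, 8, 4]],    # same proportions as pooled
    "B": [[30, 50, 20], [8, 14, 8], [10, 25, 15], [5, 10, 5]],   # deviates from pooled
}
ranking = rank_replicates(replicates, pooled)
```

With these toy counts, replicate "A" reproduces the pooled proportions and therefore ranks first.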
Modeling the genetic effect
We modeled the effect of the susceptibility factor by the MASC method. For a given genetic model, the MASC method computes the four expected distributions described in Figure 1 and the previously described chi-square statistic (the sum of four chi-squares between the observed and expected distributions). The chi-square is minimized over the parameters left free to vary. The fit of the model to the observed data is then tested (8 df minus the number of parameters free to vary). The parameters of the genetic model are the penetrances of each genotype and the coupling between the marker alleles and the susceptibility factor alleles. The expected distributions are computed conditional on the index cases having at least one affected sib. They depend on the frequency of the marker alleles in the general population. The marker allele frequencies may be assumed to be already known (situation 1), to be obtained through a control sample (situation 2), or to be obtained through the parental alleles that have not been transmitted to the affected cases used for the family ascertainment (situation 3).
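The degrees-of-freedom rule stated above (8 df minus the number of free parameters) and the resulting goodness-of-fit test can be illustrated with a short sketch. It is not part of the MASC program: the function names are hypothetical, and the p-value uses the standard closed form of the chi-square upper tail, which is valid only for an even number of df.

```python
import math

TOTAL_DF = 8  # df carried by the four observed distributions, as stated in the text


def fit_df(n_free_params):
    """Degrees of freedom of the goodness-of-fit test for a given model."""
    df = TOTAL_DF - n_free_params
    if df <= 0:
        raise ValueError("too many free parameters for a goodness-of-fit test")
    return df


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df degrees of freedom > x); closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))


# A model with two free parameters is tested with 6 df.
df_two_params = fit_df(2)
p_value = chi2_upper_tail_even_df(14.14, df_two_params)
```

For a chi-square of 14.14 with 6 df, this gives a p-value between 0.02 and 0.04, so such a model would be rejected at the 5% level.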
Computing the intervals of variation for the parameter estimates
We implemented a bootstrap procedure for calculating the variation intervals of the parameter estimates in the MASC program. The uncertainty in the parameters is due to the sampling of families and, when the marker allele frequencies are inferred from a control sample, to the sampling of controls. For each bootstrapped family set (1,000 replicates), we estimated the parameters considering the three possibilities described above for the marker allele frequencies. In situation 1, there is no uncertainty induced by the marker allele frequencies. In situation 2, the bootstrap procedure is applied to both the family sample and a sample of 50 controls randomly drawn among the 100 control samples. In situation 3, the bootstrap is only applied to the family sample. For the three situations, we obtain the distribution of the parameter estimates and provide the 95% intervals.
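A percentile bootstrap of this kind can be sketched as follows. This is a generic illustration under stated assumptions, not the MASC implementation: the `estimator` argument stands in for the MASC parameter estimation, the interval bounds are taken as simple nearest-rank percentiles, and the toy data are hypothetical.

```python
import random


def bootstrap_interval(families, estimator, n_boot=1000, level=0.95, seed=0):
    """Percentile interval of an estimator over family sets resampled with replacement."""
    rng = random.Random(seed)
    n = len(families)
    estimates = sorted(
        estimator([families[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    alpha = (1.0 - level) / 2.0
    lo_idx = int(alpha * n_boot)
    hi_idx = min(n_boot - 1, int((1.0 - alpha) * n_boot))
    return estimates[lo_idx], estimates[hi_idx]


def toy_estimator(sample):
    """Hypothetical stand-in for the MASC estimation: a simple proportion."""
    return sum(sample) / len(sample)


# Hypothetical toy data: one 0/1 value per family (100 families).
toy_families = [1] * 24 + [0] * 76
low, high = bootstrap_interval(toy_families, toy_estimator)
```

In situation 2, the same resampling would also be applied to the control sample inside each bootstrap replicate before the parameters are re-estimated.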
Results
The best replicate
The 10 best replicates. Chi-squares between the distributions observed in the replicate and in the pooled sample.

| Rank | Replicate number | χ^{2} value |
| --- | --- | --- |
| 1 | 97 | 0.53 |
| 2 | 63 | 1.12 |
| 3 | 19 | 1.15 |
| 4 | 48 | 1.26 |
| 5 | 56 | 1.38 |
| 6 | 4 | 1.39 |
| 7 | 31 | 1.77 |
| 8 | 55 | 1.90 |
| 9 | 88 | 1.91 |
| 10 | 5 | 1.93 |
Modeling of the genetic effects
We tested a biallelic susceptibility locus model (S1, S2) and estimated four parameters: two relative penetrances, λ1 and λ2, and two coupling probabilities, c_{11} and c_{12}, where
λ1 = P(affected | S1S2) / P(affected | S1S1),
λ2 = P(affected | S2S2) / P(affected | S1S1),
c_{11} = P(S1 | M1),
c_{12} = P(S1 | M2).
The frequency (q) of allele S1 at the susceptibility locus can be written as
q = P(S1) = c_{11} P(M1) + c_{12} P(M2).
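As a small worked illustration of this relation (a sketch, with a hypothetical function name, assuming a biallelic marker so that P(M2) = 1 - P(M1)):

```python
def disease_allele_frequency(c11, c12, p_m1):
    """q = P(S1) = c11 * P(M1) + c12 * P(M2), for a biallelic marker."""
    p_m2 = 1.0 - p_m1  # assumption: only two marker alleles, M1 and M2
    return c11 * p_m1 + c12 * p_m2


# Direct effect of the marker: S1 is confounded with M1, so c11 = 1 and c12 = 0,
# and q reduces to the frequency of M1 (0.55 is used here only as an example value).
q_direct = disease_allele_frequency(1.0, 0.0, 0.55)

# No association between marker and susceptibility alleles: c11 = c12 = c, so q = c.
q_independent = disease_allele_frequency(0.2, 0.2, 0.55)
```

Fixing c_{11} = 1 and c_{12} = 0 in this way leaves only λ1 and λ2 free to vary.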
The direct effect of marker M, which means S1 is confounded with M1, and S2 with M2, was rejected (χ^{2} = 14.14; 6 df).
Models compatible with observations made on M in replicate 97
| Model | λ1 | λ2 | q | χ^{2} (df) |
| --- | --- | --- | --- | --- |
| General | 0.02 | 0.001 | 0.002 | 1.852 (4 df) |
| Dominant | 1 | 0 | 0.001 | 5.468 (6 df) |
| Recessive | 0 | 0 | 0.249 | 4.005 (6 df) |
Frequencies of allele 1 for SNPs 16, 17, 18, and 19 for the index cases and the controls
| SNP | Index cases (n = 10,000) | Controls (n = 5,000) | χ^{2} (1 df) |
| --- | --- | --- | --- |
| 16 | 0.70 | 0.50 | 580.84 |
| 17 | 0.40 | 0.27 | 258.40 |
| 18 | 0.67 | 0.54 | 225.47 |
| 19 (marker M) | 0.62 | 0.55 | 73.61 |
Computing the variation intervals of the parameter estimates
Variation interval of the disease allele frequency q
95% variation interval [range]^{a}

| No. families | No uncertainty | Uncertainty |
| --- | --- | --- |
| 100 | 0.24 [0.01; 0.74] | 0.24 [0.01; 0.74] |
| 200 | 0.22 [0.01; 0.54] | 0.22 [0.01; 0.55] |
| 500 | 0.20 [0.07; 0.35] | 0.20 [0.01; 0.35] |
Discussion
Before knowing the simulation model
Phenotype distribution for the Aipotu population
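Groups A, B, and C; columns a to l.

| | a | b | c | d | e | f | g | h | i | j | k | l |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Aff % | 0.16 | 0.67 | 0.63 | 0.63 | 1 | 1 | 0.63 | 1 | 0.15 | 0.15 | 0.44 | 0.13 |
| Unaff % | 0.02 | 0.04 | 0.1 | 0.1 | 0.1 | 0.09 | 0.1 | 0.1 | 0.15 | 0.15 | 0.09 | 0.05 |
| IBD = 0 | 0.19 | 0.17 | 0.19 | 0.2 | 0.16 | 0.16 | 0.2 | 0.16 | 0.24 | 0.25 | 0.1 | 0.25 |
| IBD = 1 | 0.51 | 0.49 | 0.47 | 0.47 | 0.46 | 0.46 | 0.46 | 0.46 | 0.49 | 0.49 | 0.49 | 0.48 |
| IBD = 2 | 0.30 | 0.34 | 0.34 | 0.32 | 0.38 | 0.38 | 0.34 | 0.38 | 0.27 | 0.25 | 0.41 | 0.27 |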
Proportion of affected parents and sibs
Proportion of affected subjects (n/total)

| Replicate | Parents | Sibs* |
| --- | --- | --- |
| All | 0.2 (4043/20000) | 0.1 (2882/28174) |
| Best replicate (97) | 0.2 (40/200) | 0.06 (17/281) |
After knowing the simulation model
To validate our bootstrap procedure, we checked whether the true parameters used for the simulation were included in our variation intervals. The value of the disease allele frequency used in the simulation is 0.15. This value is included in the variation intervals for the three sample sizes we used (100, 200, and 500 families). The larger the sample size, the closer the estimate is to the true value.
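This check can be reproduced from the reported intervals with a few lines of code. The sketch below simply re-enters the estimates and intervals reported for the "no uncertainty" situation; the variable names are illustrative.

```python
TRUE_Q = 0.15  # disease allele frequency used in the simulation

# Number of families -> (estimate of q, lower bound, upper bound), "no uncertainty" situation.
intervals = {
    100: (0.24, 0.01, 0.74),
    200: (0.22, 0.01, 0.54),
    500: (0.20, 0.07, 0.35),
}

# True if the simulated value falls inside the reported variation interval.
covered = {n: low <= TRUE_Q <= high for n, (est, low, high) in intervals.items()}

# Absolute distance between the estimate and the simulated value, by increasing sample size.
errors = [abs(est - TRUE_Q) for n, (est, low, high) in sorted(intervals.items())]
```

All three intervals contain 0.15, and the distance between the estimate and the simulated value decreases as the number of families grows.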
Note that the true value of the dominance parameter cannot be inferred from the provided answers without extensive work. Indeed, the KPD phenotype is a mixture of different phenotypes, each one corresponding to different models of interaction between D2 and another susceptibility locus. Because there is no generation effect in the simulation, we still cannot explain the greater risk for parents than for sibs.
Notes
Abbreviations
GAW14: Genetic Analysis Workshop 14
IBD: Identity by descent
KPD: Kofendrerd Personality Disorder
LD: Linkage disequilibrium
MASC: Marker association segregation chi-squares
SNP: Single-nucleotide polymorphism
Declarations
Acknowledgements
All authors read and approved the final manuscript.
References
1. Bourgain C: Comparing strategies for association mapping in samples with related individuals. BMC Genet. 2005, 6 (Suppl 1): S98. doi:10.1186/1471-2156-6-S1-S98.
2. Babron MC, Bourgain C, Leutenegger AL, Clerget-Darpoux F: Detection of susceptibility loci by genome-wide linkage analysis. BMC Genet. 2005, 6 (Suppl 1): S18. doi:10.1186/1471-2156-6-S1-S18.
3. Clerget-Darpoux F, Babron MC, Prum B, Lathrop GM, Deschamps I, Hors J: A new method to test genetic models in HLA associated diseases: the MASC method. Ann Hum Genet. 1988, 52: 247-258.
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.