The MESA study was designed to investigate the determinants and progression of subclinical cardiovascular diseases in 4 ethnic groups enrolled from six geographic regions in the United States [29]. Institutional Review Boards at each of the six MESA centers where participants were seen for clinical exams reviewed and approved the conduct of the MESA study including genetics research. The sample of 6,814 individuals was drawn such that it contains roughly equal proportions of men and women, all of whom were free of clinically recognized cardiovascular disease at enrollment. Race was self-reported, about 23% of the subjects enrolled in the study self-identified as AA, 11% as CA, 38% as EA and 28% as HA. An intensive evaluation was conducted at baseline, where information regarding height, weight, waist circumference, smoking history, alcohol intake, education level, physical activity, medication, hypertension, heart rate, diabetes, and cholesterol level were collected, among other variables. Data on LV mass and LV ejection fraction were obtained via magnetic resonance imaging (MRI) from MESA participants who consented to the cardiac MRI scan. The LV ejection fraction was calculated by dividing the individual stroke volume by the end-diastolic volume. LV mass was adjusted for body surface area calculated using the formula provided by Wang et al. [21, 30]. As will be seen in the results section, these two measures are differently distributed in the four ethnic groups. We will use these ethnic differences in our simulations to create confounding situations that we will seek to control for with three measures of individual ancestry that we describe below.

As part of a MESA genotyping project, ninety-six AIMs were initially genotyped on more than 2,848 participants. These AIMs were selected from an Illumina proprietary SNP database, and were selected to maximize the difference in allele frequencies between the following pairs of ancestral populations: African vs. Chinese, African vs. European and Chinese vs. European.

Utilizing a plasmode generation approach, we developed a resampling procedure, which generates datasets where the null hypothesis of no association holds between each marker and the phenotypes of interest. A plasmode is similar to a simulated dataset; however, a plasmode dataset uses genotypic and phenotypic information observed in a study to construct pseudo-phenotypes so that the 'truth' of the data generating process is known [13, 14]. In this case, the relationship among the covariates remained intact while adding the permutated residual to the outcome variable ensured that all tests are conducted under the null hypothesis of no association between the genetic marker and the phenotype of interest. Therefore, we are sure that any observed significant association constitutes a type I error. Finally, we ran a series of simulation studies whose objectives are to better understand our findings and to determine whether they could be further generalized. These simulations are described in the simulation section.

In the remainder of the next section, we discuss the clustering method that was used to group MESA participants based on their observed genotypes, present the statistical approach retained for testing for association between each AIM and the 2 phenotypes of interest, and described the simulation procedures in detail.

### Genetic association tests

We examined all the available AIMs for association with both body surface-adjusted (BSA) LV mass, and LV ejection fraction using 3 different control variables: SRE, ancestry proportion estimates computed using STRUCTURE and genetic background measures computed using principal component analysis. That is, we implemented multivariate linear regression, including (1) SRE as a categorical variable, (2) the proportion of African, Chinese and European ancestry estimated using STRUCTURE, or the first 4 principal components computed using the AIMs data as covariates. We also included gender, income, education level, smoking history, alcohol use, systolic blood pressure, diastolic blood pressure, body mass index, and waist circumference as covariates in each model. Both analyses are conducted using generalized linear models. They also both fall under the structured association test (SAT) framework, which consists in testing for genetic association controlling for a genetic based measure of ancestral background [33]. The SAT approach is commonly used to correct for population stratification and admixture in genetic association tests. Recently published papers have shown that SAT approaches may fail to provide nominal type I errors for various reasons, including measurement error in the estimation of the confounding effect [34, 35] and cases where the estimated confounder captures an insufficient fraction of phenotypic variation [36]. However, we restricted our attention to older and better-known SAT approaches [33, 37–40]. As a preliminary analysis, we tested the null hypothesis of no association between the selected AIM and the phenotype of interest adjusting for the control variables mentioned above. Unlike in simulation-based work (see next section) the true association of the AIMs with LV mass is not known, and type I error rates for these tests could not be directly evaluated. However, marked differences between the statistical significance observed with each control variable may provide valuable insight regarding their performance. We devised a resampling procedure that guarantees that the methods are being compared under the null hypothesis. Details about this simulation procedure can be found in the next section. Finally, we ran two completely *in silico* analyses, which are also described in the next section, to mimic candidate gene association tests controlling for both SRE and a genetic based measure of ancestral background in order to gain a better understanding of the results observed in the initial test and the plasmode analysis.

### Description of the second set of simulation procedures

As can be seen in Tables 3 and 4, SRE appears to perform as well as the other GBMAs. The second set of simulations is designed to further elucidate why SRE, despite its much-publicized shortcomings of not being able to adequately control for confounding due to population stratification and admixture, seems to provide the correct type I error rate in this dataset.

First, we considered the case where the confounder is univariate. This could arise when the study sample comprised admixed individuals born from intermating between exactly two ancestral populations. We evaluated how the performance of SRE as a control variable depends on the distribution of ancestry proportions in the sample. Specifically, we wanted to see how the continuity and the size of the gap in a discontinuous ancestry proportion distribution would affect the performance of SRE. Note that a continuous distribution would have a gap of zero. The gap for a discontinuous distribution is defined as the range of the discontinuity region. For example, admixture is an ongoing process. A sample of admixed individuals can comprise individuals who are at different stage of the admixture process. Therefore, it is possible to recruit a sample that can be divided into 2 subsets of individuals: one with very high level of European ancestry and the other with very low level European ancestry. If the minimum European ancestry in the first subset is 0.8 and the maximum European ancestry in the second subset is 0.2 then gap value would simply be (0.8-0.2) = 0.6. We also looked at the effect of various misclassification rates in these association tests.

#### Simulation 2 (univariate ancestry proportion distribution (K = 2))

Let *N* = 2000 be the total number of individuals

For gap = 0.05 to 0.5 by 0.05{

Draw individual ancestry proportion *x* from
, where

Set true ethnicity to 1 if *x* ∈[0,*f*] and 2 otherwise.

Let *m* represent the misclassification rate.

For *m* = 0.025 to 0.15 by 0.025{

Draw the random variable *s* from a Bernoulli (*m*).

If (*s* = 0) then SRE = true ethnicity

else change the true ethnicity so that a misclassification occurs.

}

Compute
where *p*
_{
s
1
}and *p*
_{
s
2
}represent the allele frequency of the *s*
^{
th
} marker in each ancestral population respectively, **1**
_{
N
}is vector of ones, and
denotes the vector of allele frequencies in the admixed population for the *s*
^{
th
} marker. We consider 2 markers *G*
_{
1
} and *G*
_{
2
}. Draw *G*
_{
1
}from *Binomial* (2,
) and *G*
_{
2
}from *Binomial* (2,
). That is, we use the simulated ancestry proportion and the allele frequencies in the 2 ancestral populations to generate the genotypic probability for each individual under Hardy Weinberg equilibrium. Note that *G*
_{
1
}and *G*
_{
2
}are independent conditional on the individual ancestry. We will use *G*
_{
1
}to generate the trait and *G*
_{
2
}to test for genetic association.

Compute *y* = *α*
_{
0
}
*+ α*
_{1}
*G*
_{1}
*+ e* where *e ~ N* (**0**, *σ*
^{2}).

## Note that we set *α*
_{
1
}and *σ* such that the effect size is equal to 0.5 in Figure 3a and 1 in Figure 3b.

## Note that conditional on the individual ancestry, the random variable *Y* is also independent of *G*
_{
2
}.

Fit the following 3 linear regressions:

- 1.
*y* = *β*_{0} + *β*_{1}*x + β*_{1}*G*_{2} + *e*

- 2.
*y* = *β*_{0} + *β*_{1}*ethn + β*_{2}*G*_{2} + *e*

- 3.
*y* = *β*_{0} + *β*_{1}*sre + β*_{2}*G*_{2} + *e*

Test whether *β*
_{2} is statistically significant than 0 at the 0.05 level in each case.

## A statistically significant association observed between *Y* and *G*
_{
2
} will constitute a type I error.

}

Repeat the experiment 10,000 times for each configuration of the gap value and the misclassification rate and count the proportion of times that the parameter *β*
_{2} is statistically significant for each control variable. These proportions for the ancestry proportion and SRE regressions are shown in Figure 3. However, we do show the effect of various misclassification rates for 3 gap values (0.05, 0.3 and 0.5) in Table 5.

#### Simulation 3 (multivariate ancestry proportion distribution (K > 2))

We wanted to determine how SRE, when used as a control variable against population stratification, would perform in a multivariate setting. That is, when the number of number of ancestral populations is greater than 2. This simulation procedure resembles the previous one, except that the individual ancestry proportions are drawn from a Dirichlet distribution. We used a different set of parameters for each ethnic group in order to create the conditions needed for confounding to occur. The parameter used to generate the Dirichlet distribution can be represented by a 4 × 4 matrix, where the rows represent the expected individual ancestry proportions in each ethnic group, and for a fixed row, the columns represents the expected individual ancestry proportion from each ancestral population.

1) Let *N* = 2000 individuals divided equally into 4 subsets such that *n*
_{
k
} = 500 for *k* = 1,2,3,4 is the sample size in each ethnic group.

2) Let the SRE for an individual in the *k*
^{
th
} subgroup be *k*.

3) We considered 4 cases; the parameters used for the Dirichlet distribution in each case are chosen as follows:

- (a)
Proportions that are very close to ancestry proportion estimates obtained with STRUCTURE in the MESA sample;

- (b)
Values near a 4 × 4 identity matrix with diagonal elements equal to 0.97 and off diagonal elements equal to 0.01;

- (c)
Values very close to what would be observed in equally admixed individuals, that is all proportions are set 0.25;

- (d)
Different ancestry proportions where the contribution of one specific ancestral population is clearly greater in each admixed population. That is the diagonal elements are set to 0.55 and off diagonal elements at 0.15 and made sure that the row and column sums are equal to 1.

4) Let *P*
_{
1
}= (0.1, 0.5, 0.3, 0.9) and *P*
_{
2
}= (0.05, 0.25, 0.50, 0.75) be the frequency of the reference allele of the markers *G*
_{
1
}and *G*
_{
2
}respectively in each ancestral population. These frequencies were chosen arbitrarily with the only constraint being that they vary greatly between the 4 ancestral populations. Confounding will occur if the distribution of the trait is also different in the 4 ancestral populations.

5) The allele frequencies of *G*
_{
1
}and *G*
_{
2
}in the admixed population (
,
) are computed as the weighted averages of the allele frequencies given in (4), where the weights are the simulated ancestry proportions drawn for the Dirichlet distribution.

6) Draw *G*
_{
1
}and *G*
_{
2
}according to these averages.

7) The outcome variable *Y* is simulated from a normal distribution where the parameters are based on the distribution of the LV ejection fraction observed each ethnic group observed in the MESA study. That is, we used the mean in each group as the intercept in the model described in the next step. We then compute the pooled variance and set
such that each component has an effect size of 0.5, where *σ*
^{2} represents the pooled variance.

8) The remaining steps are similar to those taken in simulation 2. That is, we fitted the following linear regressions:

1. *y* = *β*
_{0} + *β*
_{1}
*x*
_{1} + *β*
_{2}
*x*
_{2} + *β*
_{3}
*x*
_{3} + *β*
_{4}
*G*
_{2} + *e*. The variables *x*
_{
1
}, *x*
_{
2
} and *x*
_{
3
} are the first 3 components of the individual ancestry proportion vector drawn from the Dirichlet distribution.

2. *y* = *α*
_{0} + *α*
_{1}
*SRE* + *α*
_{2}
*G*
_{2} + *e*. The variable *SRE* has 4 levels, which are defined in step 2. Therefore, *α*
_{
1
} is a vector with 4 components.

We then tested whether *β*
_{4} and *α*
_{2} are statistically different than 0 at the 0.05 level in each model, and repeated each experiment 10,000 times. The results of this simulation procedure can be seen in Figure 4. As can be seen in this figure, when the sample contained admixed individuals with more than 2 ancestral populations, SRE performed rather well as a control variable. This suggests that it is not the variations in the ancestry proportions themselves that cause the type I error inflation.

To better understand when the use of SRE as a control variable may fail, we devised a situation where it may be unclear which ethnicity to assign to individuals whose ancestry proportions take specific values. To facilitate the graphical representation of each scenario, we focused on the case where there are exactly 3 ancestral populations. In this case, an individual's ancestry proportion can be represented by a vector with 3 components, *adx*
_{1}, *adx*
_{2}
*and adx*
_{3} such that
. Therefore, without loss of generality, we can restrict our attention to a bivariate distribution by focusing only on the first two components of this vector. As can be seen in Figure 5, the choice of gap values on both the x-axis and y-axis and in addition to the constraint that *adx*
_{
1
}+ *adx*
_{
2
}≤ **1**, define 3 or 4 specific regions. Figure 5a shows 4 valid regions, and if one decides to assign ethnicity according to the maximum value of the vector (*adx*
_{
1
}, *adx*
_{
2
}, *adx*
_{
3
}), it is not exactly clear what the correct ethnicity assignment should be for the individuals whose ancestry proportions fall in region IV. There is no such ambiguity in Figure 5b.

The simulation steps are similar to those described in simulation 2, except that there are now two gap values: one for *adx*
_{
1
}and one for *adx*
_{
2
}. The ancestry proportions are each drawn independently from a uniform distribution, which has been rescaled such that the proportions add up to 1. We then excluded the ancestry proportions that fell in the regions defined by the gaps. In Figure 5a, we excluded values of *adx*
_{
1
}and *adx*
_{
2
}that fall between 0.1 and 0.3. The range of excluded values went from 0.1 to 0.5 in 5b. As in all previous cases, we considered 2 markers G1 and G2. We used G1 to simulate the trait, and G2 to test for association with the simulated trait. All significant association is seen as type I error. We use the vector (0.4, 0.2, 0.6) as the allele frequency in the 3 ancestral populations for G1 and (0.2, 0.4, 0.6) for G2. We also considered various effect sizes for evaluating the contribution of admixture in the confounding pathway. We also changed the allele frequencies vector to account for the fact that *k*, the number of ancestral populations, is now 3 instead of 4. The remaining simulation steps are again similar to those described in simulation 3. The results of this simulation analysis can be seen on Figure 6, where the vector (*a*,*b*) represents the coefficient associated with the variables *adx*
_{
1
}and *adx*
_{
2
}in the model. The error term in all models is drawn from a normal distribution with 0 and variance 1 such that the effect size associated with *adx*
_{
1
}and *adx*
_{
2
}are equal to *a* and *b* respectively.

#### Simulation 4 (effect of misclassification error on SRE when K = 4)

As can be seen in Figure 7 when the number of ancestral population is greater than 2, it is misclassification errors, as opposed to the actual distribution of the ancestry proportions that dictate the performance of SRE as a control variable. We ran a final simulation study to evaluate the effect of misclassification errors on a multi-ethnic sample like the MESA. The simulations steps are very similar to those taken in simulation 3.

1) Let *n*
_{
k
} = 500 for *k* = 1,2,3,4 be the sample size in each ethnic group.

2) Let the true ethnicity of any individual in the *k*
^{
th
} subgroup be *k*.

3) Draw *x*
_{
k
} from *Dir*(*α*
_{
k
}) where *α*
_{
k
} is based on the ancestry proportions observed in the MESA study. These proportions are displayed in Figure 2.

That is, (**0.09, 0.02, 0.84, 0.05**) for the African-Americans, (**0.89, 0.02, 0.02, 0.07**) for the European-Americans, (**0.24, 0.05, 0.14, 0.57**) for the Hispanic-Americans and (**0.01, 0.97, 0.01, 0.01**) for the Chinese-Americans.

To evaluate the effect of misclassification errors on the performance of SRE as a control variable, the true ethnicity of a fraction *m* of individuals in each subgroup is changed to create misclassification in the SRE variable. This fraction is assigned to one of the 3 remaining subgroups uniformly. This is done for each subgroup separately.

4) We let *m* vary from 0.05 to 0.15 by 0.025.

5) The remaining simulations steps are similar to those taken in simulation 3, and are not repeated here.