Choice of population structure informative principal components for adjustment in a casecontrol study
 Gina M Peloso^{1} and
 Kathryn L Lunetta^{1}Email author
DOI: 10.1186/147121561264
© Peloso and Lunetta; licensee BioMed Central Ltd. 2011
Received: 24 March 2011
Accepted: 19 July 2011
Published: 19 July 2011
Abstract
Background
There are many ways to perform adjustment for population structure. It remains unclear what the optimal approach is and whether the optimal approach varies by the type of samples and substructure present. The simplest and most straightforward approach is to adjust for the continuous principal components (PCs) that capture ancestry. Through simulation, we explored the issue of which ancestry informative PCs should be adjusted for in an association model to control for the confounding nature of population structure while maintaining maximum power. A thorough examination of selecting PCs for adjustment in a casecontrol study across the possible structure scenarios that could occur in a genomewide association study has not been previously reported.
Results
We found that when the SNP and phenotype frequencies do not vary over the subpopulations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates generally provide the most power. When there is a structured SNP and a nonstructured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power.
Conclusions
Standard practice is to include a fixed number of PCs in genomewide association studies. Based on our findings, we conclude that if power is not a concern, then selecting the same set of top PCs for adjustment for all SNPs in logistic regression is a strategy that achieves appropriate Type I error. However, standard practice is not optimal in all scenarios and to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model.
Background
The principal components (PCs) of genomewide genotype data can be used to detect and adjust for population structure in genetic association analyses [1, 2]. The popularity of the PC method is evident by its wide use: it has been cited by over 400 publications. However, the choice of which PCs to use and the best way to adjust for the PCs in analyses of dichotomous traits is not yet clear.
Methods of ancestry informative PC selection.
Scenarios of population structure that could occur across the genome.
Structured Phenotype (K_{1}≠ K_{2})  NonStructured Phenotype (K_{1} = K_{2})  

Structured Genotype (p _{ 1 } ≠ p _{ 2 } )  A  B 
NonStructured Genotype (p _{ 1 } = p _{ 2 } )  C  D 
 (1)
No PC adjustment (None)
 (2)
10% Rule (10% Rule)
 (3)
PCs significantly related to the outcome at significance level α = 0.001, 0.01, or 0.05 (Sig001, Sig01, Sig 05, respectively)
 (4)
PCs significantly related to the SNP at α = 0.05 or 0.01 (SNP01, SNP05, respectively)
 (5)
PCs significant according to the TracyWidom statistic at α = 0.05 (TW)
 (6)
Top PCs (2 or 10) determined according to eigenvalue (Top2, Top10, respectively)
 (7)
Simulated true population i.e, Gold standard (Pop)
We tested for association with the simulated case/control outcome using logistic regression and compared Type I error and power of associations between the outcome and SNP when adjusting for selected principal components of ancestry. Finally, to provide a practical example of the methods of PC selection, we performed all methods of PC selection using dichotomized height data from the Framingham Heart Study.
Methods
We simulated independent genomewide SNPs by generating ancestral population allele frequencies for 10,000 SNPs (p_{j}, j = 1,...,10000) from a uniform (0.05, 0.50)distribution. We then created two subpopulations (i = 1,2) of 500 individuals, each descending from the ancestral population according to F_{st}. We simulated the allele frequencies p_{ij} (i = 1,2; j = 1,..., 10000) in the two subpopulations according to a beta distribution [5]:
F_{st} is a measure of population differentiation [5]; F_{st} = 0.01 is representative of human population structure seen within continents, while F_{st} = 0.1 is representative of structure seen between continents [6, 7]. We simulated our samples using F_{st} values of 0.01 and 0.1. 1,000 replicates of independent genomewide SNP data were generated for 2 populations of 500 individuals.
Simulation parameters.
Description  Possible Values 

Population differentiation (F_{st})  0.01, 0.1 
Population prevalence of disease (K)  0.10 
Frequency of disease in sub population 1 (K_{1})  0.10  0.19 
Number of cases in subpopulation 1  250  475 
Overall risk allele frequency (p)  0.20 
Risk allele freq in subpopulation 1 (p_{1})  0.10  0.30 
Odds Ratio (log additive model)  1.0, 1.2, 1.5 
Relationship between number of cases and population prevalence of disease in each subpopulation.
# of Cases in Population 1  # of Cases in Population 2  Fold increase in # of cases  K_{1}  K_{2} 

250  250  1  0.10  0.10 
300  200  1.5  0.12  0.08 
375  125  3  0.15  0.05 
400  100  4  0.16  0.04 
475  25  19  0.19  0.01 
We compared the methods for selecting PCs for adjustment described above. We used logistic regression to test the association between the test SNP and case status adjusting for latent ancestry defined by the PCs. We compared the proportion of replicates significant at α = 0.05, using 1,000 replicates for each set of parameters to investigate Type I error and power. 1,000 replicates for Type I error provides a 95% confidence interval of 0.036 to 0.064 around a nominal significance level of 0.05.
To determine the effects of the PC selection methods when a sample is composed of a more complex structure, we simulated two populations that each diverged with as F_{st} of 0.01 from an ancestral population, as previously. We treated these two subpopulations as ancestral populations and then simulated two subpopulations diverging from each of the ancestral populations, again with an F_{st} of 0.01. The resulting sample had four subpopulations. Due to computational limitations, a single replicate of independent genomewide SNP data was generated for this scenario for PCA. As before, 1,000 replicates were used to evaluate Type I error and power, simulating the genotype and phenotype (conditional on genotype for power) for each replicate. We varied the phenotypic and genotypic structure of the subpopulations, from having no structure to more extreme structure.
Results and Discussion
Simulation Study
We next expanded this simulation to larger differences in allele frequencies between the two subpopulations. With large F_{st} (0.10), we expect many SNPs with greater than a 0.2 difference in allele frequency between subpopulations, and thus we believe it may be more common to observe large allele frequency differences between populations than large phenotypic differences between populations. We found that as the difference in the risk allele frequency between the two subpopulations increases, the difference in power between adjusting and not adjusting for PCs becomes greater (see additional file 1) when a nonstructured phenotype and structured SNP were tested for association. Consistent with our previous results, we found when both the phenotype and the SNP are structured any method of selection was adequate. When the SNP is nonstructured (p_{1} = p_{2} = 0.5) and the phenotype has large structure (phenotypic ratio = 4), we found slightly higher power when selecting PCs by the 10% rule compared to not selected any PCs, contrary to the findings presented in Figure 2. As seen in additional file 1, the difference in Type I error between using the 10% rule method versus no PC selection method is similar to the difference in power. The difference between the two analyses is that PC1 is included in 16% of the replicates for the 10% rule.
Finally, we increased the sample size for the genomewide SNPs simulation to 5,000 individuals from each subpopulation. Increasing the sample size allowed us to determine if our observed results were affected by the simulated sample size of 500 cases and 500 controls. We found the same patterns with the larger sample size as we did when we used the 500 individuals from each subpopulation (results not shown).
In general, we observed similar patterns when the data consisted of four subpopulations as with the two subpopulations scenario already presented (see additional file 2). When the SNP and phenotype frequencies do not vary over the subpopulations, all methods of selection provided similar power and appropriate Type I error for association. When the SNP is not structured and the phenotype has large structure, then selection methods that do not select PCs for inclusion as covariates provide the most power. Likewise, when there is a structured SNP and a nonstructured phenotype, selection methods that include PCs in the model have greater power. When both the SNP and the phenotype are structured, all methods of selection have similar power, except when the SNP differs between the 4 subpopulations and only the top 2 PCs are selected for adjustment. In this case, we observe elevated Type I error and loss of power because three PCs are required to distinguish 4 subpopulations, and only 2 PCs are included in the model. The loss of power when we do not adjust fully for the population structure is due to an attenuation of the effect due to the population structure (negative confounding). Positive confounding occurs when the structure is in the same direction as the true genetic effect, or that the phenotypic means and risk allele frequencies are positively correlated. Negative confounding occurs when the structure is in the opposite direction as the true genetic effect, or in other words, that the phenotypic means and risk allele frequencies are negatively correlated [8]. While we only report results here based on negative confounding, simulations with positive confounding yielded similar conclusions (see additional file 3).
Scenarios that could occur across the genome with the optimal method of selection.
Structured Phenotype (K_{1} ≠ K_{2})  NonStructured Phenotype (K_{1} = K_{2})  

Structured Genotype (p _{ 1 } ≠ p _{ 2 } )  Any method of selection except no PC adjustment  Selecting a fixed number of PCs or PCs associated with the SNP 
NonStructured Genotype (p _{ 1 } = p _{ 2 } )  Selecting PCs associated with the SNP (α = 0.01), 10% Rule, no PC adjustment  Any method of selection 
Example of Principal Component Selection Criteria with Height
Average adult height is taller in northern Europe than in southern Europe. By our definition, height is a structured phenotype, i.e., it varies by ancestry. Lactose intolerance also varies across Europe from North to South. The genetic polymorphism in the LCT (Lactase) gene that causes lactose intolerance, and the SNPs in LD with this polymorphism, appears to be associated with height in nonhomogeneous samples of individuals of European descent [10]. Observing an LCTheight association in a sample indicates the sample has population structure. As an example of using the various methods of PC selection in practice, we investigated the association between height and four SNPs in the Framingham Heart Study, adjusting for selected PCs. Because our interest is in dichotomous outcomes, we dichotomized height by the median for this example, and used logistic regression to test for association between four SNPs and dichotomized height:

rs1042725 and rs6060369 [11]: Two positive control SNP not in the LCT gene that are known to be associated with height. Both SNPs are associated with PC1 with pvalues of 9.67E08 and 0.0003, respectively.

rs2322659 [10]: A structured SNP in the lactase gene which is known to vary in frequency among European Americans. This SNP is highly associated with PC1 (pvalue = 3.8E73).

rs2290305: a nonstructured SNP, not associated with PC1 (pvalue = 0.425).
Height association results with methods for selecting PCs.
beta (pvalue) of SNP  

sSNP*  nSNP**  Positive Control SNPs  
Method for selecting PCs  Selected PCs  rs2322659  rs2290305  rs1042725  rs6060369 
No PC Adjustment  NA  0.406 (< 0.001)  0.086 (0.338)  0.251 (0.002)  0.221 (0.007) 
Top 2 PCs  PC1, PC2  0.043 (0.646)  0.064 (0.491)  0.149 (0.073)  0.336 (< 0.001) 
Top 10 PCs  PC1  PC10  0.057 (0.563)  0.054 (0.571)  0.132 (0.121)  0.322 (< 0.001) 
TracyWidom statistic  PC1  PC81  0.062 (0.566)  0.057 (0.581)  0.174 (0.061)  0.318 (0.001) 
Associated with the outcome at α = 0.05  PC1, PC2, PC4, PC8, PC21, PC25, PC28, PC39, PC47, PC49, PC56, PC64, PC77  0.03 (0.762)  0.049 (0.617)  0.164 (0.059)  0.313 (0.001) 
Associated with the outcome at α = 0.01  PC1, PC4, PC21, PC25, PC28, PC49, PC77  0.015 (0.875)  0.053 (0.578)  0.162 (0.059)  0.322 (< 0.001) 
Associated with the outcome at α = 0.001  PC1, PC4, PC28  0.014 (0.884)  0.047 (0.619)  0.163 (0.055)  0.322 (< 0.001) 
Associated with the SNP at α = 0.05  varied by SNP  0.041 (0.677)  0.092 (0.312)  0.148 (0.075)  0.333 (< 0.001) 
Associated with the SNP at α = 0.01  varied by SNP  .038 (0.703)  0.086 (0.338)  0.164 (0.170)  0.313 (< 0.001) 
10% Rule  PC1#  0.037 (0.692)  0.086 (0.338)  0.148 (0.075)  0.333 (< 0.001) 
PCFinder [21]  NA  0.406 (< 0.001)  0.086 (0.338)  0.251 (0.002)  0.221 (0.007) 
Conclusions
We performed a simulation study in which we generated multiple sets of genomewide SNPs. The goal was to investigate Type I error and power of associations between casecontrol status and a SNP when adjusting for ancestry informative PCs selected by a variety of rules. A second aim of this study was to examine more critically the effects of the amount of phenotypic structure and genotypic structure on the association analysis, as well as investigate the bias and precision of the associations.
We did not specifically address the issue of which SNPs to include in the PCA. Using all available SNPs in a PCA provides the maximal information to ancestry, but highly correlated SNPs or unusual chromosomal phenomena such as known inversion polymorphisms or genomic regions known to play a role in susceptibility to a disease can affect the results from a PCA [13]. Under some conditions, including the chromosomal regions with high influence in PCA may have a negative impact on power when PCs are used to adjust for ancestry. For example, if the region harbours a true genetic effect, the effect may be adjusted away. Thus, some researchers have recommended not using PCs that are correlated with localized chromosomal regions [14].
All simulations were performed using distinct subpopulations. Admixed individuals are commonly used in GWAS. While we did not explicitly simulate admixed individuals, we know based on previous work [15] that PCA to detect ancestry and subsequent adjustment works similarly with admixed individuals having global phenotypic structure. On the other hand, PCs based on genomewide data do not adequately capture local ancestry or local phenotypic structure [16, 17]. If local phenotypic structure exists, other techniques need to be applied to capture and adjust for local ancestry such as PCA in the region of the test SNP, or methods that estimate local ancestry proportions such as ANCESTRYMAP [18] or LAMP [19]. Further work needs to be done to determine how to adjust for local ancestry without adjusting away true effects.
We focused our exploration on linear PC adjustment models. We did not investigate adjusting for clusters identified in the individual genotype data because previous work has suggested that linear adjustments are adequate for the population structure typical of European populations [20]. We did not investigate the method of testing the reduction in the inflation of the genomic control lambda [4] or PCFinder [21] due to the computational burden of these algorithms. Prior to excluding these methods, we investigated the run time for the PCFinder algorithm and found that it is related to the number of PCs selected. When more PCs are needed to adjust for the structure, the algorithm takes longer to run. With simulated genotypes similar to those presented in this chapter, PCFinder requires between 5 minutes and one hour to select PCs. While these methods may be feasible in individual data sets where the algorithm needs to be run only once for the outcome of interest, it is not suited for simulations requiring thousands of replicates. Furthermore, we found in our dichotomized height example that PCFinder did not select any PCs for adjustment and therefore did not remove the false association with the LCT SNP.
Our findings suggest that to optimize power under certain scenarios, the choice of covariate PCs in a genomewide association study using logistic regression with a dichotomous outcome should be SNPdependent. Our findings only apply to casecontrol or dichotomous outcome analyses using logistic regression. These results may appear to conflict with Xing and Xing [22], who recently clarified that covariate adjustment in logistic regression always leads to a loss of precision, but not always a loss of power. They conclude that when the genotype and covariates are independent, it is still more efficient to adjust for the predictive covariates. In contrast, we found that with large phenotypic structure and nonstructured genotype, it is not more efficient to adjust for ancestry informative PCs. This is due to the very large (odds ratio > 10) association between the population and outcome and the exposure with a frequency of 50% (i.e., half the sample is in population 1 and the other is in population 2). When we extended Xing and Xing's simulations to larger exposure odds ratios with the exposure frequency of 50%, we obtained results consistent with our findings above. Whether to adjust for covariates depends on the complex relationships between the outcome, the covariate, and the genotype. When the phenotype does not have substantial structure, we obtain similar power when adjusting for population structure as when not adjusting for population structure. In their investigation, Xing and Xing limited to the situation where the covariate and genotype are independent. In our work, the SNP of interest and the ancestry informative PCs may not be independent. The SNP genotype is included in the linear combination of the genotypes that define the PCs; structured SNPs contribute a higher weight to some PCs. We found that with structured SNPs and nonstructured phenotypes, it is more efficient to adjust for PCs.
For linear regression using continuous phenotypes, one can check phenotypes for association with the PCs. If a top PC is significantly associated with the phenotype of interest then the traitgenotype association model should include PCs as covariates to adjust for population structure. Unlike logistic regression, adjusting for covariates associated with the trait in linear regression always improves the precision of the effect estimate by reducing the residual variance [9]. Since the PCs are orthogonal, a single model regressing the top PCs on the outcome can be used to determine if the PCs are associated with the outcome. Associated PCs should be included as covariates in genomewide association studies (GWAS).
Standard practice is to include a fixed number of PCs in association models for GWAS. Here, we conclude that if power is not a concern, then selecting the same set of PCs for adjustment for all SNPs in logistic regression is a strategy that achieves appropriate Type I error. However, standard practice is not optimal in all scenarios and to optimize power for structured SNPs in the presence of unstructured phenotypes, PCs that are associated with the tested SNP should be included in the logistic model. The gain in power we observed in our simulations was an approximate 5%percentage point increase for adjusting only when the SNP is structured over always adjusting for the ancestry informative PCs. We note that some of the differences in power may disappear if we correct for Type I error, but this is not done in practice. It may be easier and more intuitive to adjusting for the same set of PCs across all SNP associations.
Declarations
Acknowledgements
A portion of this research was conducted using the Linux Clusters for Genetic Analysis (LinGA) computing resource funded by the Robert Dawson Evans Endowment of the Department of Medicine at Boston University School of Medicine and Boston Medical Center and contributions from individual investigators.
Authors’ Affiliations
References
 Patterson N, Price AL, Reich D: Population structure and eigenanalysis. PLoS Genet. 2006, 2 (12): e19010.1371/journal.pgen.0020190.PubMed CentralView ArticlePubMedGoogle Scholar
 Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D: Principal components analysis corrects for stratification in genomewide association studies. Nat Genet. 2006, 38 (8): 904909. 10.1038/ng1847.View ArticlePubMedGoogle Scholar
 Kimmel G, Jordan MI, Halperin E, Shamir R, Karp RM: A randomization test for controlling population stratification in wholegenome association studies. Am J Hum Genet. 2007, 81 (5): 895905. 10.1086/521372.PubMed CentralView ArticlePubMedGoogle Scholar
 Yu K, Wang Z, Li Q, Wacholder S, Hunter DJ, Hoover RN, Chanock S, Thomas G: Population substructure and control selection in genomewide association studies. PLoS ONE. 2008, 3 (7): e255110.1371/journal.pone.0002551.PubMed CentralView ArticlePubMedGoogle Scholar
 Balding DJ, Nichols RA: DNA profile match probability calculation: how to allow for population stratification, relatedness, database selection and single bands. Forensic Sci Int. 1994, 64 (23): 125140. 10.1016/03790738(94)902224.View ArticlePubMedGoogle Scholar
 Marchini J, Cardon LR, Phillips MS, Donnelly P: The effects of human population structure on large genetic association studies. Nat Genet. 2004, 36 (5): 512517. 10.1038/ng1337.View ArticlePubMedGoogle Scholar
 Price AL, Butler J, Patterson N, Capelli C, Pascali VL, Scarnicci F, RuizLinares A, Groop L, Saetta AA, Korkolopoulou P: Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 2008, 4 (1): e23610.1371/journal.pgen.0030236.PubMed CentralView ArticlePubMedGoogle Scholar
 Ding X, Weiss S, Raby B, Lange C, Laird NM: Impact of population stratification on familybased association tests with longitudinal measurements. Stat Appl Genet Mol Biol. 2009, 8 (1): Article 17PubMedGoogle Scholar
 Robinson L, Jewell NP: Some Surprising Results about Covariate Adjustment in Logistic Regression Models. International Statistical Review. 1991, 59 (2): 227240. 10.2307/1403444.View ArticleGoogle Scholar
 Campbell CD, Ogburn EL, Lunetta KL, Lyon HN, Freedman ML, Groop LC, Altshuler D, Ardlie KG, Hirschhorn JN: Demonstrating stratification in a European American population. Nat Genet. 2005, 37 (8): 868872. 10.1038/ng1607.View ArticlePubMedGoogle Scholar
 Lettre G, Jackson AU, Gieger C, Schumacher FR, Berndt SI, Sanna S, Eyheramendy S, Voight BF, Butler JL, Guiducci C: Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet. 2008, 40 (5): 584591. 10.1038/ng.125.PubMed CentralView ArticlePubMedGoogle Scholar
 Zhang L, Li J, Pei YF, Liu Y, Deng HW: Tests of association for quantitative traits in nuclear families using principal components to correct for population stratification. Ann Hum Genet. 2009, 73 (Pt 6): 601613.PubMed CentralView ArticlePubMedGoogle Scholar
 Zou F, Lee S, Knowles MR, Wright FA: Quantification of population structure using correlated SNPs by shrinkage principal components. Hum Hered. 2010, 70 (1): 922. 10.1159/000288706.PubMed CentralView ArticlePubMedGoogle Scholar
 Laurie CC, Doheny KF, Mirel DB, Pugh EW, Bierut LJ, Bhangale T, Boehm F, Caporaso NE, Cornelis MC, Edenberg HJ: Quality control and quality assurance in genotypic data for genomewide association studies. Genet Epidemiol. 2010, 34 (6): 591602. 10.1002/gepi.20516.PubMed CentralView ArticlePubMedGoogle Scholar
 Zhu X, Li S, Cooper RS, Elston RC: A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet. 2008, 82 (2): 352365. 10.1016/j.ajhg.2007.10.009.PubMed CentralView ArticlePubMedGoogle Scholar
 Qin H, Morris N, Kang SJ, Li M, Tayo B, Lyon H, Hirschhorn J, Cooper RS, Zhu X: Interrogating local population structure for fine mapping in genomewide association studies. Bioinformatics. 2010, 26 (23): 29612968. 10.1093/bioinformatics/btq560.PubMed CentralView ArticlePubMedGoogle Scholar
 Wang X, Zhu X, Qin H, Cooper RS, Ewens WJ, Li C, Li M: Adjustment for local ancestry in genetic association analysis of admixed populations. Bioinformatics. 2011, 27 (5): 670677. 10.1093/bioinformatics/btq709.PubMed CentralView ArticlePubMedGoogle Scholar
 Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, Hauser SL, Smith MW, O'Brien SJ, Altshuler D: Methods for highdensity admixture mapping of disease genes. Am J Hum Genet. 2004, 74 (5): 9791000. 10.1086/420871.PubMed CentralView ArticlePubMedGoogle Scholar
 Sankararaman S, Sridhar S, Kimmel G, Halperin E: Estimating local ancestry in admixed populations. Am J Hum Genet. 2008, 82 (2): 290303. 10.1016/j.ajhg.2007.09.022.PubMed CentralView ArticlePubMedGoogle Scholar
 Peloso GM, Timofeev N, Lunetta KL: Principalcomponentbased population structure adjustment in the North American Rheumatoid Arthritis Consortium data: impact of singlenucleotide polymorphism set and analysis method. BMC Proc. 2009, 3 (Suppl 7): S10810.1186/175365613s7s108.PubMed CentralView ArticlePubMedGoogle Scholar
 Li Q, Wacholder S, Hunter DJ, Hoover RN, Chanock S, Thomas G, Yu K: Genetic background comparison using distancebased regression, with applications in population stratification evaluation and adjustment. Genet Epidemiol. 2009, 33 (5): 432441. 10.1002/gepi.20396.PubMed CentralView ArticlePubMedGoogle Scholar
 Xing G, Xing C: Adjusting for covariates in logistic regression models. Genet Epidemiol. 2010, 34 (7): 769771. 10.1002/gepi.20526. author reply 772PubMed CentralView ArticlePubMedGoogle Scholar
 Novembre J, Stephens M: Interpreting principal component analyses of spatial population genetic variation. Nat Genet. 2008, 40 (5): 646649. 10.1038/ng.139.PubMed CentralView ArticlePubMedGoogle Scholar
 Jewell NP: Statistics for Epidemiology. 2004, Chapman & Hall/CRC, I:Google Scholar
 Li Q, Yu K: Improved correction for population stratification in genomewide association studies by identifying hidden population structures. Genet Epidemiol. 2008, 32 (3): 215226. 10.1002/gepi.20296.View ArticlePubMedGoogle Scholar
Copyright
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.