For association mapping to be possible, LD must be present in the collection of individuals under study, with the levels of LD varying according to the species and locus investigated. Previous studies in maize and the predominantly self-pollinating model species Arabidopsis and rice have demonstrated strong levels of LD surrounding genes controlling flowering time and disease resistance, extending from tens of kb up to 1 cM [14, 23, 25, 27, 29–32]. In the temperate crop barley, LD has been identified over a ~200 kb region encompassing the Ha locus controlling grain hardness, and up to a distance of 5.5 cM from a gene conferring resistance to the barley yellow mosaic virus (BaYMV) complex. Here we estimate LD to extend at least 0.7 cM from VRN-H1 and 6.4 cM from VRN-H2. This is within a similar range to the LD previously identified surrounding selected traits in barley, and helps to indicate the scale of marker saturation required to permit future association mapping of loci of interest in barley. High levels of LD were observed within VRN-H1 over distances of up to 16.7 kb (winter VRN-H1 haplotype 1A), suggesting that LD decay within VRN-H1 in European germplasm may differ from the low levels of LD suggested to occur in North American germplasm.
As an initial inspection of population substructure, PCoA suggested that the primary division within the sample investigated is due to GH. Division within GH pools due to row-number was more distinct within the spring germplasm. Furthermore, notable sub-clustering was observed for varieties belonging to the spring VRN-H1 haplotype 1B, found predominantly in Scandinavian varieties. PCoA clearly showed that sub-structure existed within the varietal population and would therefore need to be adequately accounted for during association analysis. It is generally acknowledged that establishing a value for K (the number of sub-populations) prior to association analysis is not a trivial exercise [35, 36]. To identify the appropriate value of K for use in this study, we attempted to find a maximum for Ln(P|D) (a measure of the maximum likelihood of the specific sub-population model as a description of these data) and also to minimise AIC (which finds the value of K at which the phenotypic variation is best accounted for by the sub-population membership matrix). Neither approach gave an unambiguous answer, but instability in determining the Q matrix suggested K > 6 was unsafe, while AIC suggested most of the phenotypic variation was described at K = 4. Varietal membership of the four groups does not correspond to the expected winter/spring, 6-row/2-row combinations, although 96 % of sub-population 1 are 2-row spring, and sub-population 3 contains 87 % of all winter varieties (data not shown). SA with K = 4 appears to have eliminated a high proportion of false positive results: out of 61 markers tested, 40 were significant in uncorrected data at p = 0.05 (Bonferroni corrected), falling to 14 after SA was applied (Figure 6). However, in many situations, prior knowledge of the kind we have described above will not be available. In these circumstances we believe that exploration of Ln(P|D) and AIC values could give useful guidance.
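The AIC-based choice of K described above can be illustrated with a minimal sketch. It assumes that, for each candidate K, a model of the phenotype on the sub-population membership (Q) matrix has already been fitted and that its log-likelihood and number of fitted parameters are known. The function names and all numerical values below are hypothetical and are not taken from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2 * (number of parameters) - 2 * ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def choose_k(fits):
    """Return the K with the smallest AIC.

    fits maps each candidate K to a (log_likelihood, n_params) pair.
    """
    return min(fits, key=lambda k: aic(*fits[k]))


# Hypothetical log-likelihoods and parameter counts, for illustration only.
example_fits = {
    2: (-310.0, 2),
    3: (-295.0, 3),
    4: (-280.0, 4),
    5: (-279.5, 5),
    6: (-279.2, 6),
}

for k in sorted(example_fits):
    print(k, aic(*example_fits[k]))
print("K with minimum AIC:", choose_k(example_fits))
```

In this illustrative data set the log-likelihood improves only marginally beyond K = 4, so the penalty for extra parameters places the AIC minimum at K = 4. With real data the minimum may be shallow, consistent with the observation above that neither criterion gave an unambiguous answer.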
Association mapping by logistic regression without correction for structure or for inflation of λ due to multiple testing showed significant association across chromosomes 4H and 5H, while 77 % of marker/trait associations using the unmapped S-SAP markers had an observed p value ≤ 0.05 (Figures 5 and 6A). Correction using structured association alone (K = 4) resulted in a large reduction in associated markers, presumably due to a selective reduction of spurious associations, reflected in the reduction from 77 % to 23 % of markers showing association with p ≤ 0.05 (Figures 5 and 6B). The addition of genomic control to SA analysis resulted in the elimination of an additional five markers on chromosomes 4H and 5H; applied to the genome-wide marker set, the proportion of markers with p ≤ 0.05 was reduced to 9 %. Interestingly, GC alone reduced the proportion of markers significant at p ≤ 0.05 slightly more (to 7 %) than when used in conjunction with SA (9 %). We attribute this to the discriminating nature of the SA correction: GC alone (based on a robust mean of λ = 24.7) reduces the test statistic for all markers by an equal proportion; in contrast, SA selectively reduces λ at each marker using a correction specifically tailored to the sub-population fractional membership of each individual in the experimental population. SA reduced background association such that the robust mean of λ fell to 1.8. Subsequent GC correction is therefore much less stringent because the general inflation of the test statistic has already been largely removed by SA. In summary, we believe the approach undertaken gives a higher objective threshold for significance and therefore reduces the number of potentially associated markers for consideration.
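The uniform rescaling applied by genomic control can be sketched as follows. This is a minimal illustration, not the implementation used in this study. It assumes that each marker yields a 1-degree-of-freedom chi-square test statistic, and it estimates λ from the median of the observed statistics, whereas the analysis above reports a robust mean. All function names are hypothetical.

```python
import math
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549364


def chi2_1df_pvalue(stat):
    """Upper-tail p value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def inflation_factor(stats):
    """Estimate lambda as the observed median divided by the expected median.

    Assumption: a median-based estimator, not the robust mean used in the text.
    """
    return median(stats) / CHI2_1DF_MEDIAN


def genomic_control(stats):
    """Divide every statistic by lambda (floored at 1).

    Returns (lambda, corrected statistics, corrected p values).
    """
    lam = max(1.0, inflation_factor(stats))
    corrected = [s / lam for s in stats]
    return lam, corrected, [chi2_1df_pvalue(s) for s in corrected]
```

Because every statistic is divided by the same λ, a large λ (such as the 24.7 reported for uncorrected data) shrinks all markers equally, true associations included. When λ has already been brought close to 1 by SA, the same division changes the statistics far less, which is why GC applied after SA is the less stringent of the two.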
Despite GH representing one of the major divisions within barley germplasm, association mapping using the combined SA and GC approach identified markers within VRN-H1 and candidate VRN-H2 genes as significantly associated with GH after applying statistical correction for the population structure they largely define. Of all the VRN-H1 markers investigated here, the VRN-H1-intronI-St assay (which tests for the absence of large intron I deletions within a 'vernalization critical' region associated with the recessive winter vrn-H1 allele) was previously found to show strong correlation with GH [7–9]. SA found this assay to show the highest association with GH (Figure 6B), demonstrating the statistical approaches undertaken here agree with previous studies aimed at determining functional polymorphism at VRN-H1. Five additional VRN-H1 markers showed highly significant (but lower) p values, although since they are not within the 'vernalization critical' region of VRN-H1, their significance is most likely due to the strong LD identified within the gene.
The identification of markers strongly associated with GH, 0.7 cM distal (HvCSFs1) and 2 cM proximal (HvPHYC) to VRN-H1, illustrates that this locus could potentially have been identified by a genome-wide scan even in the absence of the candidate genes assayed here, given a density of approximately one marker every 1 cM. This level of marker saturation is yet to be routinely achieved in large-genome crops, although coverage approaching this density appears feasible in barley. Such practical limitations for the use of association genetics in plants suggest it is better suited to fine-mapping, after localization of the trait of interest by QTL mapping. This approach would have been applicable here, even with no prior knowledge of VRN-H1 and candidate VRN-H2 genes. Indeed, the association analysis carried out in this study provided a resolution capable of differentiating between intra-genic VRN-H1 markers. Previous association mapping studies have failed to identify VRN-H1, despite an average marker density of ~84 SNPs per chromosome; instead, a marker 27 cM proximal to VRN-H1 was highly associated with growth habit. Furthermore, a G/C SNP in the 3' UTR of VRN-H1, used in an association study based on a collection of 102 North American barley varieties analysed in conjunction with ~1,100 genome-wide SNPs, showed no correlation with growth habit. The failure to detect VRN-H1 and VRN-H2 in previous association mapping studies may have been due to factors such as sample size, partitioning according to population structure and phenotypic errors. However, even if closely linked markers are available, the power to detect association also depends on marker and trait allele frequencies [37, 38]. For example, if the minor allele is present at very low frequency, or is present in a subset of lines not associated with a QTL, the marker is less likely to detect association. This is evident here, where despite the high associations between the majority of VRN-H1 markers and growth habit, three failed to detect significant association.
A variety of approaches have been successfully used to detect marker-trait associations in cereals. Thornsberry et al. pioneered the use of Bayesian modelling of population structure (using STRUCTURE) in conjunction with logistic regression in crop plants in an association study of candidate gene polymorphisms with flowering time variation in maize. In addition to logistic regression, the Buckler group have implemented the General Linear Model (GLM) and Mixed Linear Model (MLM) approaches using TASSEL. GLM has been used by Ravel et al.; MLM, which incorporates a model of kinship between varieties with complex patterns of shared ancestry, is suitable for applications where larger numbers (some hundreds) of markers are available. The MLM approach with SA was used recently to look for associations with seasonal growth habit in barley. Breseghello et al. used STRUCTURE to detect population structure and MLM to detect marker-trait associations in hexaploid wheat. Kraakman et al., looking for association with yield and yield stability in spring barley, found no population structure using a Bayesian modelling approach, and instead calculated Pearson correlation coefficients and applied a False Discovery Rate to correct for multiple testing. Although the use of MLM was not possible here (due to the limited marker numbers available), the basic logistic regression with SA reported here is analogous to the logistic regression analysis module available in TASSEL. Our implementation of genomic control follows standards set in human studies. It is, to the best of our knowledge, the first example of its use to control for residual confounding after adjustment by other means. A similar approach is suggested by Pritchard et al. in their description of SA, and the use of GC in this manner was made explicit by Price et al. in a supplementary note to their description of a principal components analysis to correct for stratification.