From: An application of Random Forests to a genome-wide association dataset: Methodological considerations & new findings

Analysis Flow. Flow plan for the RF analysis. The full MS case-control dataset was analyzed first, searching for the optimal mtry and ntree values and applying sparsity pruning as necessary. Two further runs were then conducted: one without any 6p genotypes and one with data for a single 6p SNP. Finally, LD pruning was explored. Once the best data configuration was found, the RF analysis was re-run to examine the stability of the results. The final RF results were compared with the original GWA results [19].
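The caption does not say how the LD pruning step was carried out, so the following is only a minimal sketch of one common approach: greedy pruning on pairwise r² between genotype vectors. The function names, the SNP identifiers, the 0/1/2 genotype coding, the r² threshold of 0.8 and the window of 50 retained SNPs are all assumptions made for illustration, not values taken from the study.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors
    (assumed coding: 0/1/2 copies of the minor allele)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Monomorphic SNP: correlation is undefined, treat as uncorrelated.
        return 0.0
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps, threshold=0.8, window=50):
    """Greedy LD pruning (illustrative; threshold and window are assumptions).

    `snps` is a list of (name, genotype_vector) pairs ordered by position.
    A SNP is retained only if its r^2 with each of the last `window`
    retained SNPs is below `threshold`.
    """
    kept = []
    for name, geno in snps:
        recent = kept[-window:]
        if all(r_squared(geno, g) < threshold for _, g in recent):
            kept.append((name, geno))
    return [name for name, _ in kept]


# Hypothetical toy data: rs_b duplicates rs_a, so it is pruned.
snps = [
    ("rs_a", [0, 1, 2, 0, 1, 2]),
    ("rs_b", [0, 1, 2, 0, 1, 2]),
    ("rs_c", [2, 0, 1, 1, 2, 0]),
]
kept = ld_prune(snps)
print(kept)  # ['rs_a', 'rs_c']
```

In a workflow like the one in the figure, the retained SNP list would then feed the RF re-run. A stricter threshold removes more correlated predictors, at the cost of possibly dropping SNPs that carry independent signal.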

